How to Convert an UTF-16 File to an UTF-8 file using PHP CodeUnit 07 MAR 2010

Taking Andrew Walker’s previously mentioned handy little UTF-16 to UTF-8 string converter function, we now have in our means a particularly easy way in which to craft a simple UTF-16 to UTF-8 file converter, useful as I have found in the past for those silly little cases like when someone is spitting out Microsoft SQL Server generated CSV files (which are by default encoded in UTF-16) at you for example.

So let’s put down the code then shall we?

function utf16_to_utf8($str) {
    $c0 = ord($str[0]);
    $c1 = ord($str[1]);

    if ($c0 == 0xFE && $c1 == 0xFF) {
        $be = true;
    } else if ($c0 == 0xFF && $c1 == 0xFE) {
        $be = false;
    } else {
        return $str;
    }

    $str = substr($str, 2);
    $len = strlen($str);
    $dec = '';
    for ($i = 0; $i < $len; $i += 2) {
        $c = ($be) ? ord($str[$i]) << 8 | ord($str[$i + 1]) :
                ord($str[$i + 1]) << 8 | ord($str[$i]);
        if ($c >= 0x0001 && $c <= 0x007F) {
            $dec .= chr($c);
        } else if ($c > 0x07FF) {
            $dec .= chr(0xE0 | (($c >> 12) & 0x0F));
            $dec .= chr(0x80 | (($c >>  6) & 0x3F));
            $dec .= chr(0x80 | (($c >>  0) & 0x3F));
        } else {
            $dec .= chr(0xC0 | (($c >>  6) & 0x1F));
            $dec .= chr(0x80 | (($c >>  0) & 0x3F));
        }
    }
    return $dec;
}

function convert_file_to_utf8($csvfile) {
    $utfcheck = file_get_contents($csvfile);
    $utfcheck = utf16_to_utf8($utfcheck);
    file_put_contents($csvfile,$utfcheck);
}

To convert a file simply call the convert_file_to_utf8() function and pass to it the file path of the file you wish to convert. The function then uses the PHP function file_get_contents() to pack the input file’s contents into a string variable which is then passed to the main converter function which converts the string from UTF-16 to UTF-8 encoding if necessary. Finally, we use file_put_contents() to stuff the resulting string back into the original file, overwriting the original file contents.

Nice and simple really.

About Craig Lotter

Software developer, husband and dad to two little girls. Writer behind An Exploring South African. I don't have time for myself any more.

  • Visitor

    (KISS) – Keep It Simple, Stupid :)
    file_put_contents( ‘src_file.txt’, iconv(‘UTF-16’, ‘UTF-8’, file_get_contents(‘dest_file.txt’)));

    • Jo…

      Sadly iconv is not as reliable as I thought it would be, This seems like the best (albeit sluggish) alternative I’ve found so far…

  • Prbowen

    Dankie boet, dit het my werk baie vinniger gemaak.

  • webmasterlva

    Hello do you have utf-8 to utf-16LE ? thank you