PHP: Convert a UTF-16 String to a UTF-8 String

CodeUnit, 05 MAR 2010

Andrew Walker crafted this handy little PHP function which can convert a UTF-16 encoded string into a more PHP-friendly UTF-8 encoded string.

The function first checks whether the string starts with a Byte Order Mark (BOM), which also tells it the byte order: FE FF marks big-endian UTF-16 and FF FE marks little-endian UTF-16. If either BOM is present, the function strips it and re-encodes the remaining 16-bit code units as UTF-8.

If no BOM is present, the function returns the input string unchanged, so UTF-16 data without a BOM is not converted at all.
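The BOM check and the way each 16-bit code unit is assembled can be sketched on their own. The helper names `utf16_byte_order` and `utf16_unit` below are hypothetical and not part of Andrew's function; this is a minimal illustration that assumes the input is a raw byte string.

```php
<?php
// Hypothetical helper: returns 'BE', 'LE', or null when no UTF-16 BOM is present.
function utf16_byte_order(string $str): ?string {
    if (strlen($str) < 2) {
        return null;
    }
    if ($str[0] === "\xFE" && $str[1] === "\xFF") {
        return 'BE';   // big-endian: most significant byte first
    }
    if ($str[0] === "\xFF" && $str[1] === "\xFE") {
        return 'LE';   // little-endian: least significant byte first
    }
    return null;
}

// Hypothetical helper: assemble the 16-bit code unit starting at byte offset $i.
function utf16_unit(string $str, int $i, bool $be): int {
    $a = ord($str[$i]);
    $b = ord($str[$i + 1]);
    return $be ? ($a << 8) | $b : ($b << 8) | $a;
}

// U+00E9 (é) stored after each kind of BOM.
$be = "\xFE\xFF\x00\xE9";
$le = "\xFF\xFE\xE9\x00";
printf("%s %04X\n", utf16_byte_order($be), utf16_unit($be, 2, true));   // BE 00E9
printf("%s %04X\n", utf16_byte_order($le), utf16_unit($le, 2, false));  // LE 00E9
```

The same two bytes, E9 and 00, give the same code unit only when they are read in the order the BOM announces, which is why the BOM has to be checked before anything else.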

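function utf16_to_utf8($str) {
    // A BOM needs two bytes; shorter input cannot carry one.
    if (strlen($str) < 2) {
        return $str;
    }
    $c0 = ord($str[0]);
    $c1 = ord($str[1]);

    if ($c0 == 0xFE && $c1 == 0xFF) {
        $be = true;    // big-endian BOM
    } else if ($c0 == 0xFF && $c1 == 0xFE) {
        $be = false;   // little-endian BOM
    } else {
        return $str;   // no BOM: return the input unchanged
    }

    $str = substr($str, 2);
    $len = strlen($str);
    $dec = '';
    // Walk complete 16-bit code units; a stray trailing byte is ignored.
    for ($i = 0; $i + 1 < $len; $i += 2) {
        $c = $be ? (ord($str[$i]) << 8) | ord($str[$i + 1])
                 : (ord($str[$i + 1]) << 8) | ord($str[$i]);

        // A high surrogate followed by a low surrogate encodes one
        // code point above U+FFFF (emoji, for example).
        if ($c >= 0xD800 && $c <= 0xDBFF && $i + 3 < $len) {
            $lo = $be ? (ord($str[$i + 2]) << 8) | ord($str[$i + 3])
                      : (ord($str[$i + 3]) << 8) | ord($str[$i + 2]);
            if ($lo >= 0xDC00 && $lo <= 0xDFFF) {
                $c = 0x10000 + (($c - 0xD800) << 10) + ($lo - 0xDC00);
                $i += 2;   // the low surrogate has been consumed too
            }
        }
        // An unpaired surrogate has no valid UTF-8 form; use U+FFFD instead.
        if ($c >= 0xD800 && $c <= 0xDFFF) {
            $c = 0xFFFD;
        }

        if ($c <= 0x7F) {                  // 1 byte, including U+0000
            $dec .= chr($c);
        } else if ($c <= 0x7FF) {          // 2 bytes
            $dec .= chr(0xC0 | ($c >> 6));
            $dec .= chr(0x80 | ($c & 0x3F));
        } else if ($c <= 0xFFFF) {         // 3 bytes
            $dec .= chr(0xE0 | ($c >> 12));
            $dec .= chr(0x80 | (($c >> 6) & 0x3F));
            $dec .= chr(0x80 | ($c & 0x3F));
        } else {                           // 4 bytes
            $dec .= chr(0xF0 | ($c >> 18));
            $dec .= chr(0x80 | (($c >> 12) & 0x3F));
            $dec .= chr(0x80 | (($c >> 6) & 0x3F));
            $dec .= chr(0x80 | ($c & 0x3F));
        }
    }
    return $dec;
}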
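UTF-16 stores characters above U+FFFF (emoji, many historic scripts) as a surrogate pair: a high unit in the range D800 to DBFF followed by a low unit in the range DC00 to DFFF. The sketch below uses hypothetical helper names and assumes well-formed input. It shows how such a pair combines into one code point and how any code point is written as one to four UTF-8 bytes, which is handy for checking a decoder against non-BMP text.

```php
<?php
// Hypothetical helper: combine a UTF-16 surrogate pair into one code point.
// Assumes $hi is in 0xD800..0xDBFF and $lo is in 0xDC00..0xDFFF.
function combine_surrogates(int $hi, int $lo): int {
    return 0x10000 + (($hi - 0xD800) << 10) + ($lo - 0xDC00);
}

// Hypothetical helper: encode one code point (U+0000..U+10FFFF) as UTF-8 bytes.
function codepoint_to_utf8(int $cp): string {
    if ($cp <= 0x7F) {
        return chr($cp);                          // 0xxxxxxx
    }
    if ($cp <= 0x7FF) {
        return chr(0xC0 | ($cp >> 6))             // 110xxxxx
             . chr(0x80 | ($cp & 0x3F));          // 10xxxxxx
    }
    if ($cp <= 0xFFFF) {
        return chr(0xE0 | ($cp >> 12))            // 1110xxxx
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18))                // 11110xxx
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

// U+1F600 (grinning face) is D83D DE00 in UTF-16.
$cp = combine_surrogates(0xD83D, 0xDE00);
printf("U+%X\n", $cp);                            // U+1F600
echo bin2hex(codepoint_to_utf8($cp)), "\n";       // f09f9880
echo bin2hex(codepoint_to_utf8(0xE9)), "\n";      // c3a9
```

The 1-, 2- and 3-byte branches mirror the ones in the function above; the 4-byte branch is only reached for code points above U+FFFF, which in UTF-16 always arrive as a surrogate pair.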

Thanks Andrew, this was exactly what I was looking for! :)

Related Link: http://www.moddular.org/log/utf16-to-utf8

About Craig Lotter

Software developer, husband and dad to two little girls. Writer behind An Exploring South African. I don't have time for myself any more.

  • It’s amazing, thanks for sharing.

  • It’s an amazing, thanks for sharing.

  • f1na

    Or simply:

    mb_convert_encoding ($string, 'UTF-8', 'UTF-16');

    http://php.net/manual/en/function.mb-convert-en… :)

  • OllieJones

    Whoo hoo. Thank you Andrew Walker and Craig Lotter. This is perfect for decoding ID3v2 tags that Adobe Audition and other tools jam into MP3 files.

  • ketting00

    It can’t decode this encoded string: 呭㤳䥆汶摓䉄套㑧唲噬䥅ㅬ䥑㴽

  • Thanks for sharing. Can I copy your code into my blog to share it in Vietnamese?