Tag Archives: utf-8

FPDF: How to use a Degree Symbol in a Generated PDF Tips, Tricks and Tutorials 29 SEP 2017

I’ve been using the FPDF PDF generator library for years now as the de facto method for my PHP projects to produce PDF reports. However, one minor annoyance is that the generated PDF files often falter when it comes to the inclusion of certain special characters – like the degree symbol (°) as an example.  (Basically, something like °C becomes °C in the final document)

The reason for this happening is that Arial, the default used/included font, is of type ISO-8859-1 while the degree symbol is UTF-8 encoded. So in order for us to include special symbols or characters from other languages, we need to either try and convert them into our font compatible ISO-8859-1 format, or perhaps switch to using a different TrueType or Type1 font (which then would contain the desired character set).

Now while UTF-8 support is available via a modified class, the easiest way to fix the degree symbol issue without having doing any real work is to simply make use of the PHP utf8_decode function, which convert UTF-8 encoded strings to their ISO-8859-1 equivalents.

In other words outputting utf8_decode(“°C”) to your PDF should result in the expected °C

Related Link: FPDF PDF Generator Library

A Test String for UTF-8 and Internationalization Work Tips, Tricks and Tutorials 02 NOV 2010

As an English-speaking, South African developer, I craft all my work for an English-speaking audience. I rather, I would like to always do that, but unfortunately the harsh reality of commerce won’t let me – I need to craft things that will work for different languages under different character sets (and let’s just ignore the whole left to right and right to left thing!).

Unfortunately, as web development goes, the default for everything is the constrained Latin alphabet, which unfortunately is rather limiting in terms of the number of characters it allows for. And because of the nature of the Internet, unfortunately your home grown little applications seldomnly stick around for your natural language use only! A potential solution to this is of course to embrace multibyte character encodings, with UTF-8 going a long way in establishing itself as the encoding to work with in these situations.

Fantastic. Loads of documentation, lots of caveats to take into account and plenty of work to do.

So how about a little test string for your web development, one that will make sure your web application can handle the concept of Internationalization?

No problem, try this one from Sam Ruby on for size:

Iñtërnâtiônàlizætiøn

There, that should keep you on your developer toes! :)

internationalization many flags of the world flapping in the wind

How to Convert an UTF-16 File to an UTF-8 file using PHP CodeUnit 07 MAR 2010

Taking Andrew Walker’s previously mentioned handy little UTF-16 to UTF-8 string converter function, we now have in our means a particularly easy way in which to craft a simple UTF-16 to UTF-8 file converter, useful as I have found in the past for those silly little cases like when someone is spitting out Microsoft SQL Server generated CSV files (which are by default encoded in UTF-16) at you for example.

So let’s put down the code then shall we?

function utf16_to_utf8($str) {
    $c0 = ord($str[0]);
    $c1 = ord($str[1]);

    if ($c0 == 0xFE && $c1 == 0xFF) {
        $be = true;
    } else if ($c0 == 0xFF && $c1 == 0xFE) {
        $be = false;
    } else {
        return $str;
    }

    $str = substr($str, 2);
    $len = strlen($str);
    $dec = '';
    for ($i = 0; $i < $len; $i += 2) {
        $c = ($be) ? ord($str[$i]) << 8 | ord($str[$i + 1]) :
                ord($str[$i + 1]) << 8 | ord($str[$i]);
        if ($c >= 0x0001 && $c <= 0x007F) {
            $dec .= chr($c);
        } else if ($c > 0x07FF) {
            $dec .= chr(0xE0 | (($c >> 12) & 0x0F));
            $dec .= chr(0x80 | (($c >>  6) & 0x3F));
            $dec .= chr(0x80 | (($c >>  0) & 0x3F));
        } else {
            $dec .= chr(0xC0 | (($c >>  6) & 0x1F));
            $dec .= chr(0x80 | (($c >>  0) & 0x3F));
        }
    }
    return $dec;
}

function convert_file_to_utf8($csvfile) {
    $utfcheck = file_get_contents($csvfile);
    $utfcheck = utf16_to_utf8($utfcheck);
    file_put_contents($csvfile,$utfcheck);
}

To convert a file simply call the convert_file_to_utf8() function and pass to it the file path of the file you wish to convert. The function then uses the PHP function file_get_contents() to pack the input file’s contents into a string variable which is then passed to the main converter function which converts the string from UTF-16 to UTF-8 encoding if necessary. Finally, we use file_put_contents() to stuff the resulting string back into the original file, overwriting the original file contents.

Nice and simple really.

PHP: Convert a UTF-16 String to a UTF-8 String CodeUnit 05 MAR 2010

Andrew Walker crafted this handy little PHP function which can convert a UTF-16 encoded string into a more PHP-friendly UTF-8 encoded string.

The function first checks to see if the string passed to it is prefixed with a Byte Order Mark (BOM), and if the necessary BOM exists, the function continues to convert the rest of the string to its more compact UTF-8 format.

Obviously if no BOM is present, the function leaves the input string unchanged.

function utf16_to_utf8($str) {
    $c0 = ord($str[0]);
    $c1 = ord($str[1]);

    if ($c0 == 0xFE && $c1 == 0xFF) {
        $be = true;
    } else if ($c0 == 0xFF && $c1 == 0xFE) {
        $be = false;
    } else {
        return $str;
    }

    $str = substr($str, 2);
    $len = strlen($str);
    $dec = '';
    for ($i = 0; $i < $len; $i += 2) {
        $c = ($be) ? ord($str[$i]) << 8 | ord($str[$i + 1]) : 
                ord($str[$i + 1]) << 8 | ord($str[$i]);
        if ($c >= 0x0001 && $c <= 0x007F) {
            $dec .= chr($c);
        } else if ($c > 0x07FF) {
            $dec .= chr(0xE0 | (($c >> 12) & 0x0F));
            $dec .= chr(0x80 | (($c >>  6) & 0x3F));
            $dec .= chr(0x80 | (($c >>  0) & 0x3F));
        } else {
            $dec .= chr(0xC0 | (($c >>  6) & 0x1F));
            $dec .= chr(0x80 | (($c >>  0) & 0x3F));
        }
    }
    return $dec;
}

Thanks Andrew, this was exactly what I was looking for! :)

Related Link: http://www.moddular.org/log/utf16-to-utf8