Tag Archives: string encoding

FPDF: How to Use a Degree Symbol in a Generated PDF
Tips, Tricks and Tutorials | 29 SEP 2017

I’ve been using the FPDF PDF generator library for years now as the de facto method for producing PDF reports in my PHP projects. However, one minor annoyance is that the generated PDF files often falter when it comes to certain special characters, such as the degree symbol (°). (Basically, something like °C becomes Â°C in the final document.)

The reason this happens is that Arial, the default included font, uses the ISO-8859-1 character encoding, while the degree symbol in your PHP string is UTF-8 encoded. UTF-8 stores ° as two bytes, and each of those bytes is rendered as its own single-byte character. So in order to include special symbols or characters from other languages, we need to either convert them into the font-compatible ISO-8859-1 format, or switch to a different TrueType or Type1 font (one that contains the desired character set).
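To see where the stray character comes from, here is a small sketch that takes the UTF-8 bytes of °C and deliberately misreads them as ISO-8859-1. It assumes the mbstring extension is available; the variable names are just for illustration.

```php
<?php
// The degree sign encoded as UTF-8 is two bytes: C2 B0.
$utf8 = "\u{00B0}C";
echo bin2hex($utf8), "\n"; // c2b043

// Treat each of those bytes as a separate ISO-8859-1 character,
// then re-encode as UTF-8 so the result can be displayed.
$misread = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
echo $misread, "\n"; // Â°C
```

Byte C2 is Â in ISO-8859-1 and byte B0 is °, which is exactly the pair that shows up in the PDF.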

Now while UTF-8 support is available via a modified class, the easiest way to fix the degree symbol issue without doing any real work is to simply make use of the PHP utf8_decode function, which converts UTF-8 encoded strings to their ISO-8859-1 equivalents.

In other words, outputting utf8_decode("°C") to your PDF should result in the expected °C.
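One thing worth knowing: utf8_decode is deprecated as of PHP 8.2. A minimal sketch of the same conversion using mb_convert_encoding might look like the following. The helper name pdf_text is made up for this example (it is not part of FPDF), and the mbstring extension is assumed to be loaded.

```php
<?php
// Hypothetical helper (not part of FPDF): converts a UTF-8 string to
// ISO-8859-1 so that FPDF's standard fonts can render it.
function pdf_text(string $utf8): string
{
    return mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
}

$text = pdf_text("21 \u{00B0}C");

// The degree sign is now the single byte B0 instead of C2 B0.
echo bin2hex($text), "\n"; // 323120b043

// With FPDF loaded and a page and font set up, the call would be along the lines of:
// $pdf->Cell(40, 10, pdf_text('21 °C'));
```

Keep in mind that this only helps for characters that actually exist in ISO-8859-1. Anything outside that set (the euro sign, for instance) will not survive the conversion, and that is where a different font or the UTF-8 capable class comes in.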

Related Link: FPDF PDF Generator Library

How to Convert a UTF-16 File to a UTF-8 File Using PHP
CodeUnit | 07 MAR 2010

Using Andrew Walker’s previously mentioned handy little UTF-16 to UTF-8 string converter function, it is particularly easy to craft a simple UTF-16 to UTF-8 file converter. I have found this useful in the past for those silly little cases where, for example, someone is spitting out Microsoft SQL Server generated CSV files at you (which are encoded in UTF-16 by default).

So let’s put down the code then shall we?

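function utf16_to_utf8($str) {
    // Too short to hold a byte order mark (BOM): nothing to convert.
    if (strlen($str) < 2) {
        return $str;
    }
    $c0 = ord($str[0]);
    $c1 = ord($str[1]);

    // The BOM gives the byte order; without one the input is returned as-is.
    if ($c0 == 0xFE && $c1 == 0xFF) {
        $be = true;   // FE FF: big-endian
    } else if ($c0 == 0xFF && $c1 == 0xFE) {
        $be = false;  // FF FE: little-endian
    } else {
        return $str;
    }

    $str = substr($str, 2);
    $len = strlen($str) & ~1;  // ignore a dangling odd byte at the end
    $dec = '';
    for ($i = 0; $i < $len; $i += 2) {
        // Assemble one 16-bit code unit from two bytes.
        $c = ($be) ? ord($str[$i]) << 8 | ord($str[$i + 1]) :
                ord($str[$i + 1]) << 8 | ord($str[$i]);
        if ($c >= 0x0001 && $c <= 0x007F) {
            // One byte: plain ASCII.
            $dec .= chr($c);
        } else if ($c > 0x07FF) {
            // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
            $dec .= chr(0xE0 | (($c >> 12) & 0x0F));
            $dec .= chr(0x80 | (($c >>  6) & 0x3F));
            $dec .= chr(0x80 | (($c >>  0) & 0x3F));
        } else {
            // Two bytes: 110xxxxx 10xxxxxx (U+0000 also lands here, as C0 80).
            $dec .= chr(0xC0 | (($c >>  6) & 0x1F));
            $dec .= chr(0x80 | (($c >>  0) & 0x3F));
        }
    }
    // Note: surrogate pairs (characters above U+FFFF) are not combined.
    return $dec;
}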

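function convert_file_to_utf8($csvfile) {
    // Read the whole file into memory; stop if it cannot be read.
    $utfcheck = file_get_contents($csvfile);
    if ($utfcheck === false) {
        return false;
    }
    $utfcheck = utf16_to_utf8($utfcheck);
    // Overwrite the original file with the converted contents.
    return file_put_contents($csvfile, $utfcheck) !== false;
}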
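
If the bit shifting inside the loop looks a little opaque, it can be checked by hand on a few values. The following is a standalone sketch of the same arithmetic for a single 16-bit code unit. The function name encode_code_unit is made up for this example, and surrogates are left out of the picture.

```php
<?php
// Hypothetical helper: encodes one 16-bit code unit as UTF-8,
// using the same one-, two- and three-byte patterns as the loop above.
function encode_code_unit(int $c): string
{
    if ($c <= 0x7F) {
        // One byte: 0xxxxxxx
        return chr($c);
    }
    if ($c <= 0x7FF) {
        // Two bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($c >> 6)) . chr(0x80 | ($c & 0x3F));
    }
    // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return chr(0xE0 | ($c >> 12))
         . chr(0x80 | (($c >> 6) & 0x3F))
         . chr(0x80 | ($c & 0x3F));
}

echo bin2hex(encode_code_unit(0x0041)), "\n"; // 41     (A)
echo bin2hex(encode_code_unit(0x00B0)), "\n"; // c2b0   (degree sign)
echo bin2hex(encode_code_unit(0x20AC)), "\n"; // e282ac (euro sign)
```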

To convert a file simply call the convert_file_to_utf8() function and pass to it the file path of the file you wish to convert. The function then uses the PHP function file_get_contents() to pack the input file’s contents into a string variable which is then passed to the main converter function which converts the string from UTF-16 to UTF-8 encoding if necessary. Finally, we use file_put_contents() to stuff the resulting string back into the original file, overwriting the original file contents.
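If overwriting the original file makes you nervous, a variation on the same idea can write to a separate output file instead. The sketch below is not the function above: it swaps in PHP's mb_convert_encoding for the actual conversion (so the mbstring extension is assumed), and the name convert_utf16_file and the demo file contents are invented for this example.

```php
<?php
// Hypothetical variant: converts $src to UTF-8 and writes the result
// to $dst, leaving the source file untouched.
function convert_utf16_file(string $src, string $dst): bool
{
    $data = file_get_contents($src);
    if ($data === false) {
        return false;
    }

    // Pick the byte order from the BOM and strip it before converting.
    $bom = substr($data, 0, 2);
    if ($bom === "\xFF\xFE") {
        $data = mb_convert_encoding(substr($data, 2), 'UTF-8', 'UTF-16LE');
    } elseif ($bom === "\xFE\xFF") {
        $data = mb_convert_encoding(substr($data, 2), 'UTF-8', 'UTF-16BE');
    }
    // No BOM: the contents are written out unchanged.

    return file_put_contents($dst, $data) !== false;
}

// Demo: a tiny little-endian UTF-16 file holding "a,°".
$src = tempnam(sys_get_temp_dir(), 'u16');
$dst = tempnam(sys_get_temp_dir(), 'u8');
file_put_contents($src, "\xFF\xFE" . "a\x00,\x00\xB0\x00");

$ok = convert_utf16_file($src, $dst);
$converted = file_get_contents($dst);
echo bin2hex($converted), "\n"; // 612cc2b0

unlink($src);
unlink($dst);
```

Either way the whole file is read into memory in one go, which is fine for the odd CSV export but worth keeping in mind for very large files.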

Nice and simple really.