Tag Archives: iso-8859-1

FPDF: How to Use a Degree Symbol in a Generated PDF
Tips, Tricks and Tutorials | 29 SEP 2017

I’ve been using the FPDF PDF generator library for years now as the de facto method for my PHP projects to produce PDF reports. However, one minor annoyance is that the generated PDF files often falter when it comes to certain special characters, such as the degree symbol (°). Basically, something like °C becomes Â°C in the final document.

The reason for this is that Arial, the standard font used here, is encoded as ISO-8859-1, a single-byte character set, while the degree symbol in the string being passed in is UTF-8 encoded and takes up two bytes. So in order to include special symbols or characters from other languages, we need to either convert them into the ISO-8859-1 format our font understands, or switch to a different TrueType or Type1 font (one which contains the desired character set).
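
To see the mismatch for yourself, the following minimal sketch dumps the bytes of a UTF-8 degree symbol and then deliberately misreads them as ISO-8859-1. It assumes the script file is saved as UTF-8 and that the mbstring extension is available; the variable names are just for illustration.

```php
<?php
// Assumes this source file is saved as UTF-8.
$degree = "°";

// In UTF-8 the degree symbol is two bytes: 0xC2 0xB0.
echo bin2hex($degree), PHP_EOL; // c2b0

// Treat each of those bytes as an ISO-8859-1 character (which is what a
// Latin-1 font effectively does), then re-encode as UTF-8 for display.
$misread = mb_convert_encoding($degree, 'UTF-8', 'ISO-8859-1');
echo $misread, PHP_EOL; // Â°
```

The stray Â is simply the first of the two UTF-8 bytes being drawn as its own character.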

Now while UTF-8 support is available via a modified class, the easiest way to fix the degree symbol issue without having to do any real work is to simply make use of the PHP utf8_decode function, which converts UTF-8 encoded strings to their ISO-8859-1 equivalents.

In other words, outputting utf8_decode("°C") to your PDF should result in the expected °C.
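
Note that utf8_decode has been deprecated since PHP 8.2, so on newer versions the same conversion is probably better done with mb_convert_encoding. The sketch below is one way to wrap it; to_latin1 is a hypothetical helper name (not part of FPDF or PHP), and the code assumes a UTF-8 source file and the mbstring extension.

```php
<?php
// Hypothetical helper: convert a UTF-8 string to ISO-8859-1 so that a
// single-byte font can display it.
function to_latin1(string $text): string
{
    // Arguments are: string, target encoding, source encoding.
    return mb_convert_encoding($text, 'ISO-8859-1', 'UTF-8');
}

$label = to_latin1("25 °C");

// The degree symbol shrinks from two bytes (c2 b0) to one (b0).
echo bin2hex($label), PHP_EOL; // 323520b043
```

With FPDF, the converted string would then be passed on as usual, for example $pdf->Cell(40, 10, to_latin1("25 °C")).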

Related Link: FPDF PDF Generator Library

PHP: How to HTML Entity Decode a Euro Symbol
CodeUnit | 27 JUN 2011

I came across an annoying bug in my application: for some reason my Euro symbol wasn’t being HTML entity decoded for use in a PDF, even though the other symbols were. At first I assumed that this might be a bug in the PHP html_entity_decode function, but of course, a quick trip to the official PHP documentation proved me to be completely and utterly wrong.

The problem was in the character set all along!

Most of the time we simply run html_entity_decode by passing it the string we want decoded and perhaps the flag which controls whether or not to affect quotes. However, there is a third, often overlooked parameter which controls the character set which PHP needs to use when decoding these HTML entities – and that’s where the trick lies!

If no character set is given, or PHP can’t recognise the one passed to it, html_entity_decode falls back to its default, which at the time of writing is ISO-8859-1, known as Western European, Latin-1. (From PHP 5.4 onwards the default is UTF-8 instead.) However, ISO-8859-1 omits the Euro sign as well as a few French and Finnish letters, which are all added in ISO-8859-15, or Western European, Latin-9.

So in order to successfully decode our Euro symbol containing string, we simply need to run:

$decoded = html_entity_decode($eurostring, ENT_QUOTES, 'ISO-8859-15');
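
To make the difference visible, here is a small sketch that decodes the same string with both character sets. The sample text in $eurostring is made up for illustration; the function and flags are the standard PHP ones.

```php
<?php
// Made-up sample input containing the Euro entity.
$eurostring = 'Total: &euro;25';

// ISO-8859-1 has no Euro sign, so the entity is left undecoded.
$latin1 = html_entity_decode($eurostring, ENT_QUOTES, 'ISO-8859-1');

// ISO-8859-15 places the Euro sign at byte 0xA4.
$latin9 = html_entity_decode($eurostring, ENT_QUOTES, 'ISO-8859-15');

echo $latin1, PHP_EOL;                        // Total: &euro;25
echo bin2hex(substr($latin9, 7, 1)), PHP_EOL; // a4
```

The result is a single-byte ISO-8859-15 string, so printing it directly to a UTF-8 terminal or web page will not show a Euro sign; it is meant for output that expects that encoding.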

And now you know.

For reference, these are the character sets which are supported (charset | aliases | description):

  • ISO-8859-1 | ISO8859-1 | Western European, Latin-1.
  • ISO-8859-15 | ISO8859-15 | Western European, Latin-9. Adds the Euro sign and the French and Finnish letters missing in Latin-1 (ISO-8859-1).
  • UTF-8 | (no aliases) | ASCII-compatible multi-byte 8-bit Unicode.
  • cp866 | ibm866, 866 | DOS-specific Cyrillic charset. Supported since PHP 4.3.2.
  • cp1251 | Windows-1251, win-1251, 1251 | Windows-specific Cyrillic charset. Supported since PHP 4.3.2.
  • cp1252 | Windows-1252, 1252 | Windows-specific charset for Western European.
  • KOI8-R | koi8-ru, koi8r | Russian. Supported since PHP 4.3.2.
  • BIG5 | 950 | Traditional Chinese, mainly used in Taiwan.
  • GB2312 | 936 | Simplified Chinese, national standard character set.
  • BIG5-HKSCS | (no aliases) | Big5 with Hong Kong extensions, Traditional Chinese.
  • Shift_JIS | SJIS, 932 | Japanese.
  • EUC-JP | EUCJP | Japanese.