PHP: How to HTML Entity Decode a Euro Symbol
CodeUnit, 27 JUN 2011

I came across an annoying bug in my application: my Euro symbol wasn’t being HTML entity decoded for use in a PDF, even though the other symbols I was using were. At first I assumed that this might be a bug in PHP’s html_entity_decode function, but of course, a quick trip to the official PHP documentation proved me to be completely and utterly wrong.

The problem was in the character set all along!

Most of the time we simply call html_entity_decode with the string we want decoded and perhaps the flag which controls how quotes are handled. However, there is a third, often overlooked parameter which sets the character set PHP uses when decoding these HTML entities – and that’s where the trick lies!

By default, if no character set is passed (or PHP can’t recognise the one that is), it assumes ISO-8859-1, known as Western European, Latin-1. (That was the default at the time of writing; from PHP 5.4 onwards the default is UTF-8.) However, Latin-1 omits the Euro sign as well as a few French and Finnish letters, which are all added in ISO-8859-15, or Western European, Latin-9.

So in order to successfully decode a string containing a Euro symbol, we simply need to run:

$decoded = html_entity_decode($eurostring, ENT_QUOTES, 'ISO-8859-15');

And now you know.
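
As a quick check of what that third parameter changes, here is a minimal sketch that decodes &euro; into a few of the character sets listed below and prints the resulting bytes as hex. The helper name decoded_hex is made up for this illustration; html_entity_decode and bin2hex are standard PHP functions.

```php
<?php
// Hypothetical helper for illustration: decode an entity into the given
// character set and return the resulting bytes as a hex string.
function decoded_hex($entity, $charset)
{
    return bin2hex(html_entity_decode($entity, ENT_QUOTES, $charset));
}

// The same entity becomes different bytes depending on the character set.
foreach (array('ISO-8859-15', 'cp1252', 'UTF-8') as $charset) {
    printf("%-12s %s\n", $charset, decoded_hex('&euro;', $charset));
}
// ISO-8859-15  a4
// cp1252       80
// UTF-8        e282ac
```

Whichever character set is chosen has to match what the consumer of the string (a PDF library, a browser, a database column) expects, otherwise the bytes will be shown as a different character.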

For a reference, these are the character sets which are supported:

  • ISO-8859-1 | ISO8859-1 | Western European, Latin-1
  • ISO-8859-15 | ISO8859-15 | Western European, Latin-9. Adds the Euro sign, French and Finnish letters missing in Latin-1 (ISO-8859-1).
  • UTF-8 | ASCII compatible multi-byte 8-bit Unicode.
  • cp866 | ibm866, 866 | DOS-specific Cyrillic charset. Supported as of PHP 4.3.2.
  • cp1251 | Windows-1251, win-1251, 1251 | Windows-specific Cyrillic charset. Supported as of PHP 4.3.2.
  • cp1252 | Windows-1252, 1252 | Windows-specific charset for Western European.
  • KOI8-R | koi8-ru, koi8r | Russian. Supported as of PHP 4.3.2.
  • BIG5 | 950 | Traditional Chinese, mainly used in Taiwan.
  • GB2312 | 936 | Simplified Chinese, national standard character set.
  • BIG5-HKSCS | Big5 with Hong Kong extensions, Traditional Chinese.
  • Shift_JIS | SJIS, 932 | Japanese
  • EUC-JP | EUCJP | Japanese

About Craig Lotter

Software developer, husband and dad to two little girls. Writer behind An Exploring South African. I don't have time for myself any more.

  • Joe

    I’m using the code you and others have suggested but it isn’t working…  I’ve tried:
    $cost = htmlentities($cost, ENT_QUOTES, 'ISO-8859-15');

    $cost = html_entity_decode($cost, ENT_QUOTES, 'ISO-8859-15');

    and neither one works. I am lost… what else can I do? Does the table where it’s being stored have to be set up differently somehow?

    • Okay, but ignoring the database for now, does the html entities function correctly decode/encode a string containing a single Euro sign?

      • Joe

        Thanks for the reply. No, the function doesn’t work. Here’s some more detail. 
        – data was originally uploaded from XLS spreadsheet via phpMyAdmin, table was automatically created when I uploaded it
        – database “collation” = latin1_swedish_ci (default)
        – also tried changing collation to utf8_general_ci (same result)
        – entered into php page: htmlentities("€", ENT_COMPAT, "ISO-8859-15")
        (typed euro symbol on my Mac keyboard as shift-option-2)
        – result is â�¬

        I’m running PHP v. 5.2.17. I ran PHP info and the iconv section has this:
        iconv support enabled
        iconv implementation glibc
        iconv library version 2.5

        Directive                  Local Value   Master Value
        iconv.input_encoding       ISO-8859-1    ISO-8859-1
        iconv.internal_encoding    ISO-8859-1    ISO-8859-1
        iconv.output_encoding      ISO-8859-1    ISO-8859-1

        I also tried iconv:

        And the result was this:

        Original : This is the Euro symbol ‘€’.
        TRANSLIT : This is the Euro symbol ‘¤’.
        IGNORE : This is the Euro symbol ‘¤’.
        Plain : This is the Euro symbol ‘¤’.

      • Joe

        Craig – I think I’ve solved this. I changed the meta tag in my page’s head section to:

        …and now it works. Ugh. Something so simple…

        • It’s always the simple things that bring us down! :)
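
One plausible reading of the â�¬ result in this thread (an assumption; the thread doesn't confirm it) is that the Euro sign typed into the PHP page was saved as UTF-8 bytes while htmlentities was told to treat them as ISO-8859-15. The sketch below writes the bytes out explicitly to show the difference between a matching and a mismatched charset argument; the variable names are illustrative only.

```php
<?php
// The Euro sign as raw bytes in two encodings.
$euroUtf8   = "\xE2\x82\xAC"; // three bytes in UTF-8
$euroLatin9 = "\xA4";         // one byte in ISO-8859-15

// Charset argument matches the bytes: the entity comes out as expected.
echo htmlentities($euroUtf8, ENT_COMPAT, 'UTF-8'), "\n";         // &euro;
echo htmlentities($euroLatin9, ENT_COMPAT, 'ISO-8859-15'), "\n"; // &euro;

// Charset argument does not match: each UTF-8 byte is treated as a
// separate ISO-8859-15 character, so no &euro; is produced.
$mismatch = htmlentities($euroUtf8, ENT_COMPAT, 'ISO-8859-15');
var_dump($mismatch === '&euro;'); // bool(false)
```

If that is what happened, the charset passed to the function, the encoding the file is saved in, and the charset the page declares to the browser all need to agree.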

  • deceze

    Unfortunately your analysis is off. The charset parameter does not signify the *input* charset, but the *target* charset. The input text should be entirely ASCII, with HTML entities used to encode characters that cannot be represented in ASCII. When you want to decode those entities to actual characters, you need to choose a charset that can represent those characters. That’s what the charset parameter is for: to select the *output* charset. If a character cannot be represented in the selected target charset, it will be left as an HTML entity.

    • Awesome, thanks for setting that straight for us then. You learn something new every day! :)
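
The behaviour described in this thread can be checked with a short sketch like the one below. The sample string and variable names are made up for illustration; the comparison uses explicit byte escapes so the result doesn't depend on how the script file itself is encoded.

```php
<?php
// Sample input: plain ASCII with two entities.
$html = 'caf&eacute; &euro;';

// ISO-8859-1 has a byte for é (0xE9) but none for the Euro sign,
// so &eacute; is decoded and &euro; is left as an entity.
$latin1 = html_entity_decode($html, ENT_QUOTES, 'ISO-8859-1');
var_dump($latin1 === "caf\xE9 &euro;"); // bool(true)

// ISO-8859-15 can represent both characters (Euro sign is 0xA4).
$latin9 = html_entity_decode($html, ENT_QUOTES, 'ISO-8859-15');
var_dump($latin9 === "caf\xE9 \xA4"); // bool(true)
```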

  • DivaVocals

    So I came across this post and I am hoping you can tell me why, when I do something similar, my Euro symbols show up in my PDF documents as some weird symbol instead of the Euro symbol.

    $pdf->MultiCell(50, 14, html_entity_decode($currencies->format($so->amount_applied), ENT_QUOTES, "ISO-8859-15"), 0, 'R');

    What am I overlooking here??

  • DivaVocals

    [Image: screenprint of my result using this code]

  • a

    Doesn’t work