Tag Archives: character encoding

PHP: Handle Microsoft Windows Smart Quotes in Your Strings CodeUnit 04 JUL 2011

Microsoft Word likes converting single and double quotes into its so-called smart quotes representation, which can be annoying when you are trying to display the string on a device that doesn’t support smart quotes (the curly ones, in case you were wondering). The easiest way to clean out these curly smart or fancy quotes is to run a simple search and replace on your string (the byte values used here assume the string is in the single-byte Windows-1252 encoding):

 $text = "string containing Microsoft Smart Quotes...";
 $chrs = array (chr(150), chr(147), chr(148), chr(146));
 $repl = array ("-", '"', '"', "'");
 $text = str_replace($chrs, $repl, $text);
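
The chr() values above only match when the string is single-byte Windows-1252. If the text has already been converted to UTF-8, the same characters are stored as multibyte sequences and the replace will miss them. A minimal sketch for that case, assuming PHP 7 or later (for the \u{...} escapes); the function name replace_utf8_smart_quotes is made up for this illustration:

```php
<?php
// Hypothetical helper: swaps UTF-8 encoded curly quotes and the en dash
// for their plain ASCII look-alikes.
function replace_utf8_smart_quotes(string $text): string
{
    $map = [
        "\u{2018}" => "'",  // left single quote
        "\u{2019}" => "'",  // right single quote
        "\u{201C}" => '"',  // left double quote
        "\u{201D}" => '"',  // right double quote
        "\u{2013}" => '-',  // en dash
    ];

    // strtr() with an array makes a single pass and never rescans its output.
    return strtr($text, $map);
}

echo replace_utf8_smart_quotes("\u{201C}It\u{2019}s done\u{201D} \u{2013} ok"), "\n";
// prints: "It's done" - ok
```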

And now you know.

PHP: How to Gracefully deal with some Microsoft Windows Special Characters CodeUnit 22 JUN 2011

Although Microsoft Windows smart quotes are the usual culprits for web developers who need to display strings on devices powered by operating systems other than Windows, there are in fact a number of other Windows-only special characters that can potentially rear their heads every now and then.
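
If the target device can display UTF-8, converting the whole string may be enough, since the characters are then kept rather than replaced. A rough sketch, assuming the mbstring extension is loaded and the input really is Windows-1252; the function name cp1252_to_utf8 is only illustrative:

```php
<?php
// Hypothetical wrapper around mb_convert_encoding().
// Assumption: $str is Windows-1252; feeding it UTF-8 would double-encode it.
function cp1252_to_utf8(string $str): string
{
    return mb_convert_encoding($str, 'UTF-8', 'Windows-1252');
}

// Bytes 0x93 and 0x94 are the curly double quotes, 0xE9 is e-acute.
$converted = cp1252_to_utf8("\x93Hi\x94 caf\xE9");
var_dump($converted === "\u{201C}Hi\u{201D} caf\u{00E9}");
```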

Below is a handy list of string replaces (done verbosely for easier reading) that will replace these Windows special characters with accepted alternatives which can more easily be ported across to different systems:

 $str = str_replace(chr(130), ',', $str);    // baseline single quote
 $str = str_replace(chr(131), 'NLG', $str);  // florin
 $str = str_replace(chr(132), '"', $str);    // baseline double quote
 $str = str_replace(chr(133), '...', $str);  // ellipsis
 $str = str_replace(chr(134), '**', $str);   // dagger (a second footnote)
 $str = str_replace(chr(135), '***', $str);  // double dagger (a third footnote)
 $str = str_replace(chr(136), '^', $str);    // circumflex accent
 $str = str_replace(chr(137), 'o/oo', $str); // per mille
 $str = str_replace(chr(138), 'Sh', $str);   // S Hacek
 $str = str_replace(chr(139), '<', $str);    // left single guillemet
 $str = str_replace(chr(140), 'OE', $str);   // OE ligature
 $str = str_replace(chr(145), "'", $str);    // left single quote
 $str = str_replace(chr(146), "'", $str);    // right single quote
 $str = str_replace(chr(147), '"', $str);    // left double quote
 $str = str_replace(chr(148), '"', $str);    // right double quote
 $str = str_replace(chr(149), '-', $str);    // bullet
 $str = str_replace(chr(150), '-', $str);    // endash
 $str = str_replace(chr(151), '--', $str);   // emdash
 $str = str_replace(chr(152), '~', $str);    // tilde accent
 $str = str_replace(chr(153), '(TM)', $str); // trademark ligature
 $str = str_replace(chr(154), 'sh', $str);   // s Hacek
 $str = str_replace(chr(155), '>', $str);    // right single guillemet
 $str = str_replace(chr(156), 'oe', $str);   // oe ligature
 $str = str_replace(chr(159), 'Y', $str);    // Y Dieresis
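
The same idea can be written more compactly with one lookup table and a single strtr() call. This is only a sketch that covers a handful of the characters above; the function name cp1252_to_ascii is hypothetical, and the input is assumed to be single-byte Windows-1252:

```php
<?php
// Hypothetical helper: one lookup table instead of many str_replace() calls.
// Assumption: $str is single-byte Windows-1252, as in the list above.
function cp1252_to_ascii(string $str): string
{
    $map = [
        chr(133) => '...',   // ellipsis
        chr(145) => "'",     // left single quote
        chr(146) => "'",     // right single quote
        chr(147) => '"',     // left double quote
        chr(148) => '"',     // right double quote
        chr(150) => '-',     // en dash
        chr(151) => '--',    // em dash
        chr(153) => '(TM)',  // trademark sign
    ];

    // One pass over the string; replacements are never rescanned.
    return strtr($str, $map);
}

$input = chr(147) . 'Wait' . chr(133) . chr(148) . ' ' . chr(151) . ' Acme' . chr(153);
echo cp1252_to_ascii($input), "\n";
// prints: "Wait..." -- Acme(TM)
```

Extending the table to the remaining characters is a matter of adding more entries.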

Should be useful.

A Test String for UTF-8 and Internationalization Work Tips, Tricks and Tutorials 02 NOV 2010

As an English-speaking, South African developer, I craft all my work for an English-speaking audience. Or rather, I would like to always do that, but unfortunately the harsh reality of commerce won’t let me: I need to craft things that will work for different languages under different character sets (and let’s just ignore the whole left to right and right to left thing!).

As web development goes, the default for everything is the constrained Latin alphabet, which is rather limiting in terms of the number of characters it allows for. And because of the nature of the Internet, your home-grown little applications seldom stick around for your natural language use only! A potential solution to this is of course to embrace multibyte character encodings, with UTF-8 going a long way in establishing itself as the encoding to work with in these situations.

Fantastic. Loads of documentation, lots of caveats to take into account and plenty of work to do.

So how about a little test string for your web development, one that will make sure your web application can handle the concept of Internationalization?

No problem, try this one from Sam Ruby on for size:

Iñtërnâtiônàlizætiøn
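
As a quick check of what the string exercises, here is a small sketch that compares byte-based and character-based functions on it. It assumes PHP 7 or later and the mbstring extension; the string is written with \u{...} escapes so the example does not depend on how the source file is saved:

```php
<?php
// The test string above, built from escapes: 20 characters, 7 of them
// outside ASCII and stored as two bytes each in UTF-8.
$s = "I\u{00F1}t\u{00EB}rn\u{00E2}ti\u{00F4}n\u{00E0}liz\u{00E6}ti\u{00F8}n";

echo strlen($s), "\n";                     // byte count: 27
echo mb_strlen($s, 'UTF-8'), "\n";         // character count: 20
var_dump(mb_check_encoding($s, 'UTF-8'));  // bool(true)

// A byte-based cut can split a multibyte character in half,
// leaving a string that is no longer valid UTF-8.
$byteCut = substr($s, 0, 2);
var_dump(mb_check_encoding($byteCut, 'UTF-8'));  // bool(false)

// The character-aware cut keeps the second character whole.
$charCut = mb_substr($s, 0, 2, 'UTF-8');
var_dump($charCut === "I\u{00F1}");              // bool(true)
```

If an application reports 27 where it should report 20, or mangles the string on the way to the database and back, some part of the pipeline is still counting bytes.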

There, that should keep you on your developer toes! :)
