Tag Archives: string

MySQL: How to Select the First Word of a Sentence using SQL CodeUnit 21 MAY 2012

Sometimes it is useful to be able to extract the first word of a sentence contained in one of the columns of your table. Luckily for us, MySQL makes this a trivial operation thanks to its useful SUBSTRING_INDEX() function.

From the reference manual:

SUBSTRING_INDEX(str,delim,count) – Returns the substring from string str before count occurrences of the delimiter delim. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. SUBSTRING_INDEX() performs a case-sensitive match when searching for delim.

So for our purposes, if we want to select the first word from a sentence or string of words, the natural delimiter to look for is ' ', i.e. a single space character. Because we’re interested in the first word of the sentence, we want everything before the first occurrence of the space, leaving us with SQL which looks like this:

SELECT SUBSTRING_INDEX( `myColumn` , ' ', 1 ) as `firstWord` FROM `sentences`
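
To make the effect of a positive versus a negative count a little more concrete, here is a rough PHP sketch that imitates the behaviour described in the manual excerpt. The substring_index() helper is a made-up name for illustration only (it is not part of PHP or MySQL), and returning an empty string for an empty delimiter or a zero count is simply an assumption of this sketch.

```php
<?php
// Hypothetical helper: imitates the documented behaviour of MySQL's
// SUBSTRING_INDEX() for illustration. Not a built-in PHP function.
function substring_index(string $str, string $delim, int $count): string {
    // Assumption of this sketch: nothing sensible to return in these cases
    // (explode() also rejects an empty delimiter).
    if ($delim === '' || $count === 0) {
        return '';
    }
    $parts = explode($delim, $str);
    // Positive count: keep pieces from the left. Negative: keep pieces from the right.
    $slice = $count > 0
        ? array_slice($parts, 0, $count)
        : array_slice($parts, $count);
    return implode($delim, $slice);
}

echo substring_index('The quick brown fox', ' ', 1), "\n";  // The
echo substring_index('The quick brown fox', ' ', 2), "\n";  // The quick
echo substring_index('The quick brown fox', ' ', -1), "\n"; // fox
```

A count of 1 with a space delimiter is therefore the "first word" case used in the query.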


PHP: How to Generate a Random String Containing Both Letters and Numbers Programming 11 MAY 2012

Generating a string containing a random selection of both letters and numbers (i.e. an alphanumeric string) using PHP is pretty trivial.

Essentially what we want to do is define a string containing all the characters we wish to use in the generated string. We then randomly select characters from that string and glue them all together until we get a random string of the desired length.

Coded as a function, you get:

function rand_string( $length = 5 ) {
	$str = ''; //the resulting random string
	$chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";	 //characters making up the random string

	$size = strlen( $chars );
	for( $i = 0; $i < $length; $i++ ) {
		$str .= $chars[ rand( 0, $size - 1 ) ]; //pick one character at random
	}

	return $str;
}

Pretty simple, but quite useful actually.
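
If the generated string is going to be used for anything security related (tokens, temporary passwords and so on), rand() is not really suitable. A possible variation, assuming PHP 7 or later, swaps in random_int(), which draws from a cryptographically secure source. The function name secure_rand_string() is just a placeholder for this sketch.

```php
<?php
// Sketch only: same idea as rand_string(), but using random_int() (PHP 7+).
// secure_rand_string() is a hypothetical name chosen for this example.
function secure_rand_string(int $length = 5): string {
    $chars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
    $max = strlen($chars) - 1; // highest valid index into $chars
    $str = '';
    for ($i = 0; $i < $length; $i++) {
        $str .= $chars[random_int(0, $max)];
    }
    return $str;
}

echo secure_rand_string(12), "\n"; // twelve random alphanumeric characters
```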

JavaScript: Grab a part of a String with the Substring Function! CodeUnit 14 DEC 2011

Considering I work with PHP day in and day out, it is no wonder that I completely forgot that JavaScript also comes with its own super handy substring function, for when you need to grab only a specific part of a string (much like PHP’s substr function).

string.substring(from, to)

The JavaScript substring() method extracts the characters of a string between two specified indices, “from” and “to” (not including “to” itself), and returns them as a new string.

It takes two parameters, the first required and the second optional. In the syntax above, from indicates the index at which to start the extraction (remember, the first character is at index 0). The optional to parameter, if specified, indicates the index at which to stop the extraction. If omitted, substring extracts the rest of the string.

In action:

var str="Hello world!";
document.write(str.substring(3)+"<br />");
document.write(str.substring(3,7));

The above will output:

lo world!
lo w
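
Since the comparison with PHP’s substr came up, it may be worth sketching the one difference that tends to trip people up: substring() takes an end index, whereas substr() takes a length. The js_substring() wrapper below is a hypothetical helper for illustration, and it assumes 0 <= from <= to (it does not copy JavaScript’s argument swapping or clamping).

```php
<?php
$str = 'Hello world!';

// PHP's substr() takes a start index and a LENGTH, so the JavaScript call
// str.substring(3, 7) corresponds to a length of 7 - 3 = 4.
echo substr($str, 3), "\n";        // lo world!
echo substr($str, 3, 7 - 3), "\n"; // lo w

// Hypothetical wrapper with JavaScript-style (from, to) arguments.
// Assumes 0 <= $from <= $to; no swapping or clamping is attempted.
function js_substring(string $str, int $from, ?int $to = null): string {
    return $to === null
        ? substr($str, $from)
        : substr($str, $from, $to - $from);
}

echo js_substring($str, 3, 7), "\n"; // lo w
```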


PHP: How to Get the Last Character of a String CodeUnit 19 OCT 2011

Getting the last character of a given string in PHP is made very, very simple thanks to the plain old vanilla substring (substr) function.

When you call substr, you feed it the string, the starting point and the number of characters you want it to return. If you leave out the number of characters to return, then the function returns all the characters from the starting point until the end of the string.

Knowing this, we can then figure out that if we want the last character of a string, we simply make the starting point the length of the string minus 1 to take into account the zero-based character position counter. In practice:

echo substr('My String',strlen('My String') - 1); //returns g

However, substr makes this EVEN easier by allowing us to specify a NEGATIVE starting point. This indicates that the function should count towards the left starting from the last character instead of the usual counting to the right starting from the first character. Armed with this knowledge, our call to retrieve the last character in the string now becomes:

echo substr('My String',-1); //returns g

Feeding a -3 would return the last three characters of the string, meaning our above example would print out ‘ing’.
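
One caveat worth sketching: substr counts bytes, not characters, so with a multibyte encoding such as UTF-8 the "last character" may come back as only part of a character. The example below assumes the string is UTF-8 and that the mbstring extension is enabled, in which case mb_substr can be used in the same way.

```php
<?php
// "café" with the é written as a Unicode escape (PHP 7+); in UTF-8 it is two bytes.
$word = "caf\u{e9}";

$lastByte = substr($word, -1);                   // only the final byte of é
$lastChar = mb_substr($word, -1, null, 'UTF-8'); // the whole character é

echo bin2hex($lastByte), "\n"; // a9
echo bin2hex($lastChar), "\n"; // c3a9

// For plain ASCII text the two calls agree.
echo mb_substr('My String', -3, null, 'UTF-8'), "\n"; // ing
```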

And now you know! :)

PHP: Strip Non-Alphanumeric Characters out of a String CodeUnit 26 NOV 2010

Sometimes it comes in quite handy to strip out all the non-alphanumeric characters from a given string. Of course, we could just use a bog standard preg_replace with a suitable regex to achieve this, though the result might be an unreadable string.

For example:

Sitting in a café, Jørgen thought the wallpaper too passé

could potentially become

Sitting in a caf Jrgen thought the wallpaper too pass

Unfortunately this string has lost too much of its meaning with the simple strip, meaning we should pad out our preg_replace approach with another function that first removes the accents from the letters.

So running the string through removeAccents followed by the regex would result in this:

Sitting in a cafe Jorgen thought the wallpaper too passe

In other words, much more readable. And the code to do this for us?

function removeAccents($str) {
  // $a holds the accented characters, $b the plain replacement at the same position
  $a = array('À', 'Á', 'Â', 'Ã', 'Ä', 'Å', 'Æ', 'Ç', 'È', 'É', 'Ê', 'Ë', 'Ì', 'Í', 'Î', 'Ï', 'Ð', 'Ñ', 'Ò', 'Ó', 'Ô', 'Õ', 'Ö', 'Ø', 'Ù', 'Ú', 'Û', 'Ü', 'Ý', 'ß', 'à', 'á', 'â', 'ã', 'ä', 'å', 'æ', 'ç', 'è', 'é', 'ê', 'ë', 'ì', 'í', 'î', 'ï', 'ñ', 'ò', 'ó', 'ô', 'õ', 'ö', 'ø', 'ù', 'ú', 'û', 'ü', 'ý', 'ÿ', 'Ā', 'ā', 'Ă', 'ă', 'Ą', 'ą', 'Ć', 'ć', 'Ĉ', 'ĉ', 'Ċ', 'ċ', 'Č', 'č', 'Ď', 'ď', 'Đ', 'đ', 'Ē', 'ē', 'Ĕ', 'ĕ', 'Ė', 'ė', 'Ę', 'ę', 'Ě', 'ě', 'Ĝ', 'ĝ', 'Ğ', 'ğ', 'Ġ', 'ġ', 'Ģ', 'ģ', 'Ĥ', 'ĥ', 'Ħ', 'ħ', 'Ĩ', 'ĩ', 'Ī', 'ī', 'Ĭ', 'ĭ', 'Į', 'į', 'İ', 'ı', 'Ĳ', 'ĳ', 'Ĵ', 'ĵ', 'Ķ', 'ķ', 'Ĺ', 'ĺ', 'Ļ', 'ļ', 'Ľ', 'ľ', 'Ŀ', 'ŀ', 'Ł', 'ł', 'Ń', 'ń', 'Ņ', 'ņ', 'Ň', 'ň', 'ŉ', 'Ō', 'ō', 'Ŏ', 'ŏ', 'Ő', 'ő', 'Œ', 'œ', 'Ŕ', 'ŕ', 'Ŗ', 'ŗ', 'Ř', 'ř', 'Ś', 'ś', 'Ŝ', 'ŝ', 'Ş', 'ş', 'Š', 'š', 'Ţ', 'ţ', 'Ť', 'ť', 'Ŧ', 'ŧ', 'Ũ', 'ũ', 'Ū', 'ū', 'Ŭ', 'ŭ', 'Ů', 'ů', 'Ű', 'ű', 'Ų', 'ų', 'Ŵ', 'ŵ', 'Ŷ', 'ŷ', 'Ÿ', 'Ź', 'ź', 'Ż', 'ż', 'Ž', 'ž', 'ſ', 'ƒ', 'Ơ', 'ơ', 'Ư', 'ư', 'Ǎ', 'ǎ', 'Ǐ', 'ǐ', 'Ǒ', 'ǒ', 'Ǔ', 'ǔ', 'Ǖ', 'ǖ', 'Ǘ', 'ǘ', 'Ǚ', 'ǚ', 'Ǜ', 'ǜ', 'Ǻ', 'ǻ', 'Ǽ', 'ǽ', 'Ǿ', 'ǿ');
  $b = array('A', 'A', 'A', 'A', 'A', 'A', 'AE', 'C', 'E', 'E', 'E', 'E', 'I', 'I', 'I', 'I', 'D', 'N', 'O', 'O', 'O', 'O', 'O', 'O', 'U', 'U', 'U', 'U', 'Y', 's', 'a', 'a', 'a', 'a', 'a', 'a', 'ae', 'c', 'e', 'e', 'e', 'e', 'i', 'i', 'i', 'i', 'n', 'o', 'o', 'o', 'o', 'o', 'o', 'u', 'u', 'u', 'u', 'y', 'y', 'A', 'a', 'A', 'a', 'A', 'a', 'C', 'c', 'C', 'c', 'C', 'c', 'C', 'c', 'D', 'd', 'D', 'd', 'E', 'e', 'E', 'e', 'E', 'e', 'E', 'e', 'E', 'e', 'G', 'g', 'G', 'g', 'G', 'g', 'G', 'g', 'H', 'h', 'H', 'h', 'I', 'i', 'I', 'i', 'I', 'i', 'I', 'i', 'I', 'i', 'IJ', 'ij', 'J', 'j', 'K', 'k', 'L', 'l', 'L', 'l', 'L', 'l', 'L', 'l', 'L', 'l', 'N', 'n', 'N', 'n', 'N', 'n', 'n', 'O', 'o', 'O', 'o', 'O', 'o', 'OE', 'oe', 'R', 'r', 'R', 'r', 'R', 'r', 'S', 's', 'S', 's', 'S', 's', 'S', 's', 'T', 't', 'T', 't', 'T', 't', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'W', 'w', 'Y', 'y', 'Y', 'Z', 'z', 'Z', 'z', 'Z', 'z', 's', 'f', 'O', 'o', 'U', 'u', 'A', 'a', 'I', 'i', 'O', 'o', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'A', 'a', 'AE', 'ae', 'O', 'o');
  return str_replace($a, $b, $str);
}

$newstring = preg_replace('/[^a-zA-Z0-9.\s]/', '', removeAccents($oldstring));

where $oldstring is the string containing the non-alphanumeric characters.
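
To see the difference between the two approaches on the example sentence, here is a small self-contained sketch. It uses a deliberately cut-down stand-in for removeAccents that only knows the two accented letters in the sentence; both function names are hypothetical and exist only for this illustration. It also assumes the source file is saved as UTF-8.

```php
<?php
// Cut-down, hypothetical stand-in for removeAccents(): it only knows two letters.
function removeAccentsDemo(string $str): string {
    return strtr($str, array('é' => 'e', 'ø' => 'o'));
}

// Keeps ASCII letters, digits, full stops and whitespace; everything else is dropped.
function stripNonAlphanumeric(string $str): string {
    return preg_replace('/[^a-zA-Z0-9.\s]/', '', $str);
}

$oldstring = 'Sitting in a café, Jørgen thought the wallpaper too passé';

// Regex only: the accented letters vanish along with the comma.
echo stripNonAlphanumeric($oldstring), "\n";
// Sitting in a caf Jrgen thought the wallpaper too pass

// Accents removed first, then the regex: only the comma is lost.
echo stripNonAlphanumeric(removeAccentsDemo($oldstring)), "\n";
// Sitting in a cafe Jorgen thought the wallpaper too passe
```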

So in summary, easy peasy stuff this.

How to Convert a UTF-16 File to a UTF-8 File using PHP CodeUnit 07 MAR 2010

Taking Andrew Walker’s previously mentioned handy little UTF-16 to UTF-8 string converter function, we now have an easy way to craft a simple UTF-16 to UTF-8 file converter. I have found this useful in the past for those silly little cases where someone is spitting out Microsoft SQL Server generated CSV files (which are by default encoded in UTF-16) at you.

So let’s put down the code then shall we?

function utf16_to_utf8($str) {
    // the first two bytes should hold the Byte Order Mark (BOM)
    $c0 = ord($str[0]);
    $c1 = ord($str[1]);

    if ($c0 == 0xFE && $c1 == 0xFF) {
        $be = true;  // big-endian
    } else if ($c0 == 0xFF && $c1 == 0xFE) {
        $be = false; // little-endian
    } else {
        return $str; // no BOM found, so leave the string untouched
    }

    $str = substr($str, 2); // drop the BOM
    $len = strlen($str);
    $dec = '';
    for ($i = 0; $i < $len; $i += 2) {
        // combine two bytes into one 16-bit code unit
        $c = ($be) ? ord($str[$i]) << 8 | ord($str[$i + 1]) :
                ord($str[$i + 1]) << 8 | ord($str[$i]);
        if ($c >= 0x0001 && $c <= 0x007F) {
            $dec .= chr($c);
        } else if ($c > 0x07FF) {
            $dec .= chr(0xE0 | (($c >> 12) & 0x0F));
            $dec .= chr(0x80 | (($c >>  6) & 0x3F));
            $dec .= chr(0x80 | (($c >>  0) & 0x3F));
        } else {
            $dec .= chr(0xC0 | (($c >>  6) & 0x1F));
            $dec .= chr(0x80 | (($c >>  0) & 0x3F));
        }
    }
    return $dec;
}

function convert_file_to_utf8($csvfile) {
    $utfcheck = file_get_contents($csvfile);
    $utfcheck = utf16_to_utf8($utfcheck);
    file_put_contents($csvfile, $utfcheck); // overwrite the original file
}

To convert a file simply call the convert_file_to_utf8() function and pass to it the file path of the file you wish to convert. The function then uses the PHP function file_get_contents() to pack the input file’s contents into a string variable which is then passed to the main converter function which converts the string from UTF-16 to UTF-8 encoding if necessary. Finally, we use file_put_contents() to stuff the resulting string back into the original file, overwriting the original file contents.
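
Because the function overwrites the original file, it can be handy to have a variant that writes to a separate target instead. The sketch below is one possible way to do that. Note that it swaps in PHP’s mb_convert_encoding() (so it assumes the mbstring extension is enabled) in place of the hand-rolled converter, and convert_file_copy_to_utf8() is a hypothetical name used only here.

```php
<?php
// Hypothetical variant: writes the converted text to a separate file so the
// original stays intact. Uses mb_convert_encoding() (mbstring extension)
// rather than the utf16_to_utf8() function from this post.
function convert_file_copy_to_utf8(string $source, string $target): bool {
    $data = file_get_contents($source);
    if ($data === false) {
        return false; // the source file could not be read
    }

    // Pick the byte order from the BOM, then drop the BOM before converting.
    $bom = substr($data, 0, 2);
    if ($bom === "\xFE\xFF") {
        $data = mb_convert_encoding(substr($data, 2), 'UTF-8', 'UTF-16BE');
    } elseif ($bom === "\xFF\xFE") {
        $data = mb_convert_encoding(substr($data, 2), 'UTF-8', 'UTF-16LE');
    }
    // No BOM: the content is written out unchanged.

    return file_put_contents($target, $data) !== false;
}
```

Calling convert_file_copy_to_utf8('export.csv', 'export-utf8.csv') (both file names are placeholders) would then leave export.csv as it was.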

Nice and simple really.

PHP: Convert a UTF-16 String to a UTF-8 String CodeUnit 05 MAR 2010

Andrew Walker crafted this handy little PHP function which can convert a UTF-16 encoded string into a more PHP-friendly UTF-8 encoded string.

The function first checks to see if the string passed to it is prefixed with a Byte Order Mark (BOM), and if the necessary BOM exists, the function continues to convert the rest of the string to its more compact UTF-8 format.

Obviously if no BOM is present, the function leaves the input string unchanged.

function utf16_to_utf8($str) {
    // the first two bytes should hold the Byte Order Mark (BOM)
    $c0 = ord($str[0]);
    $c1 = ord($str[1]);

    if ($c0 == 0xFE && $c1 == 0xFF) {
        $be = true;  // big-endian
    } else if ($c0 == 0xFF && $c1 == 0xFE) {
        $be = false; // little-endian
    } else {
        return $str; // no BOM found, so leave the string untouched
    }

    $str = substr($str, 2); // drop the BOM
    $len = strlen($str);
    $dec = '';
    for ($i = 0; $i < $len; $i += 2) {
        // combine two bytes into one 16-bit code unit
        $c = ($be) ? ord($str[$i]) << 8 | ord($str[$i + 1]) :
                ord($str[$i + 1]) << 8 | ord($str[$i]);
        if ($c >= 0x0001 && $c <= 0x007F) {
            $dec .= chr($c);
        } else if ($c > 0x07FF) {
            $dec .= chr(0xE0 | (($c >> 12) & 0x0F));
            $dec .= chr(0x80 | (($c >>  6) & 0x3F));
            $dec .= chr(0x80 | (($c >>  0) & 0x3F));
        } else {
            $dec .= chr(0xC0 | (($c >>  6) & 0x1F));
            $dec .= chr(0x80 | (($c >>  0) & 0x3F));
        }
    }
    return $dec;
}
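
To get a feel for what the three branches inside the loop are doing, the sketch below pulls the same bit arithmetic out into a tiny helper and runs it on three sample values. The name encode_code_unit() is hypothetical and only exists for this illustration.

```php
<?php
// Hypothetical helper: applies the same three branches as the loop in
// utf16_to_utf8() to a single 16-bit value.
function encode_code_unit(int $c): string {
    if ($c >= 0x0001 && $c <= 0x007F) {
        return chr($c); // plain ASCII: one byte
    }
    if ($c > 0x07FF) {
        // three bytes: 4 + 6 + 6 bits
        return chr(0xE0 | (($c >> 12) & 0x0F))
             . chr(0x80 | (($c >> 6) & 0x3F))
             . chr(0x80 | ($c & 0x3F));
    }
    // two bytes: 5 + 6 bits
    return chr(0xC0 | (($c >> 6) & 0x1F)) . chr(0x80 | ($c & 0x3F));
}

echo bin2hex(encode_code_unit(0x0041)), "\n"; // 41     (A)
echo bin2hex(encode_code_unit(0x00E9)), "\n"; // c3a9   (é)
echo bin2hex(encode_code_unit(0x20AC)), "\n"; // e282ac (€)
```

Each 16-bit value is encoded on its own here, so surrogate pairs (used by UTF-16 for characters outside the Basic Multilingual Plane) are not recombined by this approach.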

Thanks Andrew, this was exactly what I was looking for! :)

Related Link: http://www.moddular.org/log/utf16-to-utf8

PHP: Insert a String into another String Programming 30 JUL 2008

While most modern programming languages do feature a function that allows you to insert one string into another string at any given position, PHP for some or other reason simply doesn’t. So while you might be well versed in using a good old insert(str,x) function in any of the other languages that you might be used to, trying the same in PHP will no doubt leave you scratching your head as you frantically browse through all of the PHP string functions in the nifty online manual, trying to find the elusive little bugger.

So if no native string insert function exists, how exactly does one go about doing it then?

Well, the solution lies in the creative use of the PHP substring replace function (substr_replace). Essentially our goal is to insert our insert string into the original string by replacing a ‘substring’ of length zero at the desired position in the original string.

The syntax for doing this would then be:

$newstring = substr_replace($orig_string, $insert_string, $position, 0);

So for example, if we wanted to inject ‘my’ into the classic sentence ‘Hello world!’, we would code:

echo substr_replace('Hello world!','my ',6,0);

Which would then result in ‘Hello my world!’ appearing on the screen. (Note: if you enter a negative position, the string is inserted that many characters from the end of the original string.)
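
If you find yourself doing this often, the call can be wrapped up in a small helper so the zero-length trick does not have to be remembered each time. This is only a sketch, and str_insert() is a made-up name rather than a built-in PHP function.

```php
<?php
// Hypothetical convenience wrapper around substr_replace(); not a built-in.
function str_insert(string $original, string $insert, int $position): string {
    // Replacing a zero-length slice at $position amounts to an insert.
    return substr_replace($original, $insert, $position, 0);
}

echo str_insert('Hello world!', 'my ', 6), "\n";  // Hello my world!
echo str_insert('Hello world!', 'my ', -6), "\n"; // Hello my world!
echo str_insert('Hello world!', '>> ', 0), "\n";  // >> Hello world!
```

In the second call the position is counted six characters back from the end of the twelve-character string, which lands on the same spot as position 6.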