Internationalisation

In 2004 I attended OSCON in Portland, Oregon and made the following notes after one of the sessions there...

Practical i18n with PHP and MySQL

This session was hosted by Jim Winstead (a MySQL staff developer) and covered "methods of handling internationalization (i18n) and localization (l10n) with MySQL and PHP".

The talk started with brief coverage of the different character sets (ISO-8859-1, Shift-JIS etc.) and encodings (UTF-8 etc.), then moved on to actual handling issues and support. Lots of the content was generic i18n stuff. Four main areas were focused on...

  1. What you send to browsers
  2. What they send back
  3. Storing encoded content in databases
  4. Sorting strings

On the "what you send" front, the presenter recommended placing the content-type META tag first in the head section, as it seems some browsers don't behave well unless it's near the top. I think we've discovered this ourselves (or noted it in a blog) a while back.
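As a rough illustration of the "META tag first" advice, here is a minimal sketch in Python (used here only for illustration; the talk itself was about PHP). The helper name build_head and its parameters are hypothetical and not from the session.

```python
from html import escape


def build_head(title: str, charset: str = "UTF-8") -> str:
    """Return an HTML <head> whose first child is the content-type META tag.

    Hypothetical helper: the point is only the ordering, so that the browser
    sees the declared encoding before any other content in the head.
    """
    meta = (
        '<meta http-equiv="Content-Type" '
        f'content="text/html; charset={escape(charset)}">'
    )
    return f"<head>\n{meta}\n<title>{escape(title)}</title>\n</head>"


if __name__ == "__main__":
    print(build_head("Practical i18n"))
```

Sending the same charset in the HTTP Content-Type header as well is a common companion step, though that was not part of these notes.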

Standard behaviour (according to the spec, that is) is that if you specify an encoding, the browser should use that same encoding when it submits back to you. In practice this cannot be relied upon, so you may, for example, get some JIS-encoded Kanji coming back when you were hoping they'd be entered as UTF-8.

One useful workaround is to inject a hidden field into the form you're sending to the client, containing a value whose encoding you know; when the form is submitted you can inspect that value to see whether its encoding has changed. Sounds crude but it's fail-safe. There are a number of simple regexes you can use to check the encoding by testing the values of the bytes in the sequence.
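The hidden-field trick and the byte-level regex check might look something like the sketch below. It is written in Python for illustration rather than PHP, and the sentinel value, the candidate encoding list and the function names are all assumptions of mine, not details from the talk. The regex is a commonly used pattern for well-formed UTF-8 byte sequences.

```python
import re
from typing import Optional

# Hypothetical sentinel placed in a hidden form field of a page served as UTF-8.
SENTINEL = "\u65e5\u672c\u8a9e"  # three Kanji meaning "Japanese language"

# Encodings a browser might plausibly submit in instead (illustrative list).
CANDIDATE_ENCODINGS = ("utf-8", "shift_jis", "euc_jp", "iso2022_jp")

# Matches only byte strings that are well-formed UTF-8 from start to end.
WELL_FORMED_UTF8 = re.compile(
    rb"\A(?:[\x00-\x7f]"                      # ASCII
    rb"|[\xc2-\xdf][\x80-\xbf]"               # 2-byte sequences
    rb"|\xe0[\xa0-\xbf][\x80-\xbf]"           # 3-byte, excluding overlongs
    rb"|[\xe1-\xec\xee\xef][\x80-\xbf]{2}"    # 3-byte, general case
    rb"|\xed[\x80-\x9f][\x80-\xbf]"           # 3-byte, excluding surrogates
    rb"|\xf0[\x90-\xbf][\x80-\xbf]{2}"        # 4-byte, planes 1-3
    rb"|[\xf1-\xf3][\x80-\xbf]{3}"            # 4-byte, planes 4-15
    rb"|\xf4[\x80-\x8f][\x80-\xbf]{2})*\Z"    # 4-byte, plane 16
)


def detect_submission_encoding(raw_sentinel: bytes) -> Optional[str]:
    """Compare the submitted sentinel bytes against each candidate encoding."""
    for encoding in CANDIDATE_ENCODINGS:
        if SENTINEL.encode(encoding) == raw_sentinel:
            return encoding
    return None  # unknown encoding, or the value was mangled in transit


def looks_like_utf8(raw: bytes) -> bool:
    """Byte-level check: True if raw is a well-formed UTF-8 sequence."""
    return WELL_FORMED_UTF8.match(raw) is not None
```

Once the sentinel reveals the submitted encoding, the remaining form fields can be decoded with that same encoding before any further processing.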

To translate between encodings there are a number of useful tools for PHP, some of which require an extension library (trivial to set up). Examples are mbstring, iconv and the GNU recode functions (the latter two rely on external libraries).
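To show what such a conversion amounts to, here is a minimal Python sketch (Python standing in for PHP; the function name transcode is hypothetical). It decodes the bytes from the source encoding and re-encodes them in the target, which is roughly the job PHP's iconv() and mb_convert_encoding() perform.

```python
def transcode(data: bytes, source: str, target: str = "utf-8") -> bytes:
    """Convert encoded bytes from one character encoding to another.

    Raises UnicodeDecodeError if data is not valid in the source encoding,
    and UnicodeEncodeError if a character cannot be represented in the target.
    """
    return data.decode(source).encode(target)


if __name__ == "__main__":
    shift_jis_bytes = "\u65e5\u672c\u8a9e".encode("shift_jis")
    utf8_bytes = transcode(shift_jis_bytes, "shift_jis")
    print(utf8_bytes.decode("utf-8"))
```

Letting the errors propagate is a deliberate choice in this sketch: silently dropping unconvertible characters tends to hide exactly the encoding mismatches described above.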

As far as localisation is concerned, PHP has an extension wrapping the GNU gettext library: you set the target locale, pass it a text message, and it looks up the translation in a table (message catalog) for the target language.
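The same lookup model can be sketched with Python's standard gettext module (again a stand-in for the PHP extension; the domain name "messages", the "locale" directory and the helper name are assumptions). With fallback=True, a missing catalog simply returns the original message, so the sketch runs even before any translation files exist.

```python
import gettext
from typing import Callable


def translator_for(
    locale_code: str,
    domain: str = "messages",   # assumed catalog name
    localedir: str = "locale",  # assumed directory holding compiled .mo files
) -> Callable[[str], str]:
    """Return a translate(message) function for the given locale.

    Looks for <localedir>/<locale_code>/LC_MESSAGES/<domain>.mo. If no
    catalog is found, the returned function echoes the message unchanged.
    """
    catalog = gettext.translation(
        domain, localedir=localedir, languages=[locale_code], fallback=True
    )
    return catalog.gettext


if __name__ == "__main__":
    _ = translator_for("de")
    print(_("Save changes"))  # translated only if a German catalog is installed
```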

Establishing the lookup file looks like a bit of an effort, but it would be a wickedly solid approach to localising any of our web apps. We could of course roll our own; the standard approach there is an RDBMS table of messages per language, from which you select the appropriate text for the UI element, function or message being displayed, according to the target user's language.
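A roll-our-own message table could be sketched as follows. This uses Python with an in-memory SQLite database purely so the example is self-contained; the notes had MySQL in mind, and the table name, column names and sample rows are all hypothetical.

```python
import sqlite3


def make_catalog() -> sqlite3.Connection:
    """Create an in-memory table holding one row per (message key, language)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ui_messages ("
        " message_key TEXT NOT NULL,"
        " lang TEXT NOT NULL,"
        " text TEXT NOT NULL,"
        " PRIMARY KEY (message_key, lang))"
    )
    conn.executemany(
        "INSERT INTO ui_messages (message_key, lang, text) VALUES (?, ?, ?)",
        [
            ("greeting", "en", "Hello"),
            ("greeting", "de", "Hallo"),
            ("farewell", "en", "Goodbye"),
        ],
    )
    return conn


def lookup(conn: sqlite3.Connection, key: str, lang: str,
           default_lang: str = "en") -> str:
    """Return the message for key in lang, then default_lang, else the key."""
    for candidate in (lang, default_lang):
        row = conn.execute(
            "SELECT text FROM ui_messages WHERE message_key = ? AND lang = ?",
            (key, candidate),
        ).fetchone()
        if row is not None:
            return row[0]
    return key  # an untranslated key stays visible rather than vanishing


if __name__ == "__main__":
    catalog = make_catalog()
    print(lookup(catalog, "greeting", "de"))
    print(lookup(catalog, "farewell", "de"))
```

Falling back to a default language and then to the key itself is one possible design; it keeps the UI usable while translations are still incomplete.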

Apparently MySQL 4.1 has very good built-in encoding handling, and the presenter said it was as good as or better than all other RDBMS products (open-source or commercial) in that respect.

