15 Mar 2005 roozbeh   » (Master)

The first CLDR meeting I have attended in almost a month just finished, with the international call provider dropping me for no apparent reason and then refusing to accept my PIN because it thought the other connection was still live. I hate it!

Anyway, I find it very sad that the only free software user of CLDR is OpenOffice (and ICU of course, but that is effectively the same thing, since nothing but OpenOffice uses it in any normal Linux distro). CLDR is really the only properly peer-reviewed database of locale info, and it also has peer-reviewed localizations of some very important things, like language, country, and time zone names.

And it tells you everything that glibc can't, like one-letter abbreviations for days and months, which Evolution could use, or different lengths of date and time display, which clock applets could use. It also doesn't have the problems people like Danilo have with glibc, where Ulrich is not accepting the new Serbian locales and is totally unresponsive about the reasons. glibc is currently probably the worst burden in starting a new localization project.

Currently, IBM, Sun, and Apple, the three main players in CLDR, are using the information in their shipping products. IBM uses it in ICU, which is supposed to become the platform providing Unicode and i18n to all their products, Apple has been using the data since Mac OS X 10.2 (possibly through ICU), and Sun is using it in OpenOffice. There are also a few random committee members like me (representing the High Council of Informatics of Iran) and representatives from the Irish and Finnish national standards bodies.

CLDR's main problem: the committee work is so slow that they can't release early and often, so projects with a shorter release cycle will probably need to branch for minor version numbers every once in a while if they decide to use CLDR. Other possible problems are that the data is in XML with hierarchical inheritance both sideways and across directories (which makes it hard to parse at first), and that it has borrowed parts of ICU syntax in the fine print, which would require you to either borrow lots of code from ICU or implement logic to parse ugly syntax like "0≤Rf|1≤Ru|1<Re" or "# ##0.00 ¤".
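
To give a feel for the inheritance part, here is a rough Python sketch of the simplest case only: falling back from a specific locale to its parents by chopping off the last part of the ID, ending at a root entry. The names fallback_chain, lookup, and SAMPLE are made up for illustration, the data is not taken from CLDR, and the sketch ignores the sideways (alias) inheritance and all XML parsing.

```python
def fallback_chain(locale_id):
    """Return locale IDs from most to least specific, ending in 'root'."""
    parts = locale_id.split("_")
    # Drop one trailing subtag at a time: sr_Cyrl_CS -> sr_Cyrl -> sr
    chain = ["_".join(parts[:n]) for n in range(len(parts), 0, -1)]
    chain.append("root")
    return chain


def lookup(data, locale_id, key):
    """Return the first value for `key` found along the fallback chain."""
    for candidate in fallback_chain(locale_id):
        value = data.get(candidate, {}).get(key)
        if value is not None:
            return value
    raise KeyError(key)


# Illustrative sample data (an assumption, not real CLDR content).
SAMPLE = {
    "root": {"decimal": ".", "group": ","},
    "de": {"decimal": ","},
    "de_CH": {"group": "'"},
}

if __name__ == "__main__":
    print(fallback_chain("sr_Cyrl_CS"))
    print(lookup(SAMPLE, "de_CH", "decimal"))
```

Here de_CH defines only its own grouping separator, so the decimal separator comes from de, and anything missing from both would come from root. A real implementation would also have to follow aliases and deal with the ICU-style patterns, which is where the actual work is.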
