22 Dec 2009 joolean   » (Journeyer)


I just pushed a patch for Guile that extends the unreleased 2.0 branch's Unicode support to include title case, as described in the Case Mappings section of the Unicode Standard. It's kind of complicated: for individual characters, title case matters for digraph characters (the canonical example being U+01F3 "dz"), whose upcased form (U+01F1 "DZ") isn't appropriate at the beginning of a word, where the title case form (U+01F2 "Dz") is a better fit.
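To see the three case forms of that digraph side by side, here is a small sketch. It is written in Python rather than Guile Scheme purely for illustration (an assumption, not the patch's code); Python's unicodedata module and string methods read the same Unicode Character Database. case_forms is a hypothetical helper name.

```python
import unicodedata


def case_forms(ch):
    """Return the lower, upper and title case forms of a one-character string."""
    # For a single character, str.title() applies that character's title case mapping.
    return ch.lower(), ch.upper(), ch.title()


if __name__ == "__main__":
    dz = "\u01f3"  # LATIN SMALL LETTER DZ: one code point that renders as two letters
    for label, form in zip(("lower", "upper", "title"), case_forms(dz)):
        # Print the code point, the general category (Ll, Lu or Lt) and the character name.
        print(f"{label}: U+{ord(form):04X} {unicodedata.category(form)} {unicodedata.name(form)}")
```

Run as a script, it reports categories Ll, Lu and Lt for U+01F3, U+01F1 and U+01F2 respectively, so only the title case form is a "title case letter" in Unicode's sense.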

What's interesting is that GNU libunistring, the Unicode library used by Guile, defines the contract for uc_totitle such that the function returns a special title case character if one is defined for the specified character; otherwise, it returns the upcased version of that character. In the context of strings, libunistring's title case mapping puts the first character of each word into title case as above and downcases all the other characters.
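The two contracts can be sketched in Python as follows. This is not libunistring's API: SPECIAL_TITLE, totitle and titlecase_words are hypothetical names, the table lists only the Latin digraph families rather than every special title case mapping, and words are assumed to be separated by whitespace, which is a simplification and not libunistring's own word rule.

```python
# Hypothetical subset of the special title case mappings: the Latin digraph
# families DŽ/Dž/dž, LJ/Lj/lj, NJ/Nj/nj and DZ/Dz/dz. Every form of a digraph
# maps to its mixed-case title form.
SPECIAL_TITLE = {
    "\u01c4": "\u01c5", "\u01c5": "\u01c5", "\u01c6": "\u01c5",
    "\u01c7": "\u01c8", "\u01c8": "\u01c8", "\u01c9": "\u01c8",
    "\u01ca": "\u01cb", "\u01cb": "\u01cb", "\u01cc": "\u01cb",
    "\u01f1": "\u01f2", "\u01f2": "\u01f2", "\u01f3": "\u01f2",
}


def totitle(ch):
    """Mimic the uc_totitle contract for one character: the special title
    case character if one is defined, otherwise the upcased character."""
    # Note: str.upper() uses full case mapping, so a few characters such as
    # "\u00df" come back as more than one character here.
    return SPECIAL_TITLE.get(ch, ch.upper())


def titlecase_words(s):
    """Title-case the first character of each whitespace-separated word and
    downcase all the other characters."""
    out = []
    at_word_start = True
    for ch in s:
        if ch.isspace():
            out.append(ch)
            at_word_start = True
        elif at_word_start:
            out.append(totitle(ch))
            at_word_start = False
        else:
            out.append(ch.lower())
    return "".join(out)
```

For example, titlecase_words("\u01c6ungla NOW") returns "\u01c5ungla Now": the digraph gets its mixed-case title form, while the plain "N" simply falls back to upper case.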

Guile has a set of predicates like char-lower-case?, which, under the hood, check for the presence of a specified character in a particular character set. In the original form of the patch, I had added a char-title-case? predicate which did the same for the title case character set. This led to situations in which

(char-title-case? (char-titlecase x))
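would be false for most characters. For #\a, for example, char-titlecase falls back to the upcased #\A, which is an upper case character rather than a member of the title case character set. We ended up taking the predicate out.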
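The pitfall is easy to reproduce outside Guile. The Python sketch below is an illustration, not Guile code: is_title_case is a hypothetical stand-in for the removed predicate that tests membership in the title case set, taken here to be Unicode general category Lt. Title-casing a plain letter yields an upper case letter, so the round trip only holds for characters that have a special title case form.

```python
import unicodedata


def is_title_case(ch):
    """Hypothetical analogue of char-title-case?: true only for characters in
    the title case set (general category Lt)."""
    return unicodedata.category(ch) == "Lt"


def title_round_trip(ch):
    """Title-case one character, then ask whether the result is 'title case'."""
    return is_title_case(ch.title())


if __name__ == "__main__":
    for ch in ("a", "Q", "\u01f3"):
        # "a" and "Q" fall back to upper case (category Lu), so the check fails;
        # U+01F3 maps to U+01F2, which is in category Lt.
        print(f"U+{ord(ch):04X} -> {title_round_trip(ch)}")
```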
