16 Feb 2005 TazForEver   » (Journeyer)

sizeof troubles

I always took for granted that my name, Benoît, was 6 chars long. I was wrong.
It's 7 bytes, UTF-8 speaking, even though it still has only 6 letters.
It took me some time this afternoon to understand why sizeof "Benoît" == 8 when I expected 7 (6 letters plus the terminating NUL). So I hexdumped my C file and realized that î is encoded as two bytes, 0xC3 0xAE, which makes 7 bytes plus the NUL. I'm glad that ASCII chars are still encoded as a single byte in UTF-8, so hacks like this one:
char buf[magic]; /* enough to hold "plop" */
are still OK. I'll try to be less lazy and always write:
char buf[sizeof "plop"];

WTF is î?
U+00EE LATIN SMALL LETTER I WITH CIRCUMFLEX
UTF-8: 0xC3 0xAE
In French, a circumflex (^) on a vowel often stands for an 's' from the old French spelling:
- hôpital for hospital
- hôtel for hostel (= hotel)
- Benoît for Benoist
- côte for coste (= coast)
- etc.
Benoît is the French form of Benedict.
