It is in pre-fascicle 1a, Bitwise Tricks and Techniques (PostScript, 1.1MiB): check the index for "Pournader, Roozbeh".
My contribution is a very small improvement on a UTF-8 bit manipulation trick I talked about in a former blog post.
This resulted in an optimization contest among various people, which Behdad has more or less summarized here (read the comments too).
Fast forward to December 2006, when I was going over the new Unicode book and trying to make sure Gnome and friends were Unicode compliant. One of the bugs I filed was this one, and some of the answers I received discouraged me from continuing, which basically led me to stop the whole effort. (The bug is about getting rid of legacy support for an old version of UTF-8, which the Unicode Standard now considers a security problem.)
Then, last month, I was reading some draft material Donald Knuth is putting online for his infamous Volume 4 of The Art of Computer Programming. One of the pre-fascicles he has put online is about Bitwise Tricks and Techniques, which I really enjoyed reading. Knuth, being a Unicode fan, had inserted some interesting exercises regarding UTF-8 and UTF-16.
One of the exercises included a magic (!) formula to replace the utf8_skip_data array (see Federico's post again). It is provided in exercise 197.
Knuth's formula not only needs no memory reference, it's also branch-free (which is considered very good for many modern CPU architectures). The formula does it with four operations, which would become five when adapted to the present formulation used in glib. The only problem is that it only works for proper UTF-8, the version the Unicode Standard requires, but not glib's UTF-8.
I tried to extend Knuth's formula to glib's UTF-8, and did it on paper with two more operations (seven instead of five), using 64-bit boolean arithmetic.
When I chatted with Behdad, he told me it's not really worth it to replace the array with the formula (I cannot understand the reasons well enough to explain them here, but I trust him), but he was interested in seeing my extended formula.
So last night, I tried to make sure my formula works fine before emailing it to Behdad. And I found a bug, which meant that I needed to add two more operations to get it done properly, a total of nine operations.
This is my new formula, which is tested and works fine. It may not provide exactly the same results as the utf8_skip_data array for all values, but many of the array's cells are redundant. For necessary cells, it provides the same results:
def utf8_skipper(c):
    t = (c >> 1) ^ 0x7F
    return ((0x924900009201B128 >> ((t & ~(t >> 1)) * 3)) & 7) + 1
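A quick way to sanity-check the formula is to compare it, byte by byte, against a lookup table. The sketch below is only an illustration: the table is not copied from glib but rebuilt from the lead-byte ranges of the old one-to-six-byte UTF-8 scheme (an assumption about what utf8_skip_data holds), and the helper names legacy_skip_table and mismatches are made up for this example.

```python
def utf8_skipper(c):
    # The nine-operation formula from the post; c is a byte value (0..255).
    t = (c >> 1) ^ 0x7F
    return ((0x924900009201B128 >> ((t & ~(t >> 1)) * 3)) & 7) + 1


def legacy_skip_table():
    # Hypothetical stand-in for the skip array: sequence length per lead
    # byte under the legacy 1-to-6 byte scheme (assumed, not glib's source).
    table = [1] * 256
    ranges = [
        (0xC0, 0xDF, 2),
        (0xE0, 0xEF, 3),
        (0xF0, 0xF7, 4),
        (0xF8, 0xFB, 5),
        (0xFC, 0xFD, 6),
    ]
    for first, last, length in ranges:
        for byte in range(first, last + 1):
            table[byte] = length
    return table


def mismatches():
    # Byte values where the formula and the rebuilt table disagree.
    table = legacy_skip_table()
    return [b for b in range(256) if utf8_skipper(b) != table[b]]


if __name__ == "__main__":
    print(mismatches())
```

Note that Python integers are unbounded, so the large shift counts produced for ASCII and continuation bytes simply yield 0 here. A port to a fixed-width 64-bit type would have to treat shift counts of 64 or more separately.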
Can you do it in fewer than nine operations? Or with 32-bit boolean arithmetic only? [With no branching or memory access, of course.]
This may just be a mental exercise, but please email me if you can, as I'm starting to feel an affection towards the problem!
So, I just wish to share my own Middle Eastern view. I just read the story on Slashdot about a guy buying a hard drive, finding that it contains bathroom tiles, and having problems returning it. What I immediately thought was: “He did it himself!” I read more, and I saw that many people consider him honest and a victim, which was also the conclusion I came to after reading the comments.
The sad point is, unfortunately, lots of Iranian Muslims lie very easily, and even enjoy it, although it’s something that’s very frowned upon in the scripture and the sayings. Even very religious Muslims who own shops in the bazaar and contribute heavily to religious causes like building mosques lie very easily about everything and even pose as victims. So in this case, I automatically thought that the victim had done it himself, because I had got used to such deceptive self-victimizations. Ah, before I forget, I believe President Ahmadinejad leads them all!
Not that the story has much similarity with Maz Jobrani’s, just that suddenly a Middle Easterner’s view of an event or a story may be so different from the rest of the world’s.
By the way, I also found a wonderful quote in the slashdot comments: “I was tired of North Korea’s harsh penalties for being a citizen. That’s why I moved to Iran!”
In other news, apparently access to my Persian weblog (not updated for quite a while) is blocked by some (but not all) Iranian ISPs. My mother-in-law found out about it when she was searching for my name on Google, and then called Elnaz, my wife: “There were so many hits for Roozbeh, but his website was filtered!”
I will write about the adventurous Turkey trip as soon as I can. It’s quite an entertaining story, considering all the disasters that happened to us. Also more on OOXML later.
Student: I was wondering if you would please give us a copy of the FreeFoo software you just installed on the computer?
Behnam: But I don’t have anything apart from the OS installation CDs, of which you have copies. The application yum does all the rest.
Student: But you just installed it! You should have a CD somewhere.
Behnam: It’s installed off the Internet, as I told you.
Student: [not believing him] Ah, still, would you give us a copy if you have one on you?
And then Behnam explains again for half an hour how yum works.
Among the most interesting parts of the trip was going to the Babak Fort on a very misty day when we couldn’t see more than five meters around us. Michael remained in the car and tried to decode the joining and shaping behavior of the Psalter Pahlavi script, a script that is only found in a fourth-century CE twelve-page document (and also recently on a damaged cross found around Herat). The rest of us tried to find the path to the castle in the mist and were lost after twenty minutes. We tried to get the coordinates of the castle by phoning friends and asking them to dig through Google Earth, and other friends with GPS devices, but had no success. We finally found our way when Christian (our other Irish guest) heard a loud radio playing in Azerbaijani, which we followed until we were found.
I’m looking forward to mapping the route to the castle on OSM.
My mom was at his deathbed, and held him when he died. He was very well aware of what was happening around him until the final moments, but had a hard time talking and moving in his last week. For all I know, he wished to die sooner, as he perhaps could not stand being weak. His mental image of himself had always been that of a strong and clear-minded man, which was breaking down with his illness, after the few heart attacks he had survived.
He is the closest person I have lost to death. I tried my best not to look at his corpse, hoping that avoiding it would help me keep a better last image of him in my memory.
I had written here about him previously, on a note about Rumsfeld.
OSM’s infrastructure is great, but more mappers and developers are definitely needed and are quite welcome. Getting involved in the project is highly recommended if you love maps, own a GPS device, or like trying your hand at hacking map-related software like map renderers and route finders.
For us, it’s a unique opportunity to beat the proprietary providers’ inaccurate and out-of-date maps for the quickly changing Tehran, which is full of one-way streets and dead-end alleys, and release the first free maps of the huge city where we were born and brought up.