15 Feb 2004 Fefe   » (Master)

ARGH! I vastly underestimated the relative size of the strings in the LDAP database and the offsets to the strings. The ratio is about 2:1. I.e. for two megs of raw data, I need one more meg for the offsets. And that is not counting the indices (which are much smaller).

parse still can't handle the phone CD. The reason is that Linux will not let me mmap more than 2 gigs. This probably has to do with some stupid compile time constants like the relative position of code, stack, reserved kernel space and the mmap space. Or so. I'm just pulling this out of thin air, sorry. So if there ever was a compelling reason for 64-bit computing, this would be it. Time to get myself an Athlon 64 box. Anyone have a spare one for me? *sigh*

On the other hand, I could have foreseen it. I haven't counted them, but I expect the phone CD to carry at least 20 million records, which means at least 80 megs just for the pointers to the records. This is a lot of data. Maybe I can think of an easy way to be more space efficient. Let's sleep over it. Comments or ideas welcome.

File systems keep pointer sizes down by segmenting the data into smaller independent chunks. Maybe I should do the same. The data could be encoded with a static Huffman tree. Bad tradeoff here. Maybe I will just have to settle for the data for Berlin and Brandenburg or so. Too bad. I really looked forward to this.

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!