5 Feb 2007 slamb   » (Journeyer)

ncm, fejj

The data structure you're talking about is called a Bloom filter. It's one of those structures I see and think "oh, that'll come in handy sometime," but then it never does. The nastiest limitation is that you can't ever remove anything from it: entries share bits, so clearing one entry's bits could silently knock out others. So if your web crawler should eventually rescan the same URL (as I hope it would), a Bloom filter would be unsuitable for tracking which URLs it has already seen. I confess that I don't really understand what the interview question is getting at.
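
For anyone who hasn't run into one, here is a rough sketch of the idea in Python. It is only an illustration, not anything from the interview question: the class name `BloomFilter`, the `num_bits` and `num_hashes` parameters, and the trick of deriving the positions from a single SHA-256 digest are all my own assumptions.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch: a bit array plus k derived bit positions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Both sizes are illustrative defaults, not tuned values.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k positions from one digest (double hashing); an assumption
        # made for brevity, any k independent hash functions would do.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # keep the step non-zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        # Set every bit the item maps to.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


seen = BloomFilter()
seen.add("http://example.com/a")
print("http://example.com/a" in seen)
```

Note that there is deliberately no `remove` method. A bit set by one URL may also belong to another, so the only safe way to forget a URL is to rebuild the whole filter, which is exactly the problem for a crawler that wants to rescan.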

