17 Oct 2003 mau   » (Observer)

Bayesian spam filters solved the spam problem for me, but I don't like using a big (X MB), slow database for word counts. (I also want deleted emails to leave no traces on disk.) So I modified some Bloom filter code I had lying around to classify my emails. The code is still untuned and uses a stupid shell script as its tokenizer, but the first results look promising: two 64 KB bit arrays hold more than enough information to detect spam emails.
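To make the idea concrete, here is a rough Python sketch of a classifier built on two 64 KB Bloom filters, one for spam tokens and one for ham tokens. It is not my actual code: the number of hash functions, the SHA-256 based hashing, the regex tokenizer, and the "count tokens seen only on one side" scoring rule are all assumptions chosen for illustration, as are the names `BloomFilter`, `SpamClassifier`, and `tokenize`.

```python
import hashlib
import re

ARRAY_BYTES = 64 * 1024        # 64 KB per filter, as in the post
NUM_BITS = ARRAY_BYTES * 8
NUM_HASHES = 4                 # assumption: the post does not say how many


class BloomFilter:
    """Fixed-size Bloom filter backed by a 64 KB bit array."""

    def __init__(self):
        self.bits = bytearray(ARRAY_BYTES)

    def _positions(self, token):
        # Assumption: derive NUM_HASHES bit positions from one SHA-256 digest.
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        for i in range(NUM_HASHES):
            chunk = digest[4 * i:4 * i + 4]
            yield int.from_bytes(chunk, "big") % NUM_BITS

    def add(self, token):
        for pos in self._positions(token):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, token):
        return all(self.bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(token))


def tokenize(text):
    # Hypothetical stand-in for the shell-script tokenizer:
    # lowercase runs of at least three word-like characters.
    return set(re.findall(r"[a-z0-9$'-]{3,}", text.lower()))


class SpamClassifier:
    """Two Bloom filters: one trained on spam, one on ham."""

    def __init__(self):
        self.spam = BloomFilter()
        self.ham = BloomFilter()

    def train(self, text, is_spam):
        target = self.spam if is_spam else self.ham
        for token in tokenize(text):
            target.add(token)

    def is_spam(self, text):
        # Assumed scoring rule: count tokens seen in only one of the filters.
        tokens = tokenize(text)
        spam_only = sum(1 for t in tokens if t in self.spam and t not in self.ham)
        ham_only = sum(1 for t in tokens if t in self.ham and t not in self.spam)
        return spam_only > ham_only


if __name__ == "__main__":
    clf = SpamClassifier()
    clf.train("buy cheap viagra now, free offer", is_spam=True)
    clf.train("meeting notes for the kernel patch review", is_spam=False)
    print(clf.is_spam("cheap viagra offer"))
    print(clf.is_spam("kernel patch review"))
```

A Bloom filter keeps only hashed bit positions, so the words of a trained email cannot be read back out of the arrays directly. The price is that membership tests can give false positives, and individual emails cannot be "untrained" later.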
