9 Sep 2002 jwb   » (Journeyer)

tk: I have tried using 2-grams and 3-grams as terms in my spam filter. Of course this tends to bloat the vocabulary, and with it the time required for analysis. Later I will attempt to characterize the effect of term length on filter performance. My parser is something of a hack, so any extension of the term length will probably result in repulsive code.
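
To illustrate the vocabulary bloat, here is a minimal Python sketch. It is not the filter's actual parser: the names `tokenize`, `ngram_terms`, and `vocabulary_size` are hypothetical, and the lowercase whitespace tokenizer is an assumption made to keep the example short.

```python
def tokenize(text):
    # Assumption: lowercase and split on whitespace. A real mail parser
    # would also have to deal with headers, MIME parts, and markup.
    return text.lower().split()


def ngram_terms(tokens, max_n):
    """Return every run of 1..max_n adjacent tokens as a space-joined term."""
    terms = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            terms.append(" ".join(tokens[i:i + n]))
    return terms


def vocabulary_size(messages, max_n):
    """Count the distinct terms seen across all messages for a given max_n."""
    vocab = set()
    for message in messages:
        vocab.update(ngram_terms(tokenize(message), max_n))
    return len(vocab)


if __name__ == "__main__":
    sample = ["buy cheap pills now", "buy cheap watches now"]
    for max_n in (1, 2, 3):
        print(max_n, vocabulary_size(sample, max_n))
```

Even on two four-word messages the distinct term count grows with each step in term length, and on a real corpus the growth is far steeper because most longer word sequences occur only once.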

