Older blog entries for Rhys (starting at number 3)

Finally managed to get some thoughts together on the spam/non-spam issue, a mere fortnight behind pretty much everybody else.

I've focused on the corpus collection side of things, since I worked on the SpeechDat(II) project for a while (the link via the Welsh flag on that page is long down, sorry). I could've written more about lexical model adaptation, but chose not to in the end.

Anyway, here's a link to what I wrote. Comments appreciated.

I have this account's passphrase back (it was obvious when I saw it, but then these things always are, I guess). Thanks to Telsa and to yosh for their help.

I've been wondering how the current group of probabilistic spam filters, from Vipul's Razor via SpamAssassin to those inspired by Paul Graham's work, actually collect their spam/non-spam corpora and, where appropriate, adapt their n-gram and other lexical analyses. I'm putting that here in order to embarrass myself into writing something about it in the very near future.

A lot's happened in the past month. My PhD grinds on, very slowly - current deadline for completion is March 31st. I have a Real Job for when I finish that. And I've almost completely neglected Advogato (sorry), but I'm glad I'm not a Journeyer any more. Later then...

22 Dec 2000 (updated 23 Dec 2000 at 08:05 UTC)


Posted a first article. <strike>Please be gentle :)</strike>

Posted a rather firmer riposte to first article.

Now I'm pleading with people to read my two home pages before going for the minimalist jugular.
