29 Aug 2002 raph   » (Master)

Spam

Paul Graham has a new piece on why spam is different. He also includes my comment in response.

I have great hopes for word-based (or "Bayesian") scoring. Paul's arguments for why it will work seem convincing to me. In particular, keeping per-user word lists should help enormously with two big problems: the ability of spammers to "optimize" around a common scoring scheme, and differing opinions about what constitutes spam.
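To make the per-user point concrete, here is a minimal sketch of word-based scoring in Python. It is not Paul's actual formula: the class name `WordScorer`, the tokenizer, the add-one smoothing, and the `top_n` cutoff are all assumptions chosen for illustration. The idea it shows is only that each user trains a private word list, so the same message can score differently for different people.

```python
import re
from collections import Counter


class WordScorer:
    """Per-user word-based spam scorer (illustrative sketch, not a real filter)."""

    def __init__(self):
        self.spam_words = Counter()  # word -> number of spam messages containing it
        self.ham_words = Counter()   # word -> number of good messages containing it
        self.spam_msgs = 0
        self.ham_msgs = 0

    @staticmethod
    def tokenize(text):
        # Assumed tokenizer: lowercase runs of letters, digits, $, ' and -
        return set(re.findall(r"[a-z0-9$'-]+", text.lower()))

    def train(self, text, is_spam):
        words = self.tokenize(text)
        if is_spam:
            self.spam_words.update(words)
            self.spam_msgs += 1
        else:
            self.ham_words.update(words)
            self.ham_msgs += 1

    def word_prob(self, word):
        # Add-one smoothing keeps every probability strictly between 0 and 1.
        p_spam = (self.spam_words[word] + 1) / (self.spam_msgs + 2)
        p_ham = (self.ham_words[word] + 1) / (self.ham_msgs + 2)
        return p_spam / (p_spam + p_ham)

    def score(self, text, top_n=15):
        probs = [self.word_prob(w) for w in self.tokenize(text)]
        # Keep only the most informative words (farthest from a neutral 0.5).
        probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
        spam = ham = 1.0
        for p in probs[:top_n]:
            spam *= p
            ham *= 1.0 - p
        return spam / (spam + ham)  # 0.5 when there is no evidence


# Two users with different opinions about the same vocabulary.
hacker = WordScorer()
hacker.train("cheap mortgage offer", is_spam=True)
hacker.train("meeting about ghostscript patch", is_spam=False)

broker = WordScorer()
broker.train("mortgage rate offer", is_spam=False)
broker.train("cheap pills", is_spam=True)
```

With this toy training data, "mortgage offer" scores above 0.5 for the first user and below 0.5 for the second. A spammer tuning a message against one user's list gets no guarantee about anyone else's.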

I still think there may be a role for trust, but it's also possible that scoring by itself will work so well that adding trust isn't really necessary. In any case, I'll do my best to write up my ideas for using trust to beat spam.

Trust

Bram and I had a great discussion about anti-certs. He is exploring the space of adding anti-certs to the trust computation, but is finding it complex. Many of the proposals would seem to require vastly greater computation than the simple eigenvector and network flow models I've proposed.
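For a sense of how cheap the eigenvector style of computation is, here is a rough Python sketch: a damped power iteration that pushes trust outward from a seed along positive certs. This is an assumption-laden stand-in, not the metric Advogato runs. The function name `eigen_trust`, the damping value, and the rule that dead-end nodes hand their trust back to the seed are all choices made for the example.

```python
def eigen_trust(certs, seed, damping=0.85, iterations=50):
    """Propagate trust from `seed` along certs.

    certs: dict mapping each node to the list of nodes it certifies.
    Returns a dict node -> trust, with values summing to 1.
    """
    nodes = set(certs) | {v for vs in certs.values() for v in vs} | {seed}
    rank = {n: 0.0 for n in nodes}
    rank[seed] = 1.0
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        nxt[seed] += 1.0 - damping  # trust is re-injected at the seed each round
        for u in nodes:
            outs = certs.get(u, [])
            if outs:
                share = damping * rank[u] / len(outs)
                for v in outs:
                    nxt[v] += share
            else:
                # Node certifies nobody: return its trust to the seed (assumption).
                nxt[seed] += damping * rank[u]
        rank = nxt
    return rank


certs = {
    "raph": ["alice", "bob"],
    "alice": ["carol"],
    # A clique that certifies itself but is not certified by anyone trusted.
    "mallory": ["sock"],
    "sock": ["mallory"],
}
rank = eigen_trust(certs, seed="raph")
```

Each round is one pass over the edges, which is the kind of cost that anti-cert proposals tend to blow up. Note that the self-certifying clique gets no trust at all here, since nothing reachable from the seed points at it.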

The leading alternative to that seems to be a way to use anti-certs as input to a process which removes positive certs from the graph. If you disagree with the ratings of user X, it might be interesting to analyze the influence of X on ratings transmitted to you through the graph, then remove your local edges which carry the most influence from X. In general, this boils down to optimizing edge weights. Lastly, anti-certs don't have to be about individual users (nodes in the graph). They can be about ratings you disagree with. You don't really have to know where the bogus ratings come from, as long as you know how to tune your local edges to minimize them.
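One way to read "remove your local edges which carry the most influence from X" is sketched below in Python. Everything in it is an assumption made for illustration: trust is modeled as a damped propagation over weighted edges, the influence of a local edge is measured as how much X's trust drops when that one edge is deleted, and pruning means deleting the single worst edge outright (the extreme case of turning its weight down to zero). The function names are hypothetical.

```python
def propagate(edges, seed, damping=0.85, iterations=50):
    """edges: dict (src, dst) -> non-negative weight. Returns node -> trust."""
    nodes = {seed} | {n for edge in edges for n in edge}
    out_total = {n: 0.0 for n in nodes}
    for (src, _dst), w in edges.items():
        out_total[src] += w
    trust = {n: 0.0 for n in nodes}
    trust[seed] = 1.0
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        nxt[seed] = 1.0 - damping
        for (src, dst), w in edges.items():
            if w > 0:
                nxt[dst] += damping * trust[src] * w / out_total[src]
        for n in nodes:
            if out_total[n] == 0:
                nxt[seed] += damping * trust[n]  # dead ends return trust to the seed
        trust = nxt
    return trust


def influence_by_local_edge(edges, me, target):
    """For each of my own certs, how much of target's trust depends on it."""
    baseline = propagate(edges, me).get(target, 0.0)
    result = {}
    for (src, dst) in edges:
        if src != me:
            continue
        pruned = {e: w for e, w in edges.items() if e != (src, dst)}
        result[dst] = baseline - propagate(pruned, me).get(target, 0.0)
    return result


def prune_worst_edge(edges, me, target):
    """Drop the local edge that carries the most trust toward target."""
    influence = influence_by_local_edge(edges, me, target)
    if not influence:
        return dict(edges), None  # I have no local edges to tune
    worst = max(influence, key=influence.get)
    return {e: w for e, w in edges.items() if e != (me, worst)}, worst


edges = {
    ("me", "alice"): 1.0,
    ("me", "bob"): 1.0,
    ("alice", "x"): 1.0,   # x is the user whose ratings I disagree with
    ("bob", "carol"): 1.0,
}
pruned, dropped = prune_worst_edge(edges, me="me", target="x")
```

In this toy graph the only path to x runs through alice, so that is the cert that gets dropped, and x ends up with no trust afterwards. The brute-force recomputation per edge is fine for a handful of local certs. The same loop could lower a weight instead of deleting the edge, which is closer to the general edge-weight optimization.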

As always, a big part of the challenge is presenting a sensible UI. I've made the Advogato UI for certs as simple as I know how, and the user community is supposed to be sophisticated, yet it seems that many people can't manage to do certs and ratings the way they're supposed to. Bram's anti-cert UI is straightforward to implement. In addition to "Master", "Journeyer", and "Apprentice", you'd just add one or more negative categories.
