20 Feb 2004 raph   » (Master)


I'll be at CodeCon this Friday and Saturday. I'm really looking forward to meeting up with my tribe. As an extra bonus, Heather and the kids are coming to pick me up Saturday afternoon, so it'll be a good opportunity for people in my tribe to meet my family.

Advogato maintenance

lkcl is absolutely right that I need to hand over the regular maintenance of the site. I'll start up a recruitment thread on virgule-dev after I get back from CodeCon.

The spammers are winning

When Paul Graham first published his Plan for Spam, I thought it fairly likely that his basic idea was sound, and that Bayesian-style classification would, if not eliminate spam, then at least make it manageable. Now I'm sure it won't work.

The problem is that there are two attacks that spammers can do, both of which are damned hard to do anything about. First, they can include bits of legitimate, high quality text in their messages [for example, I now see wikipedia text in Google-spam sites]. Second, by running as a virus-zombie, they can take over whatever authentication tokens are available for real mail. Note that this includes hashcash.

I still believe that a trust metric can be a part of a healthy balanced breakfast to end spam. But Google, the highest-profile deployment of a trust metric, seems to be experiencing a marked decline in quality, to the point where some people are questioning whether PageRank is such a good idea after all.

I don't know how to fix PageRank, but my current thinking, reinforced by my experiences here, is that negative recommendations are needed too. I feel quite empowered by the trust that Google places in me to rank a page up, but at the same time helpless to tell Google, "this site is pure spam".

Negative recs are not easy. For one, they don't really fit well in the PageRank model (the biggest difference between PageRank and the eigenvector-based tmetric used to power the diary ratings). Second, any simplistic approach (such as having an unauthenticated (or underauthenticated) "report this as spam" link) would be very vulnerable to DoS-type attacks, likely rendering the whole negative rec process unusable. There's a good technical reason to prefer monotonic trust metrics, and when I started my thesis project I concentrated on those, but I now think that their inability to use negative recs is crippling.

And, Zaitcev, I did notice that Orkut's trust model seemed quite primitive. I didn't kick its tires carefully, but I didn't see any obvious differences from Friendster's, which is simply based on the existence of a path of length no more than four. If Orkut is at CodeCon, I'm going to pin him down and see what he has to say for himself.

See you all at CodeCon!

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!