xach figured it out: the person analyzing spam is, indeed, Paul Graham. In fact, he has a new piece up.
I'm writing something more detailed about trust now, filling out some email conversation that we've started. However, I'm struck by one feature of his new scheme, so I'll just blog my response here:
Very good. I do believe you're on to something here. The idea of everybody building their own corpus _is_ powerful. Your analysis makes sense to me, and I do believe your tool will do better than most.I take issue with one thing, though: your assertion that probabilities are superior to scores. Most people not trained in statistics will find adding up of scores more transparent than computing Bayesian probabilities. Further, you've got an awful lot of voodoo constants in there: 2x, 0.01, 0.99, 0.20, 0.9. Lastly, the Bayesian probability computation assumes all the probabilities are independent (if I recall my stats correctly), which is definitely not valid in this application.
In fact, I do believe that your probabilities and SpamAssassin-like scores are equivalent. Use the transform: score = log p - log (1-p), or p = exp(score) / (1 + exp(score)). Take the 15 scores with greatest absolute values (see, already a more intuitive formulation), and simply add them. Your voodoo scores are now -4.6, 4.6, -1.4, and 2.2, respectively.
Perhaps the best way to look at this is that Bayesian probability can give theoretical justification to a "score" system. That might be interesting to some.
