31 Aug 2008 Stevey   » (Master)

There can be only one

When volume becomes high enough you start to observe patterns in SPAM pretty easily. I think that this is primarily because people like to see patterns, whether they are present or not.

The trick is determining whether they are real patterns or not, and then to a lesser extent whether they are useful patterns.

For example I host mail for a business domain. That means that incoming messages come primarily from existing customers, and very rarely from potential new ones.

In practise that means that email is expected to arrive from 9am til 6pm (+/-2hours) Email received at 2AM? Either it is somebody working remotely, a foreign contact, or much more likely it is SPAM.

Now clearly you cannot dump all messages received at unusual times of the day, but it is a surprisingly robust SPAM indicator for that particular domain.

All heuristics are fallable, but some are useful regardless..

I'd love to know what people can learn from their SPAM. This week I'm handling approximately 80,000 messages a day, per MX, which isn't huge (ie. 2-3 million a month).

ObQuote: Highlander

Syndicated 2008-08-31 12:03:10 from Steve Kemp's Blog

Latest blog entries     Older blog entries

New Advogato Features

FOAF updates: Trust rankings are now exported, making the data available to other users and websites. An external FOAF URI has been added, allowing users to link to an additional FOAF file.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!