FOAF-based whitelisting for email
Posted 26 Mar 2007 at 21:14 UTC by kjetilk 
There are at least 20 million FOAF profiles out there, from Advogato, LiveJournal, Opera Community, personal hand-maintained files, etc. In many cases, the FOAF is also an export of the user's social network, i.e. it contains statements that one user knows another.
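FOAF profiles often publish a person's address only as foaf:mbox_sha1sum, the SHA-1 hex digest of the mailto: URI, so a mail filter has to hash the sender address the same way before it can look it up. The following is a minimal sketch using only the Python standard library, assuming RDF/XML input that uses the common nested Person/knows pattern; the function names are hypothetical, and a production system should use a real RDF parser, since RDF/XML has many equivalent serializations.

```python
import hashlib
import xml.etree.ElementTree as ET

# Namespace URI of the FOAF vocabulary.
FOAF_NS = "http://xmlns.com/foaf/0.1/"


def _foaf(name):
    """Return the ElementTree-style qualified tag for a FOAF term."""
    return "{%s}%s" % (FOAF_NS, name)


def mbox_sha1sum(address):
    """Hash an email address as foaf:mbox_sha1sum does: SHA-1 of the mailto: URI."""
    return hashlib.sha1(("mailto:" + address.strip()).encode("utf-8")).hexdigest()


def knows_edges(rdf_xml):
    """Return (knower, known) pairs of mbox_sha1sum values from one FOAF document.

    Only the nested pattern <foaf:Person><foaf:knows><foaf:Person>...</foaf:Person>
    </foaf:knows></foaf:Person> is handled; people without an mbox_sha1sum are skipped.
    """
    root = ET.fromstring(rdf_xml)
    edges = []
    for person in root.iter(_foaf("Person")):  # document order, nested Persons included
        knower = person.findtext(_foaf("mbox_sha1sum"))
        if knower is None:
            continue
        for knows in person.findall(_foaf("knows")):
            for friend in knows.findall(_foaf("Person")):
                known = friend.findtext(_foaf("mbox_sha1sum"))
                if known is not None:
                    edges.append((knower.strip(), known.strip()))
    return edges
```

Collecting these edges from many profiles gives the directed knows graph that the whitelist is computed from.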
I don't know any spammers, nor do any of my friends. Besides, spammers tend to use random MAIL FROM addresses anyway. So if an email comes from one of my friends, or one of their friends, or indeed pretty much anyone reachable by following the social network outward from me, it is pretty safe to accept it.
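The acceptance rule above is a reachability test on the knows graph. A minimal breadth-first-search sketch, assuming the graph has already been loaded into a dict keyed by some person identifier (for instance foaf:mbox_sha1sum); the function names and the max_hops parameter are hypothetical, the latter letting you restrict the whitelist to, say, friends of friends.

```python
from collections import deque


def reachable(graph, start, max_hops=None):
    """Return every node reachable from `start` by following knows-edges (BFS).

    graph: dict mapping a person id to the ids that person knows.
    max_hops: optional limit on path length; None follows the whole network.
    The start node itself is included in the result.
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if max_hops is not None and hops >= max_hops:
            continue  # do not expand past the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return seen


def accept_sender(graph, me, sender_id, max_hops=None):
    """Whitelist decision: accept the mail if the sender is in my social network."""
    return sender_id in reachable(graph, me, max_hops)
```

With max_hops=1 this reduces to the "address book only" policy discussed further down; leaving it unset follows the network as far as it goes.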
We should put these data to good use. As part of a concerted effort to bring running Semantic Web code to the masses, I have initiated a project that will develop software to compute a simple unidimensional trust metric, based primarily on network distance, and make it available. Furthermore, we will develop plugins for SpamAssassin and qpsmtpd to use the data. In fact, minimal plugins are already developed, so what remains is to compute the trust metrics, define how the plugins can access them, and build a scalable system where the metrics can be queried.
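One simple way to turn network distance into a unidimensional metric is exponential decay: trust = decay ** distance. The sketch below assumes that choice; decay and min_trust are hypothetical tuning parameters, not values the project has settled on. Precomputing the table means a mail-side plugin only needs a dictionary (or key-value store) lookup at delivery time.

```python
from collections import deque


def trust_table(graph, me, decay=0.5, min_trust=0.1):
    """Map each reachable person to a trust value in (0, 1] based on hop distance.

    Distance 0 (me) gets 1.0, direct friends get `decay`, friends of friends
    get decay**2, and so on. Entries whose trust falls below min_trust are dropped.
    """
    # Breadth-first search records the shortest hop distance to each person.
    dist = {me: 0}
    queue = deque([me])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)

    table = {}
    for person, d in dist.items():
        trust = decay ** d
        if trust >= min_trust:
            table[person] = trust
    return table
```

A plugin would then use something like table.get(sender_id, 0.0), treating unknown senders as untrusted rather than as spammers.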
The project is using the foaf-dev mailing list for discussion. The initial discussion focused on the meaning of "trust" and on the importance of topical trust. In particular, our friends from the Konfidi project have designed an elaborate trust system that relies partly on FOAF and partly on topical trust, and is strengthened with PGP. Clearly, stating that you know someone does not imply that you certify that the person will not spam. Likewise, trusting my climbing mate with my life when climbing does not mean that I'll tell him my root password.
However, it was resolved that we should only require data already on the web. Even if that means very little topical trust data is available, it is most likely good enough to be useful for whitelisting email from your network.
I do not plan to reject all email from senders outside my social network myself. This is just one of many anti-spam measures, some cheap, some heavy, and I expect it to ease the load on my mail servers. I also plan to do OCR on emails containing images. Some argue that they would gladly drop all image emails from anyone not in their address book, so a one-hop social network would suffice. It is not hard to come up with a use case that makes this look silly: think about the girl you met at a party last night. She has your email address, but you have not yet entered hers into your address book, and now she's sending you her picture... Even if she isn't in your address book, she may well be in your social network. I personally will use it to ease the spam-scanning load, but you are of course free to use it as you see fit.
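The intended use here is not to reject unknown senders but to let trusted mail skip the expensive stages, such as OCR on images. A sketch of that ordering, assuming every check is a callable that returns a spam-score contribution; the check functions and the skip_heavy_below threshold are hypothetical placeholders, not SpamAssassin or qpsmtpd APIs.

```python
def score_message(message, trust, cheap_checks, heavy_checks, skip_heavy_below=0.2):
    """Return a spam score, running expensive checks only for low-trust senders.

    trust: the sender's social-network trust in [0, 1], 0.0 if unknown.
    cheap_checks / heavy_checks: lists of callables taking the message and
    returning a score contribution. Heavy checks are skipped when the sender's
    trust is at or above skip_heavy_below.
    """
    score = sum(check(message) for check in cheap_checks)
    if trust < skip_heavy_below:
        score += sum(check(message) for check in heavy_checks)
    return score
```

Unknown senders still get every check; the social network only saves work, it never blocks mail on its own.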
There are several anticipated attacks on this system. Merely sneaking in fake email addresses will not achieve much, but inserting fake knows-statements will, so we will need to be careful about the sources we use for those. Also, there will be some (natural) supernodes, and if spammers start to use their addresses in their MAIL FROMs, it will become a big annoyance for them. Thus, strengthening the system with OpenID, SPF, DomainKeys or even PGP may be needed, either as a modifier to the trust metric or as part of the plugins for mail systems.
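Using authentication as a modifier to the trust metric could look like the sketch below: social trust only counts in full when the sender address is authenticated. The result names are the SPF results defined in RFC 4408; the multipliers and the signature_ok flag (standing in for a verified DomainKeys or PGP signature) are hypothetical choices for illustration only.

```python
# Hypothetical multipliers: the fraction of social trust kept for each SPF result.
SPF_FACTOR = {
    "pass": 1.0,
    "neutral": 0.5,
    "none": 0.5,
    "temperror": 0.5,
    "permerror": 0.5,
    "softfail": 0.1,
    "fail": 0.0,
}


def effective_trust(base_trust, spf_result, signature_ok=False):
    """Scale social-network trust by how well the sender address is authenticated.

    base_trust: trust from the social network, in [0, 1].
    spf_result: an SPF result string such as "pass" or "fail" (case-insensitive).
    signature_ok: True if a DomainKeys or PGP signature has been verified.
    Unrecognised SPF results keep no trust at all.
    """
    if signature_ok:
        return base_trust
    return base_trust * SPF_FACTOR.get(spf_result.lower(), 0.0)
```

A forged MAIL FROM borrowing a supernode's address would typically fail SPF, so it would gain nothing from that supernode's position in the network.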
We would like to take advantage of the experience Advogato has gained from years of work on trust metrics. We can always use a hand with development, and we would of course like to use the FOAF data from Advogato.
If you can't trust that the FROM address isn't forged, then any verification derived from the address is meaningless. End of story.
Similarly, if anyone in the FOAF chain is compromised, it all comes crashing down, because the trust vector becomes the transmission vector. This is often the case with virus/worm-laden mail.
inverse function, posted 26 Mar 2007 at 22:15 UTC by lkcl
(Master)
quantum mechanics shows that the inverse wave function is essential.
therefore, just like you say, pizza - it's not enough to have a one-way link: you also need to have a weighting for the link the other way, too.
not only _that_, but the "context" in which the FOAF linking is known is _also_ important, so now you have a weighting thrown in for that.
not only _that_, but also it is important to factor in as many different "contexts" - different FOAF sources - as possible, giving weights for each one.
in fact, you probably don't want weights for each one, at all, but you want to "normalise" it out, given the total number of FOAF sources available.
so. you have a normalised vector which gives weights to all of the FOAF sources (again, i believe that's from quantum mechanics - wave functions)
you then perform a "filter" function: "on how many of these FOAF things, from all the sources, do we agree?"
and that becomes your total "spam" weight, times the total possible allowed weighting.
very very straightforward.
and very cool.
summary, posted 26 Mar 2007 at 22:17 UTC by lkcl
(Master)
normalised weighting of FOAF source "strength".
filtered percentage agreement of individual FOAF "agreement"
times SPAM weighting.
equals score.
hurrah.
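One reading of this summary is a weighted vote: normalise each FOAF source's strength so the weights sum to one, add up the weights of the sources that agree a given link exists, and multiply by the largest score adjustment allowed, i.e. score = max_weight * sum over sources j of (s_j / sum_k s_k) * a_j, with a_j = 1 if source j agrees. The sketch assumes binary per-source agreement and the names below, neither of which the comment pins down.

```python
def foaf_score(source_strength, agreement, max_weight):
    """Weighted-agreement score across several FOAF sources.

    source_strength: dict mapping a source name to its raw (unnormalised) strength.
    agreement: dict mapping a source name to True if that source asserts the link.
    max_weight: the largest spam-score adjustment the filter may apply.
    Returns max_weight times the normalised fraction of strength that agrees.
    """
    total = sum(source_strength.values())
    if total == 0:
        return 0.0
    agreed = sum(s for src, s in source_strength.items() if agreement.get(src, False))
    return (agreed / total) * max_weight
```

For example, with strengths {"advogato": 2, "livejournal": 1, "personal": 1} and only advogato and personal asserting the link, the agreeing fraction is 3/4, so a max_weight of 4.0 yields a score of 3.0.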
eh., posted 27 Mar 2007 at 05:53 UTC by ncm
(Master)
I get spam claiming to be from people I know all the time. It doesn't
mean their machine is compromised. Rather, somebody who has sent them
mail, or got mail from them, was compromised. The spambot harvested
the "To:" and "From:" addresses from that other person's mailbox. The
names I recognize are all people who are active on multiple mailing
lists, so knowing who they are doesn't help in discovering who was
compromised.
ncm and Pizza: we are fully aware of that problem, hence the comment about SPF, DomainKeys, PGP, etc. It isn't a simple problem, since people are reluctant to use strong measures like PGP, but some people with lots of SpamAssassin experience say that SPF support is now good enough to be relied on. I never liked SPF, but that's life...
what's the one thing that email can't do?
it can't be used to communicate, two-way, on complex issues.
it's just not possible. especially when there are a number of people involved.
our minds cannot remember enough from one email message to the next.
only _very_ intelligent individuals, with _extremely_ good memories, can use email to successfully communicate, and even then, only on issues which do not require diagrams, hand-waving, emotion, proper emphasis...
... and what was the _first_ thing that microtoss added to email? rich text formatting, of course. embedded html for "outsiders".
and, now that "rich content" is here, what's the _single_ largest source of problems?
xxxxing microsoft outlook depressed email.
i was going to say something clever like "in time, when history looks back..." but it's so blatantly obvious that email is _the_ biggest communications failure that humanity has ever invented.
so - whilst efforts to improve email communication and reduce spam etc. are laudable, i question even the _usefulness_ of any such efforts.
(btw i'm not criticising your work: i realise we still have to have email...)
Spammers don't go round telling people that they are spammers. They usually deny it strongly, mainly because they would get beaten to a pulp on the spot.
I'm not a spammer.