10 Sep 2009 pjf   » (Journeyer)

Dark Stalking on Facebook
For a while I've been using Facebook's API and Facebook Query Language (FQL) via Perl's WWW::Facebook::API module to run fairly innocent queries on my friends. If I visit a town, I'd like a reminder of who lives there. If I want to go rock-climbing, it helps if I can easily search to see which of my friends share that hobby. This is good, innocent stuff, and makes me glad to be a developer.

Last week I decided to play with event searches. If a large number of my friends are attending an event, there's a good chance I'll find it interesting, and I'd like to know about it. FQL makes this sort of thing really easy; in fact, finding all your friends' events is on their Sample FQL Queries page.

Using the example provided by Facebook, I dropped the query into my sandbox, and looked at the results which came back. The results were disturbing. I didn't just get back future events my friends were attending. I got everything they had been invited to: past and present, attending or not.
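A minimal sketch of what "future events my friends are attending" would have taken, written in Python rather than Perl and making no network calls. The FQL string is a reconstruction, not a copy of Facebook's sample query; the table and column names (`event_member`, `friend`, `rsvp_status`, `start_time`) are assumptions about the schema of the time, and the function names and row layout are hypothetical.

```python
def friends_events_fql(my_uid):
    """Build an FQL string listing every event invitation held by my friends.

    Assumed schema: friend(uid1, uid2) and event_member(uid, eid, rsvp_status).
    Note that nothing in the query restricts it to future or accepted events.
    """
    return (
        "SELECT uid, eid, rsvp_status FROM event_member "
        "WHERE uid IN (SELECT uid2 FROM friend WHERE uid1 = %d)" % int(my_uid)
    )


def upcoming_attending(rows, now):
    """Keep only rows for events that start after `now` and that the friend
    has marked as attending.

    `rows` is a list of dicts with the hypothetical keys 'rsvp_status' and
    'start_time' (a Unix timestamp); `now` is a Unix timestamp.
    """
    return [
        row for row in rows
        if row["rsvp_status"] == "attending" and row["start_time"] > now
    ]
```

The point of the sketch is that the narrowing to "upcoming and attending" has to be done by the caller; left alone, a query of this shape returns the whole invitation history.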

I didn't sleep well that night. I didn't expect Facebook to share past event info. I didn't expect it to share info when people had declined those events. I haven't found any way of retrieving friends' past events using Facebook's website, but using FQL made it easy. Somehow, implicitly, I thought old events would fade away, only viewable to those who already knew about them. I didn't expect them to stick around for my code to harvest, potentially years into the future.

Finding my friends' old events crossed a moral boundary I honestly didn't expect to encounter. Without intending to, I really felt like I was snooping. It didn't matter that these friends had agreed to share this information under the Facebook terms and conditions. I would personally feel uncomfortable with this much information being so readily available, and I assume my friends would feel the same.

However, my accidental crossing of moral boundaries wasn't the only thing that kept me awake that night. I was also kept awake wondering just how much information I could tease out of the Facebook API. What could I discover? What if I were evil?

However, I'm not evil, so I put my code on hold for a while and made a call for volunteers. I'd restrict myself to using only the Facebook API, without asking them to install any additional applications. I wouldn't share their data in any way, but I'd be able to inspect and use it, and would try to provide them with a copy when I was done. To be honest, I was surprised by the response; I now have almost two dozen people who have agreed to participate, covering a wide range of lifestyles and privacy settings.

The results have been very interesting. I expected to be able to obtain personal information, including things like events, photographs, and friends; it doesn't take much imagination with the FQL tables to find those. What was most interesting were some of the more creative queries I was able to run.

Most recently, I've been able to obtain status feeds, even for users who have very tight privacy settings, although I had to tweak my own application's privileges to do so. I don't know how far into the past these go, but they also come with likes and comments. This gives me a wealth of information on the strength and types of relationships people have. A person who comments a lot on another user's posts probably finds that user interesting. If I descended into keyword and text analysis, I might even be able to determine how they find that user interesting.
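The "who comments on whom" idea can be sketched as a simple tally. This is an illustration in Python over made-up records, not the analysis actually run for this research; the field names `commenter` and `author` are hypothetical and do not correspond to any particular API response.

```python
from collections import Counter


def interaction_strength(comments):
    """Count how many times each user commented on each other user's posts.

    `comments` is an iterable of dicts with the hypothetical keys
    'commenter' (who wrote the comment) and 'author' (whose post it is on).
    Returns a Counter keyed by (commenter, author) pairs.
    """
    counts = Counter()
    for comment in comments:
        # Replies on one's own posts say nothing about a relationship.
        if comment["commenter"] != comment["author"]:
            counts[(comment["commenter"], comment["author"])] += 1
    return counts


# Invented sample data: user 1 comments twice on user 2's posts.
sample = [
    {"commenter": 1, "author": 2},
    {"commenter": 1, "author": 2},
    {"commenter": 3, "author": 2},
    {"commenter": 2, "author": 2},
]
strongest = interaction_strength(sample).most_common(1)
```

A raw count is a crude proxy for interest; weighting by recency or by comment text would be an obvious refinement, but is beyond this sketch.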

But by far the most interesting part of all of this has been dark users. Like dark matter, these users are not directly observable, usually because they've completely disabled API access. In fact, some of these users are completely dark unless you're a friend. They don't show up in search results. They don't show up on friends' lists. You can't send them messages. If you try to navigate to their user page (assuming you know it exists), you get redirected back to your homepage. These users have their privacy settings turned up real high, and are supposed to be hard to find.

However, like dark matter, dark users are observable due to their effects on the rest of the universe. If a dark user comments on a stream entry, I can see that comment. More importantly, I can see their user-ID, and I can generate a URL to a page that will contain their name. I can then watch for their activities elsewhere. Granted, I can't directly search for their activity, but I can observe their effects on my friends. For want of a better term, I've been calling this "dark stalking".
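The inference itself is little more than a set difference. A minimal sketch in Python, assuming the comments have already been collected into plain records; the field names `commenter` and `post_id`, and the notion of a `visible_uids` set (IDs that can be looked up directly), are hypothetical and purely illustrative.

```python
def find_dark_users(comments, visible_uids):
    """Map each 'dark' user-ID to the set of posts it was seen commenting on.

    A user-ID counts as dark here if it appears as a commenter but is not
    in `visible_uids`, the set of IDs that can be observed directly.
    `comments` is an iterable of dicts with keys 'commenter' and 'post_id'.
    """
    dark = {}
    for comment in comments:
        uid = comment["commenter"]
        if uid not in visible_uids:
            dark.setdefault(uid, set()).add(comment["post_id"])
    return dark


# Invented sample data: user 99 is never directly visible, yet leaves traces.
sample_comments = [
    {"commenter": 1, "post_id": "a"},
    {"commenter": 99, "post_id": "a"},
    {"commenter": 99, "post_id": "b"},
]
traces = find_dark_users(sample_comments, visible_uids={1, 2, 3})
```

Each entry in the result is a user who was meant to be hard to find, together with the places where they gave themselves away.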

What makes this all rather chilling is that I'm doing all of this via the application API. If your friend has installed an application, then it can access quite a lot of information about you, unless you turn it off. If your friend has granted the application the read_stream privilege, then it can read your status stream. Even if a friend of a friend has done this, and you comment on your friend's status entries, it's possible to infer your existence and retrieve those discussions through dark stalking.

While I've always considered people's own carelessness to be the biggest threat to their own privacy, in the social 2.0 world it seems we need to be increasingly worried about our friends, too!

I'm preparing a detailed paper with the results of my research (which is still ongoing), but I will be presenting my preliminary findings at BarCampMelbourne this weekend (11-12th September 2009), with a further update at the University of Tasmania Computing Society (TUCS) on the 2nd October. A conference talk will inevitably follow.

If you want to keep track of my research, then you can join the Facebook group, or the Facebook privacy group. I prefer comments and questions to go directly to the Facebook privacy group, or to me directly.
