YouTube/Viacom privacy followup (and what Google should do)
A brief update on the YouTube/Viacom privacy disaster.
From Ellen Nakashima in the Washington Post:
Yesterday, lawyers for Google said they would not appeal the ruling. They sent Viacom a letter requesting that the company allow YouTube to redact user names and IP addresses from the data.
“We are pleased the court put some limits on discovery, including refusing to allow Viacom to access users’ private videos and our search technology,” Google senior litigation counsel Catherine Lacavera said in a statement. “We are disappointed the court granted Viacom’s overreaching demand for viewing history. We will ask Viacom to respect users’ privacy and allow us to anonymize the logs before producing them under the court’s order.”
I’m pleased to read that Google are trying to keep identifying information out of this (vast) dataset.
Viacom claim to want this data to “measure the popularity of copyrighted video against non-copyrighted video” (in the words of the Washington Post article; I don’t have a direct quote handy).
If that is the case, I suggest their needs could be met with a practical compromise. Google should make a public domain data dump summarising the (already public) favouriting history of each video (with or without reference to users, whose identifiers could be scrambled/obscured). This directly addresses Viacom’s demand while sticking to the principle of relying on the public record to answer Viacom’s query. Only if the public record is incapable of answering Viacom’s (seemingly reasonable) request should users’ private behaviour logs even be considered. Google should also make use of their own Social Graph API to determine how many YouTube usernames are already associated in the public Web with other potentially identifying profile information; those usernames, at least, should not be handed over without some obfuscation.
If we know which YouTube videos are copyrighted (and Viacom-owned), and we know how long they’ve been online and which ones have been publicly flagged as ‘favourites’ by YouTube users, we have a massively rich dataset. I’d like to see that avenue of enquiry thoroughly exhausted before this goes any further.
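As a rough illustration of the kind of question such a public summary could answer, here is a minimal Python sketch. The record layout (a video ID, a Viacom-owned flag, days online, a favourite count) and the sample numbers are invented for illustration; they are not real YouTube data, and no such dump format is known to exist.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class VideoSummary:
    """One row of a hypothetical public data dump (layout is an assumption)."""
    video_id: str
    viacom_owned: bool
    days_online: int
    favourites: int


def favourites_per_day(video: VideoSummary) -> float:
    # Normalise by time online so older videos are not unfairly favoured.
    # Videos online for less than a day are treated as one day old.
    return video.favourites / max(video.days_online, 1)


def compare_popularity(videos):
    """Mean favourites-per-day for Viacom-owned videos versus everything else."""
    owned = [favourites_per_day(v) for v in videos if v.viacom_owned]
    other = [favourites_per_day(v) for v in videos if not v.viacom_owned]
    return {
        "viacom_owned": mean(owned) if owned else 0.0,
        "other": mean(other) if other else 0.0,
    }


# Invented sample rows, purely for demonstration.
SAMPLE = [
    VideoSummary("a", True, 10, 50),
    VideoSummary("b", True, 20, 20),
    VideoSummary("c", False, 5, 10),
    VideoSummary("d", False, 0, 6),
]

if __name__ == "__main__":
    print(compare_popularity(SAMPLE))
```

Nothing in this calculation needs to know who favourited what, which is the point of trying the public record first.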
Nearby in the Web: Danny Weitzner has blogged further thoughts on all this, including a pointer to a recent paper on information accountability, suggesting a possible shift of emphasis from who can access information, to the acceptable uses to which it may be put.
Referata, a Semantic MediaWiki hosting site
From Yaron Koren on the semediawiki-users list:
I’m pleased to announce the release of the site Referata, at referata.com: a hosting site for SMW-based semantic wikis. This is not the first site to offer hosting of wikis using Semantic MediaWiki (that’s Wikia, as of a few months ago), but it is the first to also offer the usage of Semantic Forms, Semantic Drilldown, Semantic Calendar, Semantic Google Maps and some of the other related extensions you’ve probably heard about; Widgets, Header Tabs, etc. As such, I consider it the first site that lets people create true collaborative databases, where many people can work together on a set of well-structured data.
See announcement and their features page for more details. Basic usage is free; $20/month premium accounts can have private data, and $250/month enterprise accounts can use their own domains. Not a bad plan I think. A showcase Referata wiki would help people understand the offering better. In the meantime there is elsewhere a list of sites using Semantic MediaWiki. That list omits Chickipedia; we can only wonder why. Also I have my suspicions that Intellipedia runs with the SMW extensions too, but that’s just guessing. Regardless, there are a lot of fun things you could do with this, take a look…
YouAndYouAndYouTube: Viacom, Privacy and the Social Graph API
From Wired via Thomas Roessler:
Google will have to turn over every record of every video watched by YouTube users, including users’ names and IP addresses, to Viacom, which is suing Google for allowing clips of its copyright videos to appear on YouTube, a judge ruled Wednesday.
I hope nobody thought their behaviour on youtube.com was a private matter between them and Google.
The Judge’s ruling (pdf) is interesting to read (ok, to skim). As the Wired article says,
The judge also turned Google’s own defense of its data retention policies — that IP addresses of computers aren’t personally revealing in and of themselves, against it to justify the log dump.
Here’s an excerpt. Note that there is also a claim that youtube account IDs aren’t personally identifying.
Defendants argue that the data should not be disclosed because of the users’ privacy concerns, saying that “Plaintiffs would likely be able to determine the viewing and video uploading habits of YouTube’s users based on the user’s login ID and the user’s IP address”.
But defendants cite no authority barring them from disclosing such information in civil discovery proceedings, and their privacy concerns are speculative. Defendants do not refute that the “login ID is an anonymous pseudonym that users create for themselves when they sign up with YouTube” which without more “cannot identify specific individuals”, and Google has elsewhere stated:
“We . . . are strong supporters of the idea that data protection laws should apply to any data that could identify you. The reality is though that in most cases, an IP address without additional information cannot.” — Google Software Engineer Alma Whitten, Are IP addresses personal?, GOOGLE PUBLIC POLICY BLOG (Feb. 22, 2008)
So forget the IP address part for now.
Since early this year, Google have been operating an experimental service called the Social Graph API. From their own introduction to the technology:
With so many websites to join, users must decide where to invest significant time in adding their same connections over and over. For developers, this means it is difficult to build successful web applications that hinge upon a critical mass of users for content and interaction. With the Social Graph API, developers can now utilize public connections their users have already created in other web services. It makes information about public connections between people easily available and useful.
Only public data. The API returns web addresses of public pages and publicly declared connections between them. The API cannot access non-public information, such as private profile pages or websites accessible to a limited group of friends.
Google’s Social Graph API makes easier something that was already possible: using XFN and FOAF markup from the public Web to associate more personal information with YouTube accounts. This makes information that was already public increasingly accessible to automated processing. If I chose to link to my YouTube profile with the XFN markup rel=’me’ from another of my profiles, those 8 characters are sufficient to bridge my allegedly anonymous YouTube ID with arbitrary other personal information, and to do so in a machine-readable form of which Google have already demonstrated a planet-wide index.
Here is the data returned by Google’s Social Graph API when asking for everything about my YouTube URL:
{
  "canonical_mapping": {
    "http://youtube.com/user/modanbri": "http://youtube.com/user/modanbri"
  },
  "nodes": {
    "http://youtube.com/user/modanbri": {
      "attributes": {
        "url": "http://youtube.com/user/modanbri",
        "profile": "http://youtube.com/user/modanbri",
        "rss": "http://youtube.com/rss/user/modanbri/videos.rss"
      },
      "claimed_nodes": [
      ],
      "unverified_claiming_nodes": [
        "http://friendfeed.com/danbri",
        "http://www.mybloglog.com/buzz/members/danbri"
      ],
      "nodes_referenced": {
      },
      "nodes_referenced_by": {
        "http://friendfeed.com/danbri": {
          "types": [
            "me"
          ]
        },
        "http://guttertec.swurl.com/friends": {
          "types": [
            "friend"
          ]
        },
        "http://www.mybloglog.com/buzz/members/danbri": {
          "types": [
            "me"
          ]
        }
      }
    }
  }
}
You can see here that the SGAPI, built on top of Google’s Web crawl of public pages, has picked out the connection to my FriendFeed (FOAF) and MyBlogLog (FOAF) accounts, both of which export XFN and FOAF descriptions of my relationship to this YouTube account, linking it up with various other sites and profiles I’m publicly associated with.
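That reading of the response can be reproduced mechanically. The following Python sketch works offline on a trimmed copy of the response shown above and pulls out the pages that point at the YouTube profile with a given XFN relationship type. The function name links_by_type is my own; it is not part of the Social Graph API.

```python
import json

# Trimmed copy of the Social Graph API response quoted in this post.
RESPONSE = """
{
  "nodes": {
    "http://youtube.com/user/modanbri": {
      "nodes_referenced_by": {
        "http://friendfeed.com/danbri": {"types": ["me"]},
        "http://guttertec.swurl.com/friends": {"types": ["friend"]},
        "http://www.mybloglog.com/buzz/members/danbri": {"types": ["me"]}
      }
    }
  }
}
"""


def links_by_type(response: dict, rel_type: str) -> dict:
    """For each node, list the pages linking to it with the given XFN type."""
    result = {}
    for node_url, node in response.get("nodes", {}).items():
        referrers = node.get("nodes_referenced_by", {})
        result[node_url] = sorted(
            url for url, info in referrers.items()
            if rel_type in info.get("types", [])
        )
    return result


if __name__ == "__main__":
    data = json.loads(RESPONSE)
    # Pages claiming to be the same person as the YouTube account.
    for node, pages in links_by_type(data, "me").items():
        print(node, "<-", pages)
```

A dozen lines of code are enough to go from a supposedly anonymous login ID to a list of other profiles claiming the same owner.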
YouTube users who have linked their YouTube account URLs from other social Web sites (something sites like FriendFeed and MyBlogLog actively encourage) are no longer anonymous on YouTube. This is their choice. It can give them a mechanism for sharing ‘favourited’ videos with a wide circle of friends, without those friends needing logins on YouTube or other Google services. This clearly has business value for YouTube and similar ‘social video’ services, as well as for users and Social Web aggregators.
Given such a trend towards increased cross-site profile linkage, it is unfortunate to read that YouTube identifiers are being presented as essentially anonymous IDs: this is clearly not the case. If you know my YouTube ID ‘modanbri’ you can quite easily find out a lot more about me, and certainly enough to find out with strong probability my real world identity. As I say, this is my conscious choice as a YouTube user; had I wanted to be (more) anonymous, I would have behaved differently. To understand YouTube IDs as being anonymous accounts is to radically misunderstand the nature of the modern Web.
Although it wouldn’t protect against all analysis, I hope the user IDs are at least scrambled before being handed over to Viacom. This would make it harder for them to be used to look up other data via (amongst other things) Google’s own YouTube and Social Graph APIs.
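One way such scrambling could be done is with a keyed hash, sketched below in Python. This is my own illustration and not anything Google or the court has described; the key value is hypothetical and would have to stay with Google. A plain unkeyed hash would be weaker, since anyone could hash a list of candidate usernames and compare the results.

```python
import hashlib
import hmac


def scramble_user_id(user_id: str, secret_key: bytes) -> str:
    """Replace a login ID with a stable pseudonym using HMAC-SHA256.

    The same ID always maps to the same pseudonym (so per-user viewing
    counts survive), but without the key the mapping cannot be rebuilt
    by hashing guessed usernames.
    """
    digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


if __name__ == "__main__":
    key = b"hypothetical-key-held-only-by-google"  # assumption, for illustration
    print(scramble_user_id("modanbri", key))
```

As noted, this does not defeat all analysis: a distinctive enough viewing history can still point back to a person even when the ID itself is scrambled.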
Drupal is now OAuth-enabled
via Sumit Kataria on the oauth list:
I am very happy to announce that Drupal’s OAuth module is now ready to use. Right now it just acts as server because we don’t need client support at this time, Client implementation will be done soon by the time release of Drupal 7 as ServicesAPI in Drupal 7 will be needing it. Endpoints for Drupal are:
REQUEST URL : http://www.example.com/?q=oauth/request
AUTH URL : http://www.example.com/?q=oauth/auth
ACCESS : http://www.example.com/?q=oauth/access
At this time Services API is necessary to use OAuth (also it is the only module which is gonna use oauth as well - so not a big deal). This module also provides a test browser to produce OAuth tokens (whole Drupal way using multi-page form). Right now only PLAIN-TEXT signature method is supported but soon support for other methods will be added as well.
I’m slowly learning my way around Drupal, so this is rather good encouragement to learn faster…
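As a learning aid, here is a small Python sketch of how a client might build a request-token URL for the endpoint quoted above using the PLAINTEXT signature method from the OAuth Core 1.0 spec. The consumer key and secret are made up, and I am assuming the module accepts the OAuth parameters in the query string and uses the spec's PLAINTEXT name; I have not checked this against the Drupal module itself. No request is actually sent.

```python
import secrets
import time
from urllib.parse import quote, urlencode

# Request-token endpoint as given in the announcement (example.com placeholder).
REQUEST_URL = "http://www.example.com/?q=oauth/request"


def plaintext_signature(consumer_secret: str, token_secret: str = "") -> str:
    # OAuth Core 1.0, section 9.4.1: percent-encode each secret and join
    # them with '&', even when the token secret is empty.
    return f"{quote(consumer_secret, safe='')}&{quote(token_secret, safe='')}"


def request_token_url(consumer_key: str, consumer_secret: str) -> str:
    params = {
        "oauth_consumer_key": consumer_key,
        "oauth_signature_method": "PLAINTEXT",
        "oauth_signature": plaintext_signature(consumer_secret),
        "oauth_timestamp": str(int(time.time())),
        "oauth_nonce": secrets.token_hex(8),
        "oauth_version": "1.0",
    }
    # The endpoint already carries '?q=...', so further parameters are
    # appended with '&'. urlencode percent-encodes the signature again.
    return REQUEST_URL + "&" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    # Hypothetical credentials, for illustration only.
    print(request_token_url("my-consumer-key", "my-consumer-secret"))
```

With PLAINTEXT the secret travels more or less in the clear, so the spec recommends using it only over a secure channel such as HTTPS.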
See the announcement for more details and links.
Inevitable Nipple Analogy
“A genetic theory of homosexuality.” by William Saletan in yesterday’s Slate suggests an inevitable analogy.
The article reports on recent work (pdf) addressing the ‘if homosexuality is genetic, why hasn’t it died out?’ debate, which suggests that the ‘gene for male homosexuality persists because it promotes—and is passed down through—high rates of procreation among gay men’s mothers, sisters, and aunts‘.
In other words, gayness in men can be as natural as the male nipple, even if both are initially puzzling when thought of in evolutionary terms. OK I’m stretching things slightly, but I can’t help but wonder whether the nipple analogy might be a good basis for informal arguments for a bit more tolerance:
Let He Who is Without Nipples Cast the First Stone?
More on the Great Nipple Question from straightdope.com; a moment of science; and the evolution-101 blog.
Beautiful plumage: Topic Maps Not Dead Yet
Echoing recent discussion of Semantic Web “Killer Apps”, there is an “are Topic Maps dead?” thread on the topicmaps mailing list. Signs of life offered include www.fuzzzy.com (‘Collaborative, semantic and democratic social bookmarking’, Topic Maps meet social networking; featured tag: ‘topic maps’) and a longer list from Are Gulbrandsen suggesting a predictable hype-cycle dropoff is occurring, as well as a migration of discussions from email into the blog world. For which, see the topicmaps planet aggregator, through which I indirectly find Steve Pepper’s blog and an interesting post on how TMs relate to RDF, OWL and the Semantic Web (though I’d have hoped for some mention of SKOS too).
Are Gulbrandsen also cites NZETC (the New Zealand Electronic Text Centre), winner of The Topic Maps Application of the year award at the Topic Maps 2008 conference; see Conal Tuohy’s presentation on Topic Maps for Cultural Heritage Collections (slides in PDF). On NZETC’s work: “It may not look that interesting to many people used to flashy web 2.0 sites, but to anybody who have been looking at library systems it’s a paradigm shift“.
Other Topic Map work highlighted: RAMline (Royal Academy of Music rewriting musical history). “A long-term research project into the mapping of three axes of musical time: the historical, the functional, and musical time itself.”; David Weinberger blogged about this work recently. Also MIPS / Institute for Bioinformatics and Systems Biology who “attempt to explain the complexity of life with Topic Maps” (see presentation from Volker Stümpflen (PDF); also a TMRA’07 talk).
Finally, pointers to opensource developer tools: Ruby Topic Maps and Wandora (Java/GPL), an extraction/mapping and publishing system which amongst other things can import RDF.
Topic Maps are clearly not dead, and the Web’s a richer environment because of this. They may not have set the world on fire but people are clearly finding value in the specs and tools, while also exploring interop with RDF and other related technologies. My hunch is that we’ll continue to see a slow drift towards the use of RDF/OWL plus SKOS for apps that might otherwise have been addressed using TopicMaps, and a continued pragmatism from tool and app developers who see all these things as ways to solve problems, rather than as ends in themselves.
Just as with RDFa, GRDDL and Microformats, it is good and healthy for the Web community to be exploring multiple similar strands of activity. We’re smart enough to be able to flow data across these divides when needed, and having only a single technology stack is, I think, intellectually limiting, socially impractical, and technologically short-sighted.
Microsoft buying Powerset
Allegedly. Powerset are natural language processing specialists. See also last year’s ISWC talk from CTO Barney Pell, “Natural Language and the Semantic Web”, discussions with Barney from last month’s Talis Semantic Web Gang chat, and earlier commentary from Paul Miller.
RDFa Basics video from Manu Sporny
Via Dave Beckett in #swig IRC, Manu Sporny’s handy 10 minute overview of RDFa Basics (see also other versions, source materials).
Here’s a screen grab of the full FOAF example used. Note that the WG renamed ‘instanceof’ to ‘typeof’ recently.
For the video-averse, a full transcript is available. Here’s the full XHTML markup example from the above image:
<body xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <span about="#jane" typeof="foaf:Person" property="foaf:name">Jane McJanerson</span>
  <span about="#mac" typeof="foaf:Person" property="foaf:name">Mac McJanerson</span>
  <span about="#jane" rel="foaf:knows" resource="#mac">Jane is friends with Mac.</span>
</body>
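To make clearer what statements this markup carries, here is a deliberately tiny Python sketch that reads the attributes and lists the resulting subject/predicate/object triples. It is not a conformant RDFa processor: it only handles flat elements that carry an about attribute, it leaves prefixes such as foaf: unexpanded, and the class name TinyRDFaExtractor is my own invention.

```python
from html.parser import HTMLParser

MARKUP = """
<body xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <span about="#jane" typeof="foaf:Person" property="foaf:name">Jane McJanerson</span>
  <span about="#mac" typeof="foaf:Person" property="foaf:name">Mac McJanerson</span>
  <span about="#jane" rel="foaf:knows" resource="#mac">Jane is friends with Mac.</span>
</body>
"""


class TinyRDFaExtractor(HTMLParser):
    """Collects triples from flat elements that have an 'about' attribute."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._pending = None  # (subject, property) waiting for its text
        self._text = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        subject = a.get("about")
        if subject is None:
            return
        if "typeof" in a:
            self.triples.append((subject, "rdf:type", a["typeof"]))
        if "rel" in a and "resource" in a:
            self.triples.append((subject, a["rel"], a["resource"]))
        if "property" in a:
            # The literal value is the element's text, read in handle_data.
            self._pending = (subject, a["property"])
            self._text = []

    def handle_data(self, data):
        if self._pending is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Assumes no nested elements inside a 'property' element.
        if self._pending is not None:
            subject, prop = self._pending
            self.triples.append((subject, prop, "".join(self._text).strip()))
            self._pending = None


def extract_triples(markup: str):
    parser = TinyRDFaExtractor()
    parser.feed(markup)
    parser.close()
    return parser.triples


if __name__ == "__main__":
    for triple in extract_triples(MARKUP):
        print(triple)
```

For real work, a proper RDFa parser is the right tool; the sketch is only meant to show how the about, typeof, property, rel and resource attributes combine into statements.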
FOAF updates: Trust rankings are now exported, making the data available to other users and websites. An external FOAF URI has been added, allowing users to link to an additional FOAF file.
Keep up with the latest Advogato features by reading the Advogato status blog.
If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!