Advogato: Blog for mhausenblas

Toying around with Riak for Linked Data

So I stumbled upon Rob Vesse’s tweet the other day, where he said he was about to use MongoDB for storing RDF. A week earlier I watched a nice video about links and link walking in Riak, “a Dynamo-inspired key/value store that scales predictably and easily” (see also the Wiki doc).

Now, I was wondering what it takes to store an RDF graph in Riak using Link headers. Let me say that it was very easy to install Riak and to get started with the HTTP interface.

The main issue then was how to map the RDF graph into Riak buckets, objects and keys. Here is what I have come up with so far: I use an RDF resource-level approach with a special object key that I called :id, whose value is the RDF resource URI or the bNode. Further, in order to maintain the graph provenance, I store the original RDF document URI in the metadata of the Riak bucket. Each RDF resource is mapped into a Riak object; for each literal RDF object value, the literal value is stored directly under a Riak object key, and for each resource object (URI ref or bNode), I use a Link header.

Enough words. Action.

Take the following RDF graph (in Turtle):

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix : <http://sw-app.org/mic.xhtml#> .

:i foaf:name "Michael Hausenblas" ;
   foaf:knows <http://richard.cyganiak.de/foaf.rdf#cygri> .

To store the above RDF graph in Riak, I would then use the following curl commands:
curl -X PUT -d 'Michael Hausenblas' http://127.0.0.1:8098/riak/res0/foaf:name
curl -X PUT -d 'http://sw-app.org/mic.xhtml#i' http://127.0.0.1:8098/riak/res0/:id
curl -X PUT -d 'http://richard.cyganiak.de/foaf.rdf#cygri' http://127.0.0.1:8098/riak/res1/:id
curl -X PUT -d 'http://sw-app.org/mic.xhtml#i' -H "Link: </riak/res1/:id>; riaktag=\"foaf:knows\"" http://127.0.0.1:8098/riak/res0/:id
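
The commands above follow a pattern that could be generated from a list of triples. Below is a minimal Python sketch of that idea, not a finished loader: it only plans the PUT requests and sends nothing. The function name plan_puts, the tuple layout (subject, predicate, object, is_literal) and the rule that res0, res1, ... are handed out in order of first appearance are assumptions made for illustration. Unlike the commands above, it writes each :id only once, with the Link header already attached.

```python
RIAK = "http://127.0.0.1:8098/riak"  # host and port as used in the curl commands


def plan_puts(triples):
    """Plan the PUT requests for (subject, predicate, object, is_literal) tuples.

    Returns a list of (url, body, headers) tuples; no request is sent.
    The res0, res1, ... naming is an assumption: names are assigned in
    order of first appearance of each resource.
    """
    names = {}  # resource URI or bNode label -> res0, res1, ...
    links = {}  # resource -> list of Link header values

    def name_for(resource):
        if resource not in names:
            names[resource] = "res%d" % len(names)
            links[resource] = []
        return names[resource]

    literal_puts = []
    for s, p, o, is_literal in triples:
        subject_name = name_for(s)
        if is_literal:
            # Literal value: stored directly under the predicate as key.
            url = "%s/%s/%s" % (RIAK, subject_name, p)
            literal_puts.append((url, o, {}))
        else:
            # Resource value: becomes a tagged link to the target's :id.
            target = name_for(o)
            links[s].append('</riak/%s/:id>; riaktag="%s"' % (target, p))

    id_puts = []
    for resource, values in links.items():
        headers = {"Link": ", ".join(values)} if values else {}
        url = "%s/%s/:id" % (RIAK, names[resource])
        id_puts.append((url, resource, headers))
    return literal_puts + id_puts
```

For the example graph this plans three requests: the foaf:name literal under res0, the :id of res0 with the foaf:knows link, and the :id of res1.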

Then, querying the store is straightforward, like this (here: list all people I know):
curl http://127.0.0.1:8098/riak/res0/:id/_,foaf:knows,_
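
The path after the key is a sequence of bucket,tag,keep steps, with _ acting as a wildcard. As a rough Python sketch of how such URLs might be assembled (the helper name link_walk_url and the way the steps are passed in are made up for illustration; only the URL layout comes from the query above):

```python
RIAK = "http://127.0.0.1:8098/riak"  # host and port as used in the curl commands


def link_walk_url(bucket, key, steps):
    """Build a link-walking URL starting at bucket/key.

    steps is a list of (bucket, tag, keep) tuples, one per hop;
    "_" in any position means "match anything" / "use the default".
    """
    path = "/".join("%s,%s,%s" % step for step in steps)
    return "%s/%s/%s/%s" % (RIAK, bucket, key, path)


# One hop: everybody res0 links to with the tag foaf:knows.
one_hop = link_walk_url("res0", ":id", [("_", "foaf:knows", "_")])

# Two hops: the people known by the people res0 knows.
two_hops = link_walk_url("res0", ":id", [("_", "foaf:knows", "_")] * 2)
```

The one-hop call yields exactly the URL used above; the two-hop variant simply appends a second step to the path.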

Yes, I know, the prefixes like foaf: etc. need to be taken care of (but that’s rather easy; they can be put in the bucket’s metadata as well, along with the prefix.cc service). Further, the bNodes might cause trouble. And there is no smushing via owl:sameAs or IFPs (yet). But the most challenging area is maybe how to map a SPARQL query onto Riak’s link walking syntax.
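
To make the prefix point a bit more concrete, here is a small Python sketch of the expansion step, assuming the prefix map has already been read from wherever it is kept (the PREFIXES dict and the names expand and shrink are hypothetical, and nothing here talks to Riak or prefix.cc). One detail it surfaces: the special key :id looks like a name in the default namespace, so it has to be skipped explicitly.

```python
# Prefix map as it might be kept alongside the data; hard-coded here for illustration.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "": "http://sw-app.org/mic.xhtml#",
}

SPECIAL_KEY = ":id"  # the reserved key holding the resource URI or bNode


def expand(name, prefixes=PREFIXES):
    """Expand a prefixed name such as foaf:knows into a full URI.

    Full URIs and names with an unknown prefix are returned unchanged.
    """
    if name == SPECIAL_KEY:
        return name  # not a prefixed name, despite the leading colon
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in prefixes:
        return name
    return prefixes[prefix] + local


def shrink(uri, prefixes=PREFIXES):
    """Inverse of expand: abbreviate a URI, trying the longest namespace first."""
    for prefix, namespace in sorted(prefixes.items(), key=lambda kv: -len(kv[1])):
        if uri.startswith(namespace):
            return "%s:%s" % (prefix, uri[len(namespace):])
    return uri
```

With this, expand("foaf:knows") gives the full FOAF property URI, while a value that is already a full URI passes through untouched.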

Thoughts, anyone?

Filed under: Experiment, Linked Data

Syndicated 2010-10-14 15:18:03 from Web of Data

14 Oct 2010 mhausenblas » (Journeyer)