27 Sep 2010 mhausenblas   » (Journeyer)

Linked Enterprise Data in a nutshell

If you haven’t been living under a rock for the last weeks you might have noticed a new release of the LOD cloud diagram with some 200 datasets and some 25 billion triples. Very impressive, one may think, but let’s not forget that publishing Linked Data is not an end in itself.

So, I thought, how can I do something useful with the data and I ended up with a demo app that utilizes LOD data in an enterprise setup: the DERI guide. Essentially, what it does is telling you where in the DERI building you find an expert for a certain topic. So, if you just have some 5min, have a look at the screen-cast:

Behind the curtain

Now, let’s take a deeper look how the app works. So, the objective was clear: create a Linked Data app using LOD data with a bunch of shell scripts. And here is what the DERI guide conceptually looks like:

I’m using three datasets in this demo:

All that is needed, then, is an RDF store (I chose 4Store; easy to set up and use, at least on MacOS) to manage the data locally and a bunch of shell scripts to query the data and format the result. The data in the local RDF store (after loading it from the datasets) typically looks like this:

The main script (dg-find.sh) takes a term (such as “Linked Data”) as an input, queries the store for units that are tagged with the topic (http://dbpedia.org/resource/Linked_Data), then pulls in information from the FOAF profiles of the matching members and eventually runs it through an XSLT to produce a HTML page that opens in the default browser:

clear
echo "=== DERI guide v0.1"
echo "Trying to find people for topic: "$1

topicURI=$( echo "http://dbpedia.org/resource/"$1 | sed 's/ /_/')

curl -s --data-urlencode query="SELECT DISTINCT
 ?person WHERE { ?idperson <http://www.w3.org/2002/07/owl#sameAs> ?person ;
<http://www.w3.org/ns/org#hasMembership> ?membership .
?membership <http://www.w3.org/ns/org#organization> ?org .
?org <http://www.w3.org/ns/org#purpose> <$topicURI> . }"
http://localhost:8021/sparql/ > tmp/found-people.xml
webids=$( xsltproc get-person-webid.xsl tmp/found-people.xml )

echo "<h2>Results for: $1</h2>" >> result.html
echo "<div style='padding: 20px; width: 500px'>" >> result.html
for webid in $webids
do
foaffile=$( util/getfoaflink.sh $webid )
echo "Checking <"$foaffile"> and found WebID <"$webid">"
./dg-initdata-person.sh $foaffile $webid
./dg-render-person.sh $webid $topicURI
done
echo "</div><div style='border-top: 1px solid #3e3e3e;
 padding: 5px'>Linked Data Research Centre, (c) 2010</div>" >> result.html

rm tmp/found-people.xml
util/room2roomsec.sh result.html result-final.html
rm result.html
open result-final.html

The result for the example query ./dg-find.sh "Linked Data" yields a HTML page such as this:

Lessons learned

I was amazed by the fact how easy and quick it was to use the data from different sources to build a shell-based app. Most of the time I spent writing the scripts (hey, I’m not a shell guru and reading the sed manual is not exactly fun) and tuning the XSLT to output some nice HTML. The actual data integration part, that is, loading the data it into the store and querying it, was straight-forward (beside overcoming some inconsistencies in the data).

From the approximately eight hours I worked on the demo, some 70% went into the former (shell scripts and XSLT), some 20% into the latter (4store handling via curl and creating the SPARQL queries) and the remaining 10% were needed to create the shiny figures and the screen-cast, above. To conclude: the only thing you really need to create a useful LOD app is a good idea which sources to use, the rest is pretty straight-forward and, in fact, fun ;)


Filed under: Experiment, Linked Data

Syndicated 2010-09-27 10:32:51 from Web of Data

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!