Older blog entries for mhausenblas (starting at number 20)

Announcing Application Metadata on the Web of Data


I’m just about to release a new version of the voiD editor, called ve2. It’s currently located at http://ld2sd.deri.org/ve2/ (note that this is a temporary location; I gotta find some time to set up our new LiDRC lab environment).

Anyway, the point is really: every now and then one deploys a Web application (such as ve2; see, that’s why I needed the pitch) and likely also wants to tell the world out there a bit about the application. Some things you might want to share with the Web at large that come immediately to mind are:

  • who created the app and who maintains it (creator, legal entity, etc.)
  • which software it has been created with (Java, PHP, jQuery, etc.)
  • where the source code of the app is
  • which other services it depends on (such as Google calendar, flickr API, DBpedia lookup, etc.)
  • acknowledgments
  • usage conditions

Now, for most of the stuff one can of course use DOAP, the Description of a Project vocabulary, as we did (using RDFa) in the riese project, but some of the metadata goes beyond this, in my experience.

To save myself some time (and hopefully you some as well) I thought it might not hurt to put together an RDFa template for precisely this job: Announcing Application Metadata on the Web of Data. So, I put my initial proposal, based on Dublin Core and DOAP, at:

http://lab.linkeddata.deri.ie/2010/res/web-app-metadata-template.html
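
To give you a rough idea, here is a trimmed-down sketch of what such a template might look like (the application name, URIs and property choices below are made up for illustration; the template at the URL above is the authoritative version):

    <!-- hypothetical app description in RDFa, using Dublin Core and DOAP -->
    <div xmlns:dct="http://purl.org/dc/terms/"
         xmlns:doap="http://usefulinc.com/ns/doap#"
         about="http://example.org/myapp#app" typeof="doap:Project">
      <h1 property="doap:name">My Web App</h1>
      <p>Created and maintained by <span property="dct:creator">Jane Doe</span>.</p>
      <p>Built with <span property="doap:programming-language">PHP</span>;
         the source code lives in the
         <a rel="doap:repository" href="http://example.org/myapp/src/">repository</a>.</p>
      <p>Depends on the
         <a rel="dct:requires" href="http://www.flickr.com/services/api/">flickr API</a>.</p>
      <p>Usage conditions:
         <a rel="dct:license"
            href="http://creativecommons.org/licenses/by-sa/3.0/">CC BY-SA 3.0</a>.</p>
    </div>

The nice thing about doing it in RDFa is that the very same page serves your human visitors and any agent harvesting the embedded RDF.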


Note: The WebApp metadata template is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License. You may want to include a reference to this blog post.

Posted in Idea, Linked Data, Proposal

Syndicated 2010-01-06 08:28:47 from Web of Data

Linked Data – the past 10y and the next 10y


Though Linked Data (the set of principles) has only been around for roughly three years, the technologies it builds upon have been around considerably longer: two of the three core Linked Data technologies (URIs and HTTP) are some 20 years old. And because I know that you’re at least as curious as I am ;) I thought it might be nice to sit down and capture a more complete picture:
(Figure: Thermo-view on Linked Data technologies, end of 2009)
So, why a thermo-view? Well, actually using technologies is a bit like ice-skating, isn’t it? As long as a technology is still evolving, it is sort of fluid (like water). Then there are crystallisation point(s), the technology matures and can be used (a thin layer of ice). After a while, the technology is established and robust – able to carry heavy load (a thick layer of ice).

Lesson learned: it takes time and the right environmental conditions for a technology to mature. Can you take this into account, please, the next time you’re tempted to ask: “when will the Semantic Web arrive?” :D

So much for the past 10 years.

What’s upcoming, you might wonder? Well, we hear what the “Web 3.0 leaders” say, and here is what I think will happen:

  • In 2010 we will continue to witness how Linked Data is successfully applied in the Governmental domain (in the UK, in the US, for transparency etc.) and in the Enterprise area (eCommerce: GoodRelations, IBM, etc.).
  • In 2011, Linked Data tools and libraries will be ubiquitous. A developer will use Linked Open Data (LOD) in her application just as she would do with her local RDBMS (actually, there are libraries already emerging that allow you to do this).
  • In 2012 there will be thousands of LOD datasets available. Issues around provenance and dataset dynamics have been resolved.
  • In 2013, Linked Data-based solutions have displaced heavy-weight and costly SOA solutions in the Enterprises.
  • From 2014 on, Linked Data is taught in elementary schools. Game Over.

Ok, admittedly, the last bullet point should probably be taken with a grain of salt ;)

However, I’d love to hear what you think. What are your predictions – factual or fiction, both welcome – for Linked Data? Where do you see the biggest potential for Linked Data and its applications in the near and not-so-near future?

Syndicated 2009-12-29 10:31:27 from Web of Data

HATEOAS revisited – RDFa to the rescue?


One of the often overlooked yet, IMO, important features of RESTful applications is “hypermedia as the engine of application state” (or HATEOAS, as RESTafarians prefer it ;) – Roy commented on this issue a while ago:

When representations are provided in hypertext form with typed relations (using microformats of HTML, RDF in N3 or XML, or even SVG), then automated agents can traverse these applications almost as well as any human. There are plenty of examples in the linked data communities. More important to me is that the same design reflects good human-Web design, and thus we can design the protocols to support both machine and human-driven applications by following the same architectural style.

As far as I can tell, most people get the stuff (more or less) right concerning nouns (resources, URIs) and verbs (HTTP methods such as GET, POST, etc.) but neglect the HATEOAS part. I’m not sure why this is so, but for a start let’s have a look at the available formats:

  • Most obviously, one can use HTML with its official link types or with microformats (for historical background see also a proposal for a wider spectrum of link types; for ongoing discussions you might want to keep an eye on the @rel attribute discussion).
  • Many people use Atom (concerning RDF, see also the interesting discussion via Ed Summers’ blog).
  • There are a few non-standard, in-house solutions (for example the one discussed in an InfoQ article).

Summing up, one can see the need for a standard format that allows typed links to be represented in an extensible way and is able to serve humans and machines alike. In 2008 I argued that RDFa is very well suited for Linked Data, and now I’m about to extend this slightly: one very good way to realise HATEOAS is indeed RDFa.
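
To make this a bit more tangible, here is a minimal sketch of what that could look like – the order vocabulary and URIs are invented purely for illustration:

    <!-- a hypothetical order resource whose representation advertises,
         via typed links in RDFa, the state transitions currently allowed -->
    <div xmlns:ex="http://example.org/order-vocab#"
         about="http://example.org/orders/42" typeof="ex:Order">
      <p>Order status: <span property="ex:status">pending</span></p>
      <ul>
        <li><a rel="ex:payment"
               href="http://example.org/orders/42/payment">pay for this order</a></li>
        <li><a rel="ex:cancellation"
               href="http://example.org/orders/42/cancel">cancel this order</a></li>
      </ul>
    </div>

A human simply sees two links; an automated agent extracts the typed relations (ex:payment, ex:cancellation) and can decide which transition to follow next – which is exactly the hypermedia-driven behaviour Roy describes above.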

Happy to hear your thoughts about this (admittedly bold) statement!

Syndicated 2009-12-15 10:53:43 from Web of Data

LDC09 dataset dynamics demo – screencast


Update: for the dataset dynamics demo developed during the Linked Data Camp Vienna there is now also a screencast available (video, slides in PDF).

Syndicated 2009-12-04 11:24:43 from Web of Data

Linked Data Camp Vienna hacking wrap-up


Jürgen Umbrich and I virtually participated in the LDC09 session regarding dataset dynamics.

Over the past couple of days, we hacked together a little demo of a distributed change notification system for Linked Open Data, based on voiD+dady and a (slightly modified) Atom feed. Here is the overall setup:

In case you want to play around with it yourself, you can check out the source code as well. Feedback and feature requests welcome ;)

Syndicated 2009-12-02 14:18:10 from Web of Data

Linked Open Data Caching – Update


I recently wrote about caching support in the Linked Open Data cloud here and got nice feedback (and questions ;) from dataset publishers. In a follow-up mail exchange, Mark Nottingham was so kind as to provide me with two very valuable resources I’d like to share with you:

  • The Resource Expert Droid (redbot), http://redbot.org/, a tool that ‘checks HTTP resources to see how they’ll behave, pointing out common problems and suggesting improvements’ – especially useful if you want to debug Linked Data sets at the HTTP level.
  • Concerning the question of how my findings relate to the Web at large, Mark pointed out a community project, called Web Authoring Statistics, which analysed quite a few Web documents, yielding results about various document-related as well as HTTP-related aspects.

Please let me know if you are aware of more resources in this area (studies, etc.) and I’ll post them here!

Syndicated 2009-12-02 14:00:45 from Web of Data

Keepin’ Up With A LOD Of Changes


So, the other day I had a look at caching support in the Linked Open Data cloud and it turns out that there is a related discussion regarding caching on the ietf-http-wg@w3.org mailing list.

Then, there is another related update from Bill Roberts: Delivering Linked Data quickly, with which I wholeheartedly agree.

To take all of this a step further I tried to outline the overall problem in a short slide deck (best viewed full-screen ;)

My hunch is that 80% of the necessary pieces are already out there (such as Atom, the Changeset vocabulary, voiD, etc.) and only minor bits are missing. The next step would be to hammer out a simple demo and gather some more experience with it. In case you are interested in chiming in, let me know :)

Syndicated 2009-11-26 14:02:27 from Web of Data

Linked Open Data Caching – Establishing a Baseline with HTTP


The other day I was pondering Linked Open Data Source Dynamics, and as a starting point I wanted to learn more about the caching characteristics of LOD data sources. Now, in order to establish a baseline, one should have a look at what HTTP, one of the pillars of Linked Data, offers (see also RFC 2616, Caching in HTTP).

So, I hacked a little PHP script that takes 17 sample resources from the LOD cloud (from representative datasets ranging from DBpedia and GeoSpecies to W3C WordNet). The results of the LOD caching evaluation are somewhat deflating: more than half of the samples do not support cache control at all, and less than 20% send Last-Modified or ETag headers.

(Figures: share of LOD datasets sending Last-Modified, ETag, and Cache-Control headers.)

I know, I know, this is just a very limited experiment. And yes, very likely there are not yet that many applications out there consuming Linked Data and hence eating up bandwidth. However, given that one of the arguments for the scalability of the Web is the built-in HTTP caching mechanism, LOD dataset publishers might want to consider taking a closer look at what the server or platform at hand is able to offer concerning caching support.
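
Just to illustrate which headers we are talking about, a cache-friendly response from a LOD server could look roughly like this (a made-up response; values are purely illustrative):

    HTTP/1.1 200 OK
    Content-Type: application/rdf+xml
    Last-Modified: Mon, 16 Nov 2009 09:30:00 GMT
    ETag: "a3f27-9b2"
    Cache-Control: public, max-age=3600

With Last-Modified and/or ETag in place, clients can revalidate cheaply via conditional GETs (If-Modified-Since, If-None-Match) and receive a 304 Not Modified rather than re-fetching the entire representation.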

Syndicated 2009-11-23 16:28:43 from Web of Data

Incentives for publishing and consuming linked data


Having read Adam Jacobs’ The Pathologies of Big Data and Stefano Mazzocchi’s Data Smoke and Mirrors I found myself asking: what is the motivation for people to publish linked data, and in turn to consume it (sounds funny you think? well, just because the data is available doesn’t necessarily mean it is useful or actually used ;)

Ok, so let’s start with a nice statement from Adam’s ACM article:

Here’s the big truth about big data in traditional databases: it’s easier to get the data in than out.

Yup, I think I agree, and I guess the same is true for Linked Data. There are tons of ‘cheap’ ways to publish in RDF (for example, regarding relational databases, we’re currently trying to define a standard). However, there is still a need for high quality data and high quality links between the data items in order to allow the data to be used sensibly in applications!

Right, so my hunch is that for data providers there are a couple of reasons to publish their data in an open and easily accessible way, but I guess one main reason may be that by providing the raw data one can simply cut costs. Rather than writing a Web application that serves humans and offering an additional Web service/API (as flickr or delicious did), one can expose the original data directly via Linked Data and open up the possibility for others to develop cool applications on top of it (see also our recent work in this direction).

On the other hand, data consumers benefit from a single (RESTful) API with a uniform data model (RDF, in case it isn’t that obvious ;), which in turn simplifies application development and allows data to be reused (just like the BBC doesn’t have to maintain artist and song data themselves anymore, but reuses MusicBrainz data).

Let me know – what is your incentive to publish/consume Linked Data?

Syndicated 2009-11-10 13:58:30 from Web of Data

Linked Data for RESTafarians


So, you took the red pill? You’re a full-blown RESTafarian, brother? Good news for you, then. You’ll understand Linked Data in less than 30 seconds. Ok, step by step. REST, understood as a ‘set of constraints that inform an architecture’:

  1. Resource Identification
  2. Uniform Interface
  3. Self-Describing Messages
  4. Hypermedia Driving Application State
  5. Stateless Interactions

… and now read the linked data principles with your ‘REST goggles’ on:

  1. Use URIs as names for things
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
  4. Include links to other URIs, so that they can discover more things.

In Linked Data, we use HTTP URIs for everything: for documents, but also for concepts and real-world entities such as people. Linked Data provides a uniform (read-only) interface through HTTP GET. The messages are self-describing through RDF and RDF-based vocabularies, and through the last of the Linked Data principles what we have in the LOD cloud is a highly connected (or: interlinked) system.

As nicely described by Leonard Richardson and Sam Ruby in RESTful Web Services, you design a RESTful (ROA) system by taking the following steps:

  • Identify and name your resources (using HTTP URIs),
  • design your representations (documents & data), and
  • link the resources to each other.

You’ll typically end up in a 3D design space such as the following (kudos to Cesare Pautasso and Erik Wilde):
(Figure: REST design space)

The same actually happens when you publish Linked Data, with some simplifications: due to the read-only characteristic of Linked Data you only have to worry about one HTTP verb (GET), and with RDF as the unified data model you pick (based on your preferences and needs) one of the RDF serialisations – preferably RDFa, as it nicely integrates with HTML and hence allows you to serve humans and programs alike. Once you have your data in RDF (or so ;) you’ll mainly find yourself worrying about how to interlink it with other data on the Web. But this really is a huge benefit – finally enabling us to use the Web as one huge database.
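
As a tiny sketch of that last step (all names and URIs are made up, apart from the DBpedia one), interlinking in RDFa essentially boils down to setting links to URIs in other datasets:

    <!-- hypothetical person description, interlinked with DBpedia -->
    <div xmlns:foaf="http://xmlns.com/foaf/0.1/"
         about="http://example.org/people/alice#me" typeof="foaf:Person">
      <span property="foaf:name">Alice Example</span> is based near
      <a rel="foaf:based_near"
         href="http://dbpedia.org/resource/Galway">Galway</a> and knows
      <a rel="foaf:knows" href="http://example.org/people/bob#me">Bob</a>.
    </div>

Dereferencing the DBpedia URI (a plain HTTP GET) then gives your agent more data about Galway – the Web as one huge database, as promised.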

As an aside: I’m aware of the fact that we still need to sort out some issues along the way, both in academia and in practice. However, I encourage people in both camps (RESTful yadayada and Linked Data rogues) to look beyond their own noses and eventually understand that there is only one Web and we all ‘live’ in it ;)

Syndicated 2009-10-09 11:14:50 from Web of Data

