8 Jun 2011 mhausenblas   » (Journeyer)

Towards Networked Data

This is the second post in the solving-tomorrow’s-problems-with-yesterday’s-tools series.

In his seminal article If You Have Too Much Data, then “Good Enough” Is Good Enough, Pat calls for a ‘new theory for data’ – I’d like to call this networked data (meaning: consuming and manipulating distributed data at Web scale).

In this post, I’m going to elaborate on the first of his points in the context of Linked Data:

We need a new theory and taxonomy of data that must include:

  • Identity and versions. Unlocked data comes with identity and optional versions.

If you take a 10,000-foot view of the Linked Data principles, they read essentially as follows (the label after each dash is what I added here):

  1. Use URIs as names for things – entity identity
  2. Use HTTP URIs so that people can look up those names – entity access
  3. When someone looks up a URI, provide useful information, using the standards – entity structure
  4. Include links to other URIs, so that they can discover more things – entity integration
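To make the four labels a bit more tangible, here is a minimal Python sketch. It is only an illustration under stated assumptions: the `example.org` entity URIs, the `Literal` helper and the function names are hypothetical, the FOAF property URIs are the only real ones, and the request is prepared but never sent.

```python
from typing import NamedTuple
from urllib.request import Request


class Literal(NamedTuple):
    """A plain data value, as opposed to a URI naming another entity."""
    value: str


# Hypothetical entity URIs under example.org; the FOAF property URIs are real.
ALICE = "http://example.org/id/alice"
BOB = "http://example.org/id/bob"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"

# Principle 1 (entity identity): every thing is named by a URI.
TRIPLES = [
    (ALICE, FOAF_NAME, Literal("Alice")),
    (ALICE, FOAF_KNOWS, BOB),
    (BOB, FOAF_NAME, Literal("Bob")),
]


def build_lookup(uri: str) -> Request:
    """Principles 2 and 3 (entity access, entity structure): prepare an
    HTTP GET that asks for an RDF serialisation. Built here, not sent."""
    if not uri.startswith(("http://", "https://")):
        raise ValueError(f"not an HTTP URI: {uri}")
    accept = "text/turtle, application/rdf+xml;q=0.9"
    return Request(uri, headers={"Accept": accept})


def outgoing_links(triples, subject: str) -> list:
    """Principle 4 (entity integration): URIs this entity links to."""
    return [o for s, _p, o in triples if s == subject and isinstance(o, str)]
```

Passing the result of `build_lookup(ALICE)` to `urllib.request.urlopen` would perform the actual lookup against a real Linked Data server; `outgoing_links(TRIPLES, ALICE)` yields the URIs a client could follow next.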

One word of caution before we dive into it: Linked Data, as we speak, is pretty well defined for the read-only case (the write-enabled case is still subject to research and standardisation).

If you compare the Linked Data principles from above with what Pat demands from the ‘new theory for data’, I think it is fair to state that the entity identity part as well as the entity access part are well covered. The versioning part might be a bit tricky, but doable – for example with Named Graphs, quads, etc.
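As a rough sketch of how versioning with quads could look, the Python snippet below stores each statement as (subject, predicate, object, graph) and treats every named graph as one version of the data. The one-graph-per-version convention, the `example.org` URIs and the function names are assumptions made for illustration; this is not an established versioning scheme.

```python
# Each quad is (subject, predicate, object, graph); the fourth element names
# the graph the statement belongs to. Assumption: one named graph per version.
ALICE = "http://example.org/id/alice"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
V1 = "http://example.org/graph/v1"
V2 = "http://example.org/graph/v2"

QUADS = [
    (ALICE, FOAF_NAME, "Alice", V1),
    (ALICE, FOAF_NAME, "Alice B.", V2),
]


def versions_of(quads, subject):
    """Return the sorted names of all graphs that say something about subject."""
    return sorted({g for s, _p, _o, g in quads if s == subject})


def snapshot(quads, subject, graph):
    """Return the (predicate, object) pairs for subject within one graph."""
    return {(p, o) for s, p, o, g in quads if s == subject and g == graph}
```

With this layout the entity keeps a single identity (its URI) across versions, while `snapshot(QUADS, ALICE, V2)` retrieves what one particular version says about it.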

Concerning the entity structure, it occurs to me that there are two schools of thought: on the one hand, ‘purists’ who demand that only RDF serialisations be allowed for representing an entity’s structure; on the other hand, a more liberal interpretation that includes technologies such as OData and, only recently (triggered by the introduction of Schema.org), Microdata. Time will tell which of these technologies sees uptake and success, but when in doubt I prefer to be inclusive rather than exclusive on this question.

The entity integration part is not explicitly mentioned by Pat – I wonder why? ;)

Filed under: FYI, Linked Data, NoSQL

Syndicated 2011-06-08 08:03:36 from Web of Data
