25 May 2011 mhausenblas   » (Journeyer)

Why we link …

The incentives to put structured data on the Web seem to slowly seep in, but why does it make sense to link your data to other data? Why invest time and resources to offer 5 star data? Even though interlinking itself is becoming more of a commodity these days (for example, the 24/7 platform we’re deploying in LATC is an interlinking cloud offering), the motivation for dataset publishers to set links to other datasets is, in my experience, not obvious.

I think it’s important to have a closer look at the motivation for interlinking data on the Web from a data integration perspective. Traditionally, you would download data from, say, Infochimps, or find it via CKAN or one of the many other places that either offer data directly or provide a data catalog. Then you would put it in your favorite (NoSQL) database and use it in your application. Simple, isn’t it?

Let’s say you’re using a dataset about companies such as the Central Contractor Registration (CCR). These companies typically have a physical address (or: location) attached.

Now, imagine I ask you to render the location of a selection of companies on a map. This requires you to look up the geographical coordinates of each company in a service such as Geonames.

I bet you can automate this, right? Maybe a bit of manual work involved, but not too much, I guess. So, all is fine, right?

Not really.

The next developer who comes along and wants to use the company data and nicely map it has to go through the exact same process: figure out which geo service to use, write some look-up/glue code, import the data and so on.
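To make that repeated effort concrete, here is a minimal sketch of the kind of glue code every consumer ends up writing. Everything in it is an assumption made for illustration: the `GAZETTEER` dictionary stands in for a remote geo service such as Geonames, the company records and their field names are made up, and the coordinates are only approximate.

```python
from typing import Optional

# Hypothetical stand-in for a geo service such as Geonames:
# normalised place name -> approximate (latitude, longitude).
# A real consumer would call a remote API here instead.
GAZETTEER = {
    "galway": (53.27, -9.05),
    "dublin": (53.35, -6.26),
}

# Hypothetical company records, as they might look after downloading a dataset.
COMPANIES = [
    {"name": "Acme Ltd", "city": "Galway "},
    {"name": "Example Corp", "city": "DUBLIN"},
    {"name": "Nowhere Inc", "city": "Atlantis"},
]


def geocode(city: str) -> Optional[tuple]:
    """Normalise a free-text city name and look it up; None if unknown."""
    return GAZETTEER.get(city.strip().lower())


def locate(companies):
    """Return copies of the records with a 'coords' field attached."""
    return [{**c, "coords": geocode(c["city"])} for c in companies]


for row in locate(COMPANIES):
    # Records whose city cannot be resolved are the "bit of manual work".
    print(row["name"], row["coords"])
```

The normalisation step and the handling of failed look-ups are exactly the parts that each new developer has to rediscover and redo.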

Wouldn’t it make more sense, from a re-usability point of view, if the original dataset provider (CCR in our example) had a look at its data, identified what entities (such as companies) are in there and provided the links to other datasets (such as location data) up-front? This is, in a nutshell, what Tim Berners-Lee says concerning the 5th star of Open Data deployment:

Link your data to other people’s data to provide context.
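On the publisher's side, providing such a link up-front could look roughly like the following sketch, which serialises one link per company as an N-Triples line. The company and location URIs are placeholders under example.org (a real deployment would point at a URI in a dataset such as Geonames), the record layout is made up, and choosing `foaf:based_near` as the linking property is only one possible option, not something the CCR dataset prescribes.

```python
# FOAF property used here to relate a company to a nearby place.
# Picking this property is an assumption made for illustration.
FOAF_BASED_NEAR = "http://xmlns.com/foaf/0.1/based_near"

# Hypothetical published records: each company already carries a link
# to a location resource that lives in another dataset.
PUBLISHED = [
    {
        "id": "http://example.org/company/acme",
        "name": "Acme Ltd",
        "location": "http://example.org/geo/galway",
    },
]


def to_ntriples(records):
    """Serialise the company-to-location links as N-Triples lines."""
    return [
        f'<{r["id"]}> <{FOAF_BASED_NEAR}> <{r["location"]}> .'
        for r in records
    ]


for line in to_ntriples(PUBLISHED):
    print(line)
```

With the link published once, a consumer can follow it to the location data instead of writing a look-up of their own.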

To sum up: if you have data, think about providing this context. Link it to other data on the Web and you make your data more useful, more usable and, in the long run, more used.

PS: the working title of this blog post was ‘As we may link’, to render homage to Vannevar Bush, but then I thought that might be a bit too cheesy ;)


Filed under: FYI, Linked Data

Syndicated 2011-05-22 20:37:30 from Web of Data
