Older blog entries for mhausenblas (starting at number 34)

Linked Data Consumption – where are we?

These are exciting times, aren’t they? Every day new activities around Linked Data are reported.

All this happens at a rate that can be overwhelming. Hence I think one should step back from time to time and have a chilled look at where we are concerning the consumption of Linked Data. In the following I try to sum up a (rather high-level) view of the current state of the art and highlight ongoing challenges:

Task Technology Examples
Discovery Follow-Your-Nose on RDF-triple-level, Sitemaps, voiD
Access OpenLink’s ODE, SQUIN, any23
Consolidation sameas.org, Sig.ma
Nurture uberblic

As you can see, the more we get away from the data (discovery, access) and move in the direction of information, the fewer solutions are available. From the perspective of an application aiming to exploit Linked Data, it is the integrated, cleaned information that is of value, not the raw, distributed and dirty (interlinked) RDF pieces out there. In my experience, most of the consolidation and nurturing is still done at the application level in an ad-hoc manner. There is plenty of room for frameworks and infrastructure to supply this functionality.

No matter if you’re a start-up, a first-year student or a CIO in an established company – have a look at the challenges and remember: now is the right time to come up with solutions. You and the Web will benefit from it.


Filed under: FYI, Linked Data

Syndicated 2010-06-04 08:43:56 from Web of Data

Linked Data for Dummies

Every now and then I ask myself: how would I explain the Linked Data stuff I’m doing to our children or to my parents, FWIW? So, here is an attempt to explain the Linked Data Web, and I promise that I won’t use any lingo in the following:

Imagine you’re in a huge building with several storeys, each with an incredibly large number of rooms. Each room has tons of things in it. It’s utterly dark in that building; all you can do is walk down a hallway till you bang into a door or a wall. All the rooms in the building are somehow connected but you don’t know how. Now, I tell you that in some rooms there is a treasure hidden and you’ve got one hour to find it.

Here comes the good news: you’re not left to your own resources. You have a jinn, let’s call him Goog, who will help you. Goog is able to take you instantaneously to any room once you tell him a magic word. Let’s imagine the treasure you’re after is a chocolate bar, and you tell Goog: “I want Twox”. Goog now tells you that there are 3579 rooms where there is something with “Twox” in it. So you start with the first room Goog suggests to you, and as a good jinn he of course takes you there immediately; you don’t need to walk there. Now that you’re in the room, you put everything you can grab into your rucksack and get back outside (remember, you can’t see anything in there). Once you are outside the building again and can finally see what you’ve gathered, you find out that what is in your rucksack is not really what you wanted. So, you have to get back into the building and try the second room. Again and again, till you eventually find the Twox you want (and you are really hungry now, right?).

Now, imagine the same building but all the rooms and stairs are marked with fluorescent stripes in different colours; for example, a hallway that leads you to some food is marked with a green stripe. Furthermore, the things in the rooms also have fluorescent markers in different shapes. For example, Twox chocolate bars are marked with green circles. And there is another jinn now as well: say hello to LinD. You ask LinD the same thing as Goog before: “I want Twox” and LinD asks you: do you mean Twox the chocolate bar or Twox the car? Well, the chocolate bar of course, you say, and LinD tells you: I know about 23 rooms that contain Twox chocolate bars, I will get one for you in a moment.

How can LinD do this? Is LinD so much more clever than Goog?

Well, not really. LinD does not understand what a chocolate bar is, pretty much the same as Goog does not know. However, LinD knows how to use the fluorescent stripes and markers in the building, and can thus get you directly what you want.

You see, it’s the same building with the same things in it, but with a bit of help in the form of markers we can find and gather things much quicker and with fewer disappointments.

In the Linked Data Web we mark the things and hallways in the building, enabling jinns such as LinD to help you find and use your treasures as quickly and comfortably as possible, no matter where they are.


Filed under: FYI, Linked Data

Syndicated 2010-05-20 08:28:41 from Web of Data

On the usage of Linksets

Daniel Koller asked on Twitter an interesting question:

… are linksets today evaluated in an automated way?or does it depend on a person to interpret it?

I’ll try to answer this question here, but let’s step back a bit: in 2008, when I started to dive into ‘LOD metadata’, one of my main use cases was indeed how to automate the handling of LOD datasets. I wanted to have a formal description of a dataset’s characteristics in order to write a sort of middleware (there it is again, this bad word) that could use the dataset metadata and take the burden away from a human of sifting through the ‘natural language’ descriptions found in the Wiki pages, such as the Dataset page.

Where are we today?

Looking at the deployment of voiD, I guess we can say that there is a certain uptake; several publishers and systems support voiD and there are dedicated voiD stores available out there, such as the Talis voiD store and the RKB voiD store.

In our LDOW2009 paper Describing Linked Datasets we outlined a couple of potential use cases for voiD and gave some examples of actual usage already. Most notably, Linksets are used for ranking of datasets (see the DING! paper) and distributed query processing.

However, to date I’m not aware of any implementation of the idea outlined above, a middleware that exploits Linksets. So, I guess one answer to Daniel’s question is: at the moment, it is mainly humans who look at Linksets and use them.

What can be done?

The key to voiD really is its abstraction level. We describe entire Datasets and their characteristics, not single resources such as a certain place, a book or a gene. Once you understand that the links are the essence of a truly global, distributed information space, you can see that the Linksets are the key to automatically processing the LOD datasets, as they carry the high-level metadata about the interlinking.

When you write an application today that consumes data from the LOD cloud, you need to manually code which datasets you are going to use. Now, imagine a piece of software that really operates on Linksets: suddenly, it would be possible to specify certain requirements and capabilities (such as: ‘needs to be linked with some geo data and with statistical data’) and dynamically plug in matching datasets. Of course, there are other problems to overcome on the way to realising this vision (for example concerning the supported vocabularies vs. the SPARQL queries used in the application). However, at least to me, this is a very appealing area, worth investing more resources in.
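To make the ‘plug in matching datasets’ idea a bit more tangible, here is a minimal sketch in Python. It is not an existing voiD library: the `Dataset` and `Linkset` classes, the topic labels and the dataset names are all made up for illustration, standing in for what a real application would read from voiD descriptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    topics: frozenset  # hypothetical stand-in for a voiD subject description


@dataclass(frozen=True)
class Linkset:
    source: str  # name of the dataset the links start from
    target: str  # name of the dataset the links point to


def matching_datasets(datasets, linksets, required_topics):
    """Return names of datasets whose Linksets reach every required topic."""
    by_name = {d.name: d for d in datasets}
    result = []
    for d in datasets:
        linked_topics = set()
        for ls in linksets:
            if ls.source == d.name and ls.target in by_name:
                linked_topics |= by_name[ls.target].topics
        if set(required_topics) <= linked_topics:
            result.append(d.name)
    return result


# Illustrative data only; none of these datasets exist under these names.
DATASETS = [
    Dataset("books", frozenset({"publications"})),
    Dataset("places", frozenset({"geo"})),
    Dataset("stats", frozenset({"statistics"})),
    Dataset("music", frozenset({"media"})),
]
LINKSETS = [
    Linkset("books", "places"),
    Linkset("books", "stats"),
    Linkset("music", "places"),
]

if __name__ == "__main__":
    print(matching_datasets(DATASETS, LINKSETS, {"geo", "statistics"}))
```

The application states what it needs (‘geo’ and ‘statistics’) and the selection happens on the Linkset level only; no single resource is looked at.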

I hope this answers your question, Daniel, and I’m happy to keep you posted concerning the progress in this area.


Filed under: Linked Data, voiD

Syndicated 2010-05-19 08:20:09 from Web of Data

Oh – it is data on the Web

A little story about OData and Linked Data …

Others have already given high-level overviews of OData and Linked Data, but I was interested in two concrete questions: how to utilise OData in the Linked Data Web and how to turn Linked Data into OData.

As already mentioned, I consider Atom, which forms one core bit of OData, as hyperdata that allows publishing data in the Web, not only on the Web. And indeed, the first OData example I examined (http://odata.netflix.com/Catalog/People) looked quite promising:

<entry>
<id>http://odata.netflix.com/Catalog/People(196)</id>
<title type="text">George Abbott</title>
<updated>2010-04-13T12:02:01Z</updated>
<author>
<name />
</author>
<link rel="edit" title="Person" href="People(196)" />
<link rel="http://schemas.microsoft.com/ado/2007/08/dataservices/related/Awards" type="application/atom+xml;type=feed" title="Awards" href="People(196)/Awards" />
<link rel="http://schemas.microsoft.com/ado/2007/08/dataservices/related/TitlesActedIn" type="application/atom+xml;type=feed" title="TitlesActedIn" href="People(196)/TitlesActedIn" />
<link rel="http://schemas.microsoft.com/ado/2007/08/dataservices/related/TitlesDirected" type="application/atom+xml;type=feed" title="TitlesDirected" href="People(196)/TitlesDirected" />
<category term="NetflixModel.Person" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" />
<content type="application/xml">
<m:properties>
<d:Id m:type="Edm.Int32">196</d:Id>
<d:Name>George Abbott</d:Name>
</m:properties>
</content>
</entry>

Note that there is a URI in the id element that can be used as the entity URI, and also link/@rel values that can be exploited as typed links. I ran it through OpenLink’s URI Burner (result) and hacked a little XSLT that picks out the relevant bits, just to see what an RDF version might look like. Though the @rel values do not dereference (try it out yourself: http://schemas.microsoft.com/ado/2007/08/dataservices/related/Awards) I thought, well, we can still handle it somehow as Linked Data.
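The XSLT itself is not reproduced here. As a rough equivalent, the following Python sketch picks the entity URI and the typed links out of a trimmed copy of the entry above. Two things are assumptions on my side: the Atom namespace declaration (the excerpt omits it) and the base URI used to resolve the relative href values.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

ATOM = "http://www.w3.org/2005/Atom"

# Trimmed copy of the Netflix entry; the xmlns declaration is added here.
ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <id>http://odata.netflix.com/Catalog/People(196)</id>
  <title type="text">George Abbott</title>
  <link rel="edit" title="Person" href="People(196)" />
  <link rel="http://schemas.microsoft.com/ado/2007/08/dataservices/related/Awards"
        title="Awards" href="People(196)/Awards" />
  <link rel="http://schemas.microsoft.com/ado/2007/08/dataservices/related/TitlesActedIn"
        title="TitlesActedIn" href="People(196)/TitlesActedIn" />
</entry>"""

# Assumed base URI for resolving relative hrefs (normally given by the feed).
BASE = "http://odata.netflix.com/Catalog/"


def entry_to_triples(entry_xml, base):
    """Return (subject, predicate, object) tuples for the typed links of an entry."""
    entry = ET.fromstring(entry_xml)
    subject = entry.findtext(f"{{{ATOM}}}id")
    triples = []
    for link in entry.findall(f"{{{ATOM}}}link"):
        rel = link.get("rel", "")
        # Only absolute @rel values can serve as predicate URIs; skip e.g. "edit".
        if rel.startswith("http://"):
            triples.append((subject, rel, urljoin(base, link.get("href"))))
    return triples


if __name__ == "__main__":
    for triple in entry_to_triples(ENTRY, BASE):
        print(triple)
```

The id element becomes the subject, each absolute @rel value a predicate and each resolved href an object, which is all that is needed for a first RDF rendering.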

Then I looked at some more OData examples, only to find out that almost all of the other examples from the OData sources look more or less like the following (from http://datafeed.edmonton.ca/v1/coe/BusStops):

<entry m:etag="W/&quot;datetime'2010-01-14T22%3A43%3A35.7527659Z'&quot;">
<id>http://datafeed.edmonton.ca/v1/coe/BusStops(PartitionKey='1000',RowKey='3b57b81c-8a36-4eb7-ac7f-31163abf1737')</id>
<title type="text"></title>
<updated>2010-04-13T15:42:53Z</updated>
<author>
<name />
</author>
<link rel="edit" title="BusStops" href="BusStops(PartitionKey='1000',RowKey='3b57b81c-8a36-4eb7-ac7f-31163abf1737')" />
<category term="OGDI.coe.BusStopsItem" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" />
<content type="application/xml">
<m:properties>
<d:PartitionKey>1000</d:PartitionKey>
<d:RowKey>3b57b81c-8a36-4eb7-ac7f-31163abf1737</d:RowKey>
<d:Timestamp m:type="Edm.DateTime">2010-01-14T22:43:35.7527659Z</d:Timestamp>
<d:entityid m:type="Edm.Guid">b0d9924a-8875-42c4-9b1c-246e9f5c8e49</d:entityid>
<d:stop_number>1000</d:stop_number>
<d:street>Abbottsfield</d:street>
<d:avenue>Transit Centre</d:avenue>
<d:region>Edmonton</d:region>
<d:latitude m:type="Edm.Double">53.57196999</d:latitude>
<d:longitude m:type="Edm.Double">-113.3901687</d:longitude>
<d:elevation m:type="Edm.Double">0</d:elevation>
</m:properties>
</content>
</entry>

What you immediately see is the XML payload in the content element, making heavy use of elements in the d: and m: namespaces. The two namespace URIs return a 404 and hence do not allow me to learn more about the schema (beside the fact that they are centrally maintained by Microsoft).
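To illustrate what a consumer gets out of such a payload, here is a small Python sketch that reads a shortened version of the properties above into a dictionary. The namespace URIs are not shown in the excerpt; the ones below are those commonly bound to d: and m: in OData Atom feeds of that time, so treat them as an assumption. The `CASTS` table is likewise just an illustrative subset of the Edm types.

```python
import xml.etree.ElementTree as ET

# Assumed bindings for the d: and m: prefixes (not shown in the excerpt).
D = "http://schemas.microsoft.com/ado/2007/08/dataservices"
M = "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"

# Shortened copy of the bus stop payload, with namespace declarations added.
CONTENT = f"""<m:properties xmlns:d="{D}" xmlns:m="{M}">
  <d:stop_number>1000</d:stop_number>
  <d:street>Abbottsfield</d:street>
  <d:latitude m:type="Edm.Double">53.57196999</d:latitude>
  <d:longitude m:type="Edm.Double">-113.3901687</d:longitude>
</m:properties>"""

# Illustrative subset of Edm types; untyped properties stay strings.
CASTS = {"Edm.Double": float, "Edm.Int32": int}


def read_properties(xml_text):
    """Turn the children of m:properties into a plain name -> value dict."""
    props = {}
    for el in ET.fromstring(xml_text):
        name = el.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
        cast = CASTS.get(el.get(f"{{{M}}}type"), str)
        props[name] = cast(el.text or "")
    return props


if __name__ == "__main__":
    print(read_properties(CONTENT))
```

What comes out is a flat record of keys and values. There is no URI in it that points anywhere else, which is exactly the difference to the Netflix example.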

So, what does this all mean?

Imagine a Web (a Web of Documents, if you wish), which is not based on HTML and hyperlinks, but on MS Word documents. The documents are all available on the Internet, so you can download them and consume the content. But after you’re done with a certain document that talks about a book, how do you learn more about it? For example, reviews about the book or where you can purchase it? Maybe the original document mentions that there is some more related information on another server. So you’d need to go there and look for the related bit of information yourself. You see? That’s what the Web is great at – you just click on a hyperlink and it takes you to the document (or section) you’re interested in. All the legwork is taken care of for you through HTML, URIs and HTTP.

Hm, right, but how is this related to OData?

Well, OData feels a bit like the above-mentioned scenario, just concerning data. Of course you – well, actually rather a software program, I guess – can consume it (a single source), but that’s it. To sum up my impression so far:

  • OData enables publishing structured data on the Web and, theoretically, in the Web (what’s the difference?);
  • OData uses Atom (and APP) as a framework with the actual data as (proprietary) XML payload;
  • OData typically creates data silos; discovering data beyond a single source is, nicely put, not easy;
  • Creating Linked Data from OData does not seem a promising route;
  • Creating OData from Linked Data seems feasible and is desirable, in order to leverage tools such as Pivot.

Regarding the last bullet point, the ‘how to turn Linked Data into OData’, I will do some further research and keep you posted here.


Filed under: FYI, Linked Data

Syndicated 2010-04-14 08:48:50 from Web of Data

Towards Web-based SPARQL query management and execution

Every now and then I use SPARQL queries to learn about a Linked Data source, to debug an RDF document such as a FOAF file or to demonstrate the usage of Web Data.

Quite often I write the SPARQL queries from scratch, I have some examples stored in some project folders along with the source code, or I might look up stuff in Lee’s wonderful SPARQL By Example slide set.

Another issue I have is that though there are a few public, generic SPARQL endpoints and a Wiki page with a list of SPARQL endpoints, I need to consult these manually in order to tell where (and how) I can execute a query.

With all the know-how we have in the Web of Data, with vocabularies and more, shouldn’t it be possible to offer a more developer-friendly environment to manage and execute SPARQL queries?

I guess so.

Before Easter, Richard and I discussed these issues, especially the problem that the endpoints vary slightly in terms of interfaces and execution. I’d like to share my findings with you, my dear reader: there are not that many solutions out there yet. Leigh has worked on Twinkle, a Java-based desktop client wrapping ARQ that provides much of the functionality I’d need. Then, I gathered that Melvin has started to work on SPARQL.me, a Web-based solution that allows storing and executing SPARQL queries, supporting FOAF+SSL for log-in, etc. – very close to what I was looking for already, though I’m still missing certain features (esp. re the description of the SPARQL queries themselves, handling of the execution, etc.).

As I was not aware of Melvin’s work upfront (my bad, he did tell us about it earlier this year) I thought I’d give it a try myself. The result is called omniQ; it’s an experimental service that allows you to store and execute SPARQL queries in a collaborative fashion. The goal would be to compile a library of queries that people can utilise for different purposes (as described above for my cases; I bet there are more out there). Further, omniQ exposes the SPARQL queries in RDFa (example), allowing for syndication and structured queries over queries. Fun, isn’t it? ;)
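Just to illustrate the ‘store and execute’ part: a stored query needs little more than a title, the endpoint it is meant for and the query text. The Python sketch below is not how omniQ is implemented; the `StoredQuery` class and the example.org endpoint are hypothetical. It only relies on the SPARQL Protocol convention of passing the query in a `query` parameter of an HTTP GET request.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass
class StoredQuery:
    title: str      # human-readable description of what the query is for
    endpoint: str   # SPARQL endpoint the query is meant to run against
    text: str       # the SPARQL query itself

    def request_url(self):
        """Build the GET URL for this query (SPARQL Protocol 'query' parameter)."""
        return self.endpoint + "?" + urlencode({"query": self.text})


# Hypothetical library entry; the endpoint does not exist.
FOAF_NAMES = StoredQuery(
    title="List the names in a FOAF file",
    endpoint="http://example.org/sparql",
    text="PREFIX foaf: <http://xmlns.com/foaf/0.1/> "
         "SELECT ?name WHERE { ?person foaf:name ?name }",
)

if __name__ == "__main__":
    print(FOAF_NAMES.request_url())
```

Executing the query is then a plain HTTP GET on that URL; where endpoints deviate from this convention, the record would need additional fields.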

I’d like to hear your thoughts. Are you aware of any other (Web-based) SPARQL query management and execution environments? What other features would you expect? What more could we describe concerning the queries themselves?


Filed under: Idea, Linked Data

Syndicated 2010-04-09 15:56:48 from Web of Data

Web of Data Access Control Discovery via HTTP Link: header

Yesterday, TimBL talked about Distributed Social Networking Through Socially Aware Cloud Storage during the W3C Social Web XG meeting. I’m not going to discuss the huge potential strategic impact of this, but rather focus on a certain ‘technical’ detail that caught my attention. In his (related) design note Socially Aware Cloud Storage, he writes:

Access control files for a resource are discovered by a client using the HTTP link header.

Fair enough. So let’s assume we use the WebAccessControl vocabulary in an access control file (ACF) to restrict access to a resource on the Web. How exactly should the interaction take place? What should we use as a @rel-value for the HTTP Link: header? Does it make sense for the user agent (UA) to evaluate the ACF? Is the ACF discovery necessary at all?
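Whatever @rel-value is chosen in the end, the client side of the discovery could look roughly like the Python sketch below. Please note that the value `acl` is purely a placeholder I picked for the example, as is the example.org header; the parser is also deliberately simple and does not cover every corner of the Link header syntax.

```python
import re

# One link-value: <target> followed by zero or more ;name=value parameters.
LINK_RE = re.compile(
    r'<([^>]*)>\s*((?:;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^\s;,]+)\s*)*)'
)
PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^\s;,]+))')


def parse_link_header(value):
    """Split a Link header value into (target, params) pairs."""
    links = []
    for target, params in LINK_RE.findall(value):
        parsed = {}
        for m in PARAM_RE.finditer(params):
            quoted, bare = m.group(2), m.group(3)
            parsed[m.group(1).lower()] = quoted if quoted is not None else bare
        links.append((target, parsed))
    return links


def find_acf(value, rel="acl"):
    """Return the first link target carrying the given rel, or None.

    The default rel "acl" is a placeholder, not an agreed-upon value.
    """
    for target, params in parse_link_header(value):
        if rel in params.get("rel", "").split():
            return target
    return None


# Hypothetical header value as a server might send it.
HEADER = '<http://example.org/doc.acl>; rel="acl", <http://example.org/style.css>; rel=stylesheet'

if __name__ == "__main__":
    print(find_acf(HEADER))
```

A UA would issue a HEAD or GET on the resource, hand the Link header to `find_acf` and then decide whether it fetches and evaluates the ACF at all.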

Here is what I came up with so far:

Thoughts?

Ah, btw, once this is sorted I’ll update the WACup demo with it …


Filed under: Linked Data

Syndicated 2010-03-04 10:23:35 from Web of Data

Data and the Web – a great many choices

Jan Algermissen recently compiled a very useful Classification of HTTP-based APIs. This, together with Mike Amundsen’s interesting review of Hypermedia Types made me think about data and the Web.

One important aspect of data is “on the Web vs. in the Web” as Rick Jelliffe already highlighted in 2002:

To explain my POV, let me make a distinction between a resource being “on” the Web or “in” the Web. If it is merely “on” the Web, it does not have any links pointing to it. If a resource is “in” the Web, it has links from other resources to it. [...] A service that has no means of discovery (i.e. a link) or advertising is “on” the Web but not “in” the Web, under those terms. It just happens to use a set of protocols but it is not part of a web. So it should not be called a web service, just an unlinked-to resource.

In 2007 Tom Heath repeated this essential statement in the context of Linked Data.

So, I thought it makes sense to revisit some (more or less) well-known data formats and services and try to pin down what “in the Web” means – a first step towards measuring how well integrated they are with the Web. In the following I’ll call the degree to which they are in the Web the Link factor. I suggest that the Link factor ranges from -2 (totally “on the Web”) to +2 (totally “in the Web”), with the following attempt at a definition of the scale:

-2 … proprietary, desktop-centric document formats
-1 … structured data that can be exposed and accessed via Web
 0 … standardised, Web-aligned (XML-based) formats or Web services
 1 … open, standardised (document) formats
 2 … full REST-compliant, open (data) standards natively supporting links

Here is what I have so far – feel free to ping me if you disagree or have some other suggestions:

Technology                Examples                             Link factor
Documents                 MS Word, PDF                         -2
Spreadsheets              MS Excel                             -1
RDBMS                     Oracle DB, MySQL                     -1
NoSQL                     BigTable, HBase, Amazon S3, etc.      0
Hypertext and Hypermedia  HTML, VoiceML, SVG, Google Docs       1
Hyperdata                 Atom, OData, Linked Data              2

Filed under: FYI, Linked Data, Proposal

Syndicated 2010-03-01 13:05:12 from Web of Data

A case for Central Points of Access (CPoA) in decentralised systems


This post has been triggered by a Twitter thread, where I replied to @olyerickson that I think https://subj3ct.com is a good thing to have. Then, @hvdsomp noted (rightly!) that registries don’t scale (in reference to a conversation we had earlier on).

Big confusion, right? Michael says one thing and then the opposite on the very next day. Nah, not really ;)

Actually, turns out I’ve been quite consistent over time. In late 2008 I wrote in Talis’ NodMag #4 (on page 16):

Could you imagine reporting your new blog post, Wiki page or whatever you have to hand to an authority that takes care of adding it to a ‘central look-up repository’? I can’t, and there is at least one good reason for it: such things don’t scale. However, there are ways to announce and promote the content.

So, what is the difference between a UDDI-style registry (which, btw, did not exactly turn out to be a success) and what I’ll call a central point of access (CPoA) in the following?

Before I try to answer the question, let me first give you some examples of CPoAs in the Web of Data context:

Some of these CPoAs employ automated techniques to fill their internal databank (such as Sindice or sameas.org), some of them depend on human input (for example prefix.cc). Some of them focus on a special kind of use case or domain (Cupboard or voiD stores), some try to be as generic as possible (Falcons, Sindice).

All of them, though, do share one principle: it’s up to you if you’re listed there or not (ok, technically, some might discover your data and index it, but that’s another story). The (subtle) difference is a priori vs. a posteriori: no one forces you to submit, say, your voiD file to a voiD store or to Sindice. However, if you want to increase your visibility, if you want people to find your valuable data and use it, you’ll need to promote it. So, I conclude: one effective way to promote your data (and schema, FWIW) is to ‘feed’ CPoAs. Contrast this with a centralised registry where you need to submit your stuff first, otherwise no one is able to find it (or, put in other words: if you don’t register, you’re not allowed to participate).

There are exceptions I’m aware of: DNS, for example, which works, I think, mainly due to its hierarchical aspect. Other approaches can be pursued as well, for example P2P systems come to mind.

Nevertheless, I stand by it: centralised, forced-to-sign-up registries are bad for the Web (of Data). They do not scale. CPoAs such as those listed above are not only good for the Web (of Data) but essential to make it usable, especially to help bridge the term-URI gap (or: enter the URI space), which I’ll flesh out in another post. Stay tuned!

Filed under: FYI, Linked Data, voiD

Syndicated 2010-02-18 11:08:51 from Web of Data

Do we have a Linked Data research agenda?


At WWW09 a bunch of leading Linked Data researchers came together and kicked off the process for drafting a ‘Research Agenda For Linked Data’. Since then, a couple of things have happened.

So, coming back to the title of this post: do we have a Linked Data research agenda? The answer is a clear: it depends ;)

Looking at the ‘Topics of Interest’ of this year’s Linked Data on the Web (LDOW2010) workshop at WWW2010, and contrasting it with the TOP10 list we produced a year ago, my impression is that (at least in the next couple of months) we should focus on the following topics:

  • Interlinking algorithms (beside entity-identity-focused frameworks such as Silk, there is not much there, anyway)
  • Provenance & Trust – I see potential outreach possibilities through W3C’s Provenance Incubator, however, lots of legwork to be done, still. Web of Trust? Anyone?
  • Dataset Dynamics (alternative/related keywords: change sets, logs, history, temporal tracking of datasets)

What do you see upcoming? What are important issues to be resolved in the Linked Data world (both from a research perspective and concerning open development tasks)?

Filed under: Linked Data, Proposal

Syndicated 2010-02-13 10:30:58 from Web of Data

Is Google a large-scale contributor to the LOD cloud?


Yesterday, Google announced that WebFinger has been enabled for all Gmail accounts with public profiles. So, for example, using my public profile at Google:

http://www.google.com/s2/webfinger/?q=Michael.Hausenblas@gmail.com

yields:


<XRD xmlns='http://docs.oasis-open.org/ns/xri/xrd-1.0'>
<Subject>acct:Michael.Hausenblas@gmail.com</Subject>
<Alias>http://www.google.com/profiles/Michael.Hausenblas</Alias>
<Link rel='http://portablecontacts.net/spec/1.0'
href='http://www-opensocial.googleusercontent.com/api/people/'/>
<Link rel='http://webfinger.net/rel/profile-page'
href='http://www.google.com/profiles/Michael.Hausenblas' type='text/html'/>
<Link rel='http://microformats.org/profile/hcard'
href='http://www.google.com/profiles/Michael.Hausenblas' type='text/html'/>
<Link rel='http://gmpg.org/xfn/11'
href='http://www.google.com/profiles/Michael.Hausenblas' type='text/html'/>
<Link rel='http://specs.openid.net/auth/2.0/provider'
href='http://www.google.com/profiles/Michael.Hausenblas'/>
<Link rel='describedby'
href='http://www.google.com/profiles/Michael.Hausenblas' type='text/html'/>
<Link rel='describedby'
href='http://s2.googleusercontent.com/webfinger/?q=Michael.Hausenblas%40gmail.com&fmt=foaf'
type='application/rdf+xml'/>

… which is already quite impressive. Above, you see XRD, the ‘eXtensible Resource Descriptor’ format used to state some essential information about the entity identified through ‘Michael.Hausenblas@gmail.com’.
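For a client, the interesting bit is to get from the XRD to the machine-readable description. A minimal Python sketch, working on a shortened copy of the document above, could look like this. Two small liberties are taken in the sample: the closing XRD tag is added and the ampersand in the href is escaped, so that the parser accepts it. The function name `find_links` is made up for the example.

```python
import xml.etree.ElementTree as ET

XRD_NS = "http://docs.oasis-open.org/ns/xri/xrd-1.0"

# Shortened copy of the XRD above, made well-formed for parsing.
XRD_DOC = """<XRD xmlns='http://docs.oasis-open.org/ns/xri/xrd-1.0'>
  <Subject>acct:Michael.Hausenblas@gmail.com</Subject>
  <Link rel='http://webfinger.net/rel/profile-page'
        href='http://www.google.com/profiles/Michael.Hausenblas' type='text/html'/>
  <Link rel='describedby'
        href='http://www.google.com/profiles/Michael.Hausenblas' type='text/html'/>
  <Link rel='describedby'
        href='http://s2.googleusercontent.com/webfinger/?q=Michael.Hausenblas%40gmail.com&amp;fmt=foaf'
        type='application/rdf+xml'/>
</XRD>"""


def find_links(xrd_text, rel, media_type=None):
    """Return the href of every Link with the given rel (and type, if given)."""
    root = ET.fromstring(xrd_text)
    hrefs = []
    for link in root.findall(f"{{{XRD_NS}}}Link"):
        if link.get("rel") != rel:
            continue
        if media_type is not None and link.get("type") != media_type:
            continue
        hrefs.append(link.get("href"))
    return hrefs


if __name__ == "__main__":
    print(find_links(XRD_DOC, "describedby", "application/rdf+xml"))
```

Filtering the describedby links by media type is what separates the HTML profile page from the RDF description.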

But it gets even better: as DanBri pointed out on IRC, due to the great work of Brad Fitzpatrick et al, one can obtain FOAF from WebFinger:

http://s2.googleusercontent.com/webfinger/?q=Michael.Hausenblas%40gmail.com%26fmt%3Dfoaf

gives us …


<?xml version='1.0'?>
<rdf:RDF xmlns='http://xmlns.com/foaf/0.1/' xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
<PersonalProfileDocument rdf:about=''>
<maker rdf:nodeID='me'/>
<primaryTopic rdf:nodeID='me'/>
</PersonalProfileDocument>
<Person rdf:nodeID='me'>
<nick>Michael.Hausenblas</nick>
<name>Michael Hausenblas</name>
<holdsAccount>
<OnlineAccount rdf:about='acct:Michael.Hausenblas@gmail.com'>
<accountServiceHomepage rdf:resource='http://www.google.com/profiles/'/>
<accountName>Michael.Hausenblas</accountName>
</OnlineAccount>
</holdsAccount>
</Person>
</rdf:RDF>

I dunno how many public Google profiles there are, but I guess quite a few … all contributing to the Linked Open Data cloud from now on. There is still a lot we can optimise, for sure:

  • Enhance the FOAF available from WebFinger at Google
  • Make the XRD available in RDF; this is actually work we started a while ago with ULDis, the ‘Universal Link Discovery’ client. In ULDis we developed the ‘Abstract Resource Descriptor vocabulary’ (aardv), which is able to map between XRD, POWDER and voiD. We also started to work on a converter, the ‘Automated descRiptor Converter’, resulting in aardv.arc.

Filed under: Announcement, Idea, Linked Data

Syndicated 2010-02-12 10:18:42 from Web of Data
