Older blog entries for mhausenblas (starting at number 30)

Towards Web-based SPARQL query management and execution

Every now and then I use SPARQL queries to learn about a Linked Data source, to debug an RDF document such as a FOAF file or to demonstrate the usage of Web Data.

Quite often I write the SPARQL queries from scratch, have some examples stored in project folders along with the source code, or look up stuff in Lee’s wonderful SPARQL By Example slide set.
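For illustration, the FOAF-debugging case usually boils down to a hand-rolled query like this one (a sketch – it simply lists every foaf:Person with their name and, optionally, who they know):

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?person ?name ?friend
WHERE {
  ?person a foaf:Person ;
          foaf:name ?name .
  OPTIONAL { ?person foaf:knows ?friend }
}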

Another issue is that, though there are a few public, generic SPARQL endpoints and a Wiki page with a list of SPARQL endpoints, I have to consult these manually in order to tell where (and how) I can execute a query.

With all the know-how we have in the Web of Data, with vocabularies and more, shouldn’t it be possible to offer a more developer-friendly environment to manage and execute SPARQL queries?

I guess so.

Before Easter, Richard and I discussed these issues, especially the problem that the endpoints vary slightly in terms of interfaces and execution. I’d like to share my findings with you, my dear reader: there are not that many solutions out there yet. Leigh has worked on Twinkle, a Java-based desktop client wrapping ARQ that provides much of the functionality I’d need. Then I gathered that Melvin has started to work on SPARQL.me, a Web-based solution that allows you to store and execute SPARQL queries, supports FOAF+SSL for log-in, etc. – already very close to what I was looking for, though I’m still missing certain features (especially regarding the description of the SPARQL queries themselves, the handling of the execution, etc.).

As I was not aware of Melvin’s work upfront (my bad, he did tell us about it earlier this year), I thought I’d give it a try myself. The result is called omniQ, an experimental service that allows you to store and execute SPARQL queries in a collaborative fashion. The goal would be to compile a library of queries to enable people to utilise them for different purposes (as described above for my cases; I bet there are more out there). Further, omniQ exposes the SPARQL queries in RDFa (example), allowing for syndication and structured queries over queries. Fun, isn’t it? ;)
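I won’t reproduce omniQ’s exact markup here, but schematically, exposing a stored query plus its description in RDFa looks something like this (the property choices below are illustrative, not necessarily the ones omniQ emits):

<div xmlns:dcterms="http://purl.org/dc/terms/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     about="#query1">
  <h2 property="dcterms:title">People and their names</h2>
  <p property="dcterms:description">Lists every foaf:Person with its foaf:name.</p>
  <pre property="rdf:value">SELECT ?person ?name WHERE { ?person a foaf:Person ; foaf:name ?name }</pre>
</div>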

I’d like to hear your thoughts. Are you aware of any other (Web-based) SPARQL query management and execution environments? What other features would you expect? What more could we describe concerning the queries themselves?


Filed under: Idea, Linked Data

Syndicated 2010-04-09 15:56:48 from Web of Data

Web of Data Access Control Discovery via HTTP Link: header

Yesterday, TimBL talked about Distributed Social Networking Through Socially Aware Cloud Storage during the W3C Social Web XG meeting. I’m not going to discuss the huge potential strategic impact of this, but rather focus on a certain ‘technical’ detail that caught my attention. In his (related) design note Socially Aware Cloud Storage, he writes:

Access control files for a resource are discovered by a client using the HTTP link header.

Fair enough. Assume we use the WebAccessControl vocabulary in an access control file (ACF) to restrict access to a resource on the Web. How exactly should the interaction take place? What should we use as a @rel-value for the HTTP Link: header? Does it make sense for the user agent (UA) to evaluate the ACF? Is ACF discovery necessary at all?

Here is what I came up with so far:
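Roughly, the exchange could look like this – with rel="acl" as a straw-man value and the URIs being illustrative:

GET /data/report HTTP/1.1
Host: example.org

HTTP/1.1 200 OK
Content-Type: application/rdf+xml
Link: <http://example.org/data/report.acl>; rel="acl"

The UA would then dereference the advertised ACF and could evaluate the WebAccessControl statements in it (or leave the evaluation entirely to the server and treat the link as informative only).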

Thoughts?

Ah, btw, once this is sorted I’ll update the WACup demo with it …


Filed under: Linked Data

Syndicated 2010-03-04 10:23:35 from Web of Data

Data and the Web – a great many of choices

Jan Algermissen recently compiled a very useful Classification of HTTP-based APIs. This, together with Mike Amundsen’s interesting review of Hypermedia Types, made me think about data and the Web.

One important aspect of data is “on the Web vs. in the Web” as Rick Jelliffe already highlighted in 2002:

To explain my POV, let me make a distinction between a resource being “on” the Web or “in” the Web. If it is merely “on” the Web, it does not have any links pointing to it. If a resource is “in” the Web, it has links from other resources to it. [...] A service that has no means of discovery (i.e. a link) or advertising is “on” the Web but not “in” the Web, under those terms. It just happens to use a set of protocols but it is not part of a web. So it should not be called a web service, just an unlinked-to resource.

In 2007 Tom Heath repeated this essential statement in the context of Linked Data.

So, I thought it makes sense to revisit some (more or less) well-known data formats and services and try to pin down what “in the Web” means – a first step towards measuring how well-integrated they are with the Web. In the following I’ll call the degree to which they are in the Web the Link factor. I suggest that the Link factor ranges from -2 (totally “on the Web”) to +2 (totally “in the Web”), with the following attempt at a definition for the scale:

-2 … proprietary, desktop-centric document formats
-1 … structured data that can be exposed and accessed via the Web
 0 … standardised, Web-aligned (XML-based) formats or Web services
 1 … open, standardised (document) formats
 2 … fully REST-compliant, open (data) standards natively supporting links

Here is what I have so far – feel free to ping me if you disagree or have some other suggestions:

Technology                 Examples                           Link factor
Documents                  MS Word, PDF                       -2
Spreadsheets               MS Excel                           -1
RDBMS                      Oracle DB, MySQL                   -1
NoSQL                      BigTable, HBase, Amazon S3, etc.    0
Hypertext and Hypermedia   HTML, VoiceML, SVG, Google Docs     1
Hyperdata                  Atom, OData, Linked Data            2
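To illustrate what the top of the scale – natively supporting links – means in the Linked Data case: a single RDF statement in Turtle is already a typed, followable link into someone else’s dataset (the example.org/example.net URIs are illustrative):

@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<http://example.org/people/alice#me>
    foaf:knows <http://example.net/people/bob#me> ;
    foaf:based_near <http://dbpedia.org/resource/Galway> .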

Filed under: FYI, Linked Data, Proposal

Syndicated 2010-03-01 13:05:12 from Web of Data

A case for Central Points of Access (CPoA) in decentralised systems


This post has been triggered by a Twitter thread, where I replied to @olyerickson that I think https://subj3ct.com is a good thing to have. Then, @hvdsomp noted (rightly!) that registries don’t scale (in reference to a conversation we had earlier on).

Big confusion, right? Michael says one thing and then the opposite on the very next day. Nah, not really ;)

Actually, turns out I’ve been quite consistent over time. In late 2008 I wrote in Talis’ NodMag #4 (on page 16):

Could you imagine reporting your new blog post, Wiki page or whatever you have to hand to an authority that takes care of adding it to a ‘central look-up repository’? I can’t, and there is at least one good reason for it: such things don’t scale. However, there are ways to announce and promote the content.

So, what is the difference between a UDDI-style registry (which, btw, didn’t exactly turn out to be a success) and what I’ll call a central point of access (CPoA) in the following?

Before I try to answer the question, let me first give you some examples of CPoAs in the Web of Data context: Sindice, Falcons, sameas.org, prefix.cc, Cupboard, and voiD stores.

Some of these CPoAs employ automated techniques to fill their internal databases (Sindice or sameas.org, for example), some of them depend on human input (prefix.cc, for instance). Some of them focus on a special kind of use case or domain (Cupboard or voiD stores), some try to be as generic as possible (Falcons, Sindice).

All of them, though, do share one principle: it’s up to you if you’re listed there or not (ok, technically, some might discover your data and index it, but that’s another story). The (subtle) difference is a priori vs. a posteriori: no one forces you to submit, say, your voiD file to a voiD store or to Sindice. However, if you want to increase your visibility, if you want people to find your valuable data and use it, you’ll need to promote it. So, I conclude: one effective way to promote your data (and schema, FWIW) is to ‘feed’ a CPoA. Contrast this with a centralised registry where you need to submit your stuff first, otherwise no one is able to find it (or, put in other words: if you don’t register, you’re not allowed to participate).

There are exceptions I’m aware of: DNS, for example, which works, I think, mainly due to its hierarchical design. Other approaches can be pursued as well; P2P systems come to mind, for example.

Nevertheless, I stand by it: centralised, forced-to-sign-up registries are bad for the Web (of Data). They do not scale. CPoAs, such as those listed above, are not only good for the Web (of Data) but essential to make it usable; especially to bridge the term-URI gap (or: enter the URI space), which I’ll flesh out in another post. Stay tuned!

Filed under: FYI, Linked Data, voiD

Syndicated 2010-02-18 11:08:51 from Web of Data

Do we have a Linked Data research agenda?


At WWW09 a bunch of leading Linked Data researchers came together and kicked off the process of drafting a ‘Research Agenda For Linked Data’. Since then, a couple of things have happened.

So, coming back to the title of this post: do we have a Linked Data research agenda? The answer is a clear ‘it depends’ ;)

Looking at the ‘Topics of Interest’ of this year’s Linked Data on the Web (LDOW2010) workshop at WWW2010, and contrasting it with the TOP10 list we produced a year ago, my impression is that (at least in the next couple of months) we should focus on the following topics:

  • Interlinking algorithms (besides entity-identity-focused frameworks such as Silk, there is not much out there anyway)
  • Provenance & Trust – I see potential outreach possibilities through W3C’s Provenance Incubator; however, there is still lots of legwork to be done. Web of Trust, anyone?
  • Dataset Dynamics (alternative/related keywords: change sets, logs, history, temporal tracking of datasets)

What do you see upcoming? What are important issues to be resolved in the Linked Data world (both from a research perspective and concerning open development tasks)?

Filed under: Linked Data, Proposal

Syndicated 2010-02-13 10:30:58 from Web of Data

Is Google a large-scale contributor to the LOD cloud?


Yesterday, Google announced that WebFinger has been enabled for all Gmail accounts with public profiles. So, for example, using my public profile at Google:

http://www.google.com/s2/webfinger/?q=Michael.Hausenblas@gmail.com

yields:


<XRD xmlns='http://docs.oasis-open.org/ns/xri/xrd-1.0'>
  <Subject>acct:Michael.Hausenblas@gmail.com</Subject>
  <Alias>http://www.google.com/profiles/Michael.Hausenblas</Alias>
  <Link rel='http://portablecontacts.net/spec/1.0'
        href='http://www-opensocial.googleusercontent.com/api/people/'/>
  <Link rel='http://webfinger.net/rel/profile-page'
        href='http://www.google.com/profiles/Michael.Hausenblas' type='text/html'/>
  <Link rel='http://microformats.org/profile/hcard'
        href='http://www.google.com/profiles/Michael.Hausenblas' type='text/html'/>
  <Link rel='http://gmpg.org/xfn/11'
        href='http://www.google.com/profiles/Michael.Hausenblas' type='text/html'/>
  <Link rel='http://specs.openid.net/auth/2.0/provider'
        href='http://www.google.com/profiles/Michael.Hausenblas'/>
  <Link rel='describedby'
        href='http://www.google.com/profiles/Michael.Hausenblas' type='text/html'/>
  <Link rel='describedby'
        href='http://s2.googleusercontent.com/webfinger/?q=Michael.Hausenblas%40gmail.com&fmt=foaf'
        type='application/rdf+xml'/>
</XRD>

… which is already quite impressive. Above, you see XRD, the ‘eXtensible Resource Descriptor’ format used to state some essential information about the entity identified through ‘Michael.Hausenblas@gmail.com’.

But it gets even better: as DanBri pointed out on IRC, due to the great work of Brad Fitzpatrick et al, one can obtain FOAF from WebFinger:

http://s2.googleusercontent.com/webfinger/?q=Michael.Hausenblas%40gmail.com&fmt=foaf

gives us …


<?xml version='1.0'?>
<rdf:RDF xmlns='http://xmlns.com/foaf/0.1/'
         xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
  <PersonalProfileDocument rdf:about=''>
    <maker rdf:nodeID='me'/>
    <primaryTopic rdf:nodeID='me'/>
  </PersonalProfileDocument>
  <Person rdf:nodeID='me'>
    <nick>Michael.Hausenblas</nick>
    <name>Michael Hausenblas</name>
    <holdsAccount>
      <OnlineAccount rdf:about='acct:Michael.Hausenblas@gmail.com'>
        <accountServiceHomepage rdf:resource='http://www.google.com/profiles/'/>
        <accountName>Michael.Hausenblas</accountName>
      </OnlineAccount>
    </holdsAccount>
  </Person>
</rdf:RDF>

I dunno how many public Google profiles there are, but I guess quite a few … all contributing to the Linked Open Data cloud from now on. There is still a lot we can optimise, for sure:

  • Enhance the FOAF available from WebFinger at Google
  • Make the XRD available in RDF; this is actually work we started a while ago with ULDis, the ‘Universal Link Discovery’ client. In ULDis we developed the ‘Abstract Resource Descriptor vocabulary’ (aardv), able to map between XRD, POWDER and voiD. We also started to work on a converter, the ‘Automated descRiptor Converter’, resulting in aardv.arc.
Filed under: Announcement, Idea, Linked Data

Syndicated 2010-02-12 10:18:42 from Web of Data

Some random notes on hypermedia and Linked Data


I stumbled over a tweet from Mike Amundsen where he essentially asked people to name some more “widely-used hypermedia-types” besides (X)HTML and Atom. Turns out Mike collected the findings and made them available at http://amundsen.com/hypermedia/. Cool. Thanks!

A couple of days later I read Linking data in XML and HATEOAS, where Wilhelm contemplates Linked Data etc. The last sentence of his post reads:

Anyone know why XLink was abandoned, or why linked data doesn’t follow this concept?

My hunch is that XLink didn’t have the expected uptake and hence failed to serve as a basis for a light-weight and simple way to link data on the Web.
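For readers who never came across it: XLink was a way to embed typed links into arbitrary XML, roughly along these lines (an illustrative sketch):

<report xmlns:xlink="http://www.w3.org/1999/xlink">
  <author xlink:type="simple"
          xlink:href="http://example.org/people/alice"
          xlink:role="http://example.org/roles/author"
          xlink:title="Alice">Alice</author>
</report>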

As I’ve argued in a previous post, typed links are essential for true HATEOAS; however, I wonder if we’ve only scratched the surface of this …

Filed under: FYI, Linked Data

Syndicated 2010-02-10 18:38:32 from Web of Data

Supplier’s responsibility for defining equivalency on the Web of Data


Less than a year ago I asked W3C’s Technical Architecture Group (TAG) essentially if

… the [image] representation derived via [content negotiation from a generic resource] is equivalent to the RDF [served from it]

I asked for “a note, a specification, etc. that normatively defines what equivalency really is”.
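To recap the setup in question (with illustrative URIs): one generic resource, two representations negotiated via the Accept header –

GET /resource/EiffelTower HTTP/1.1
Host: example.org
Accept: application/rdf+xml
=> 200 OK, an RDF description of the tower

GET /resource/EiffelTower HTTP/1.1
Host: example.org
Accept: image/png
=> 200 OK, a rendering of the tower as an image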

So, after some back and forth between the TAG and the IETF HTTPbis Working Group, I happened to receive an answer. Thanks to all involved – I guess it was worth the wait. It seems the upcoming HTTPbis standard will address this issue, essentially stating that

… in all cases, the supplier of representations has the responsibility for determining which representations might be considered to be the “same information”.

As an aside: I guess I’ll have to be patient again – this time I asked the above-mentioned HTTPbis WG why the caching heuristics exclude the 303 status code (see the current draft of HTTP/1.1, part 6: Caching, section 2.3.1.1). But it’s not even two weeks into the question, so I don’t reckon I’ll get mail from the chaps before 01/2011 ;)
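For context, the 303 in question is the usual Linked Data redirect dance (illustrative URIs below); as long as 303 responses are excluded from heuristic caching, a client has to repeat this round trip on every single lookup:

GET /id/alice HTTP/1.1
Host: example.org
Accept: application/rdf+xml

HTTP/1.1 303 See Other
Location: http://example.org/doc/alice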

Filed under: FYI, IETF, Linked Data, W3C

Syndicated 2010-02-02 08:28:04 from Web of Data

Using RDFa to publish linked data


Yesterday we had our first DERI-internal RDFa hands-on workshop. More than 20 colleagues attended, equipped with their laptops and an RDFa cheat sheet we provided. The goal was to support people in manually marking up their Web pages with RDFa, contributing to the growing Web of Data.
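The sort of markup we walk people through is deliberately minimal; a FOAF-based snippet in RDFa (1.0 syntax) looks roughly like this:

<div xmlns:foaf="http://xmlns.com/foaf/0.1/"
     about="#me" typeof="foaf:Person">
  My name is <span property="foaf:name">Michael Hausenblas</span> and
  I work at <a rel="foaf:workplaceHomepage" href="http://www.deri.ie/">DERI</a>.
</div>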

We plan to hold this workshop every two weeks, so in case you’re around, come and join us!

Posted in Linked Data

Syndicated 2010-01-26 09:01:21 from Web of Data

Moving from document-centric to result-centric


Our eldest one is turning seven soon, and for him it is hard to imagine the pre-Web era. Sometimes he asks me “but how did you do this or that without the Web?” and quite often I must admit I don’t know the answer. Maybe some of the things we do nowadays were simply non-actions some 20 years ago, like updating Twitter ;)

Anyway, let’s remind ourselves that the essential idea of the Web was doing ‘Hypertext over the Internet’, and TimBL was not the only one who had this idea. However, as far as I can tell, he was the only one who was successful on a large scale, with a sustainable and tangible outcome.

One thing that bothers me is that we are mentally still subscribed to the document-centric point of view – and, as a result, to an application-centric point of view. What do I mean by that? Well, imagine a piece of paper and a pen. I can do virtually any kind of illustration and note-taking on it. I don’t need to get another pen to create a table; I don’t need a second sort of paper to draw a picture, etc.

And yet, we’re still used to thinking along this line. If you don’t believe me: even the latest, coolest Web application suites, such as GDocs, essentially force you to decide up-front which kind of document you wanna create. Shouldn’t we have overcome this?

The good news is: we’re now able to overcome the document-centric POV, due to what Linked Data enables. I won’t focus on the technical details or their evolution for now, but on what I call result-centric. This essentially means that one is interested in the result of an action rather than the means by which it has been achieved. A little analogy might help: say you want to travel from Galway to Madrid and the only requirement is that it has to be as cheap as possible (hey, I’m a researcher – time doesn’t matter, but budget constraints do). So, what counts at the end of the day is that (i) you arrive in Madrid and (ii) you’ve spent as little money as possible. This might mean you have to switch from plane to bus to train, maybe, but anyway, the result matters to you, not which kind of transport medium you’ve used. Same with certain, if not all, kinds of tasks on the computer. Frankly, I don’t give a damn if I have to use this or that application. I might just need to write a report, including figures and tables, and the more efficiently I can do this, the better. Today, this likely means I’ve got to use some two or three applications (which I have to know, pay for, etc. – yes, TCO does matter).

Coming back to Linked Data, which essentially enables ubiquitous and seamless data integration, one can imagine a new class of application: general-purpose viewing and editing – a truly result-centric way of working with the computer. In fact, the first generation of the ‘read-only’ case, Linked Data browsers such as DIG’s Tabulator, OpenLink’s Data Explorer or Sigma, is available already.

What we now need, I think, is DDE/OLE done right. On the Web. Based on Linked Data. Addressing security, trust, privacy and billing issues. Allowing us to move forward. From document-centric to result-centric.

As an aside: this post was influenced by a book I’m currently reading.

Posted in Linked Data

Syndicated 2010-01-18 10:36:21 from Web of Data
