26 Oct 2010 mhausenblas   » (Journeyer)

Quick RDFa profiling report

The other day at semantic-web@w3.org, William Waites asked how to link to an RDF serialisation from an HTML document (if I understood him correctly) as he seems not too much into RDFa. While the answer to this one is IMHO rather straight-forward (use <link href="yadayada.rdf" rel="alternate" type="application/rdf+xml" /> the follow-up discussion reminded me on the section in our Linked Data with RDFa tutorial on Usability Issues (note that this document will soon be moved to another location):

One practical issue you may want to check against sometimes occurs with fine-grained, high-volume datasets. Imagine a detailed description of audio-visual content (say, a one hour video incl. audio track) in RDF, or, equally, a detailed RDF representation of a multi-dimensional table of statistics, with dozens of columns and potentially thousands of rows. In both cases, one ends up with potentially many triples, which might mean some 100k triple or more. As both humans and machines are expected to consume the RDFa document, one certainly has to find a trade-off between using RDFa for the entire description (meaning to embed all triples in the HTML document) and an entirely externalised solution, for example using RDF/XML:

We also give a rough guideline how to decide how much is too much:

… having the entire RDF graph embedded certainly is desirable, however, one has to check the usability of the site. Usability expert Jakob Nielsen advocates a size limit for Web pages yielding an approximate 10 sec response time limit. Based on this we propose to perform a simple sort of response time testing, once with the plain HTML page and once with the embedded RDF graph. In case of a significant difference, one should contemplate if the all-in-RDFa approach is appropriate for the use case at hand.

Now, I wanted to get some real figures regarding how the number of triples embedded with RDFa impacts the loading time of an HTML page in a browser and did the following: I loaded some 17 cities from DBpedia (such as Amsterdam) into an RDF store and created a number of generic RDFa+HTML documents essentially with:

SELECT * WHERE { ?s ?p ?o } LIMIT $SIZE

… where $SIZE would range from 10 to 20,000 – each triple looks essentially like:

<div about='http://dbpedia.org/resource/William_Howitt'>
<a rel='dbp:deathPlace'
  href='http://dbpedia.org/resource/Rome'>

http://dbpedia.org/resource/Rome

</a>
</div>

Then I used Firebug with the NetExport extension and a shell script to gather the load time. The (raw) results are available online as well as two figures that give a rough idea of what is happening:

Note the following regarding the test setup: I did a local test (no network dependencies) with all caches turned off; the tests were performed with Firefox 3.6 on MacOS 10.5.8 (2.53 GHz Intel Core 2 Duo with 4GB/1067 MhZ DDR3 RAM on board). Each document had five runs, the numbers above show the averages over the runs.


Filed under: Experiment, Linked Data

Syndicated 2010-10-26 14:27:03 from Web of Data

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!