Older blog entries for dajobe (starting at number 11)

20 Aug 2003 (updated 20 Aug 2003 at 19:48 UTC)

Redland and Debian Packaging

Phew! My RDF/XML parser raptor (a C library that depends on libxml and cURL) has been in Debian sid/unstable for a few months now, so it was time to attempt the big beast, redland. That's my main RDF system and, as well as the C library, it has 6 other language interfaces (perl, python, ruby, java, tcl and php), and I'm working on CLI/C#.

This list has felt a bit daunting for me to deal with, however after the raptor experience I was confident the C library part would at least be straightforward. Over the last 4 days I've added some of the languages slowly, while studying the mysteries of the Debian perl and python policies. To that I've added waiting for sid/unstable to be buildable so I can use pbuilder to check that I have all the correct dependencies, and decoding various packages to see how they did things.

So the current state is that I've got the C (and -dev), perl, python and ruby debian packages building without error in pbuilder, "lintian clean" and working once installed. The next step is to see about getting them into sid/unstable when my very busy sponsor edd has enough time to check them.

Raptor - a little tune up

I've been doing a bit of raptor tuning to get the CPU and memory usage down on large files. I was always afraid of premature optimisation, but knew that things could be improved a lot if I did some profiling and cut down the big problems. The time on a 550,000 triple RDF/XML file went from 172.8s for Raptor 0.9.8 as released to 7.3s with the CVS sources - over 23x faster. The improvement was mostly due to:

  • A lot less strlen() on strings I already had the length for elsewhere
  • Removal of many short-lifetime malloc()/free() pairs (thanks to dmalloc)
  • Using a set for rdf:ID checking, rather than a list.
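
The three points above can be sketched together. This is a minimal illustration, not raptor's actual code: every name here (id_set, id_set_add and so on) is hypothetical, and a small chained hash table is assumed as the "set". IDs are passed with their length so nothing needs strlen(), each entry is one malloc() rather than two, and a duplicate check searches one bucket instead of walking a whole list.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; this is not raptor's actual code. */
#define ID_SET_BUCKETS 1024

typedef struct id_node {
  struct id_node *next;
  size_t len;   /* kept with the string so nobody has to strlen() it again */
  char *id;
} id_node;

typedef struct {
  id_node *buckets[ID_SET_BUCKETS];
  size_t size;
} id_set;

/* Simple multiplicative hash over a counted string. */
static unsigned long id_hash(const char *id, size_t len) {
  unsigned long h = 5381;
  size_t i;
  for (i = 0; i < len; i++)
    h = h * 33 + (unsigned char)id[i];
  return h;
}

/* Returns 1 if the ID was new and got added, 0 if it was already
 * present (a duplicate rdf:ID), -1 if memory ran out. */
static int id_set_add(id_set *set, const char *id, size_t len) {
  size_t b;
  id_node *n;

  assert(set != NULL && id != NULL);
  b = id_hash(id, len) % ID_SET_BUCKETS;

  /* Only one bucket is searched, and lengths are compared first. */
  for (n = set->buckets[b]; n; n = n->next)
    if (n->len == len && memcmp(n->id, id, len) == 0)
      return 0;

  /* One allocation holds both the node and its copy of the string. */
  n = malloc(sizeof(*n) + len + 1);
  if (!n)
    return -1;
  n->id = (char *)(n + 1);
  memcpy(n->id, id, len);
  n->id[len] = '\0';
  n->len = len;
  n->next = set->buckets[b];
  set->buckets[b] = n;
  set->size++;
  return 1;
}

/* Frees every entry and leaves the set empty and reusable. */
static void id_set_clear(id_set *set) {
  size_t b;
  for (b = 0; b < ID_SET_BUCKETS; b++) {
    id_node *n = set->buckets[b];
    while (n) {
      id_node *next = n->next;
      free(n);
      n = next;
    }
    set->buckets[b] = NULL;
  }
  set->size = 0;
}
```

A parser would call id_set_add() for each rdf:ID it sees and report an error whenever it returns 0.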

18 Mar 2003 (updated 18 Mar 2003 at 23:31 UTC)

Raptor and web libraries

Unlike in Java, Perl, Python and all those higher level languages, in C when you want to do something like retrieve a web page, there is a lot more to do. There aren't stdurl or stdweb libraries around that you can assume are always available. Since raptor is a parser for an XML language, libxml is one likely thing that is usable and it has a tiny HTTP implementation, sufficient for GET. There is also the de facto portable web library libcurl, so I made that a configurable option too, plus the W3C libwww, which is common but rather large. So problem solved.

Or so I thought. It turns out that all those APIs except for the W3C libwww are push: they take the thread of control from the caller and return data to it via callbacks. However, I wanted the more I/O stream-like pull, i.e. the user application does while(...) { get stuff; do stuff }. You can wrap a push API around a pull one quite easily and efficiently, but not the other way around: to offer pull on top of push you need to store all the pushed content and then deliver it pull-by-pull. So I'm going to have to live with that: provide both and warn users that the pull interface will suck up memory.
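
The asymmetry can be sketched in a few lines of C. All names below (pull_source, push_from_pull, pull_buffer and friends) are hypothetical and stand in for no particular library; the source is just an in-memory string. Push over pull needs only a small fixed buffer, while pull over push has to collect everything the callbacks deliver before the caller can read any of it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* A pull-style source: the caller asks for the next bytes. */
typedef struct {
  const char *data;
  size_t len;
  size_t pos;
} pull_source;

static size_t pull_read(pull_source *src, char *buf, size_t size) {
  size_t n = src->len - src->pos;
  if (n > size)
    n = size;
  if (n)
    memcpy(buf, src->data + src->pos, n);
  src->pos += n;
  return n;   /* 0 means end of data */
}

/* A push-style handler: the library calls it whenever data arrives. */
typedef void (*push_handler)(void *user_data, const char *buf, size_t len);

/* Push wrapped around pull: easy, and memory use stays constant. */
static void push_from_pull(pull_source *src, push_handler handler,
                           void *user_data) {
  char buf[8];
  size_t n;
  assert(handler != NULL);
  while ((n = pull_read(src, buf, sizeof(buf))) > 0)
    handler(user_data, buf, n);
}

/* Pull wrapped around push: everything pushed must be stored first. */
typedef struct {
  char *data;
  size_t len;   /* bytes collected so far */
  size_t cap;   /* bytes allocated */
  size_t pos;   /* bytes already handed to the reader */
  int failed;   /* set if memory ran out while collecting */
} pull_buffer;

/* Register this as the push handler; it grows the buffer as needed. */
static void pull_buffer_collect(void *user_data, const char *buf, size_t len) {
  pull_buffer *pb = user_data;
  if (pb->failed)
    return;
  if (pb->len + len > pb->cap) {
    size_t cap = pb->cap ? pb->cap : 64;
    char *p;
    while (cap < pb->len + len)
      cap *= 2;
    p = realloc(pb->data, cap);
    if (!p) {
      pb->failed = 1;
      return;
    }
    pb->data = p;
    pb->cap = cap;
  }
  memcpy(pb->data + pb->len, buf, len);
  pb->len += len;
}

/* Only usable once the push side has finished delivering. */
static size_t pull_buffer_read(pull_buffer *pb, char *buf, size_t size) {
  size_t n = pb->len - pb->pos;
  if (n > size)
    n = size;
  if (n)
    memcpy(buf, pb->data + pb->pos, n);
  pb->pos += n;
  return n;
}
```

The pull_buffer grows to the size of the whole document, which is exactly the memory cost mentioned above; the caller frees pb.data when done.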

27 Feb 2003 (updated 27 Feb 2003 at 00:48 UTC)

FOAFBot lives, nearly

I had another go at getting edd's FOAFbot running again with the updated Redland 0.9.12 API, and it now seems to crash less after I hand-fixed some bugs that it exposed in the latest version. This is a good thing, since the bugs got found and fixed. Of course, that just leaves the issue of why it runs but doesn't work.

16 Feb 2003 (updated 16 Feb 2003 at 20:03 UTC)

Redland and Raptor

What day is it? Oh yeah, I released some software last week - new versions of my redland and raptor RDF general and parser libraries respectively. Took only nine months since the last release of Redland, lots of bug fixing and testing. Plus all the pain of automake, autoconf, uploading to SourceForge, writing Freshmeat news, announcements, release notes, checking it works across solaris, linux, FreeBSD, OSX. The latter was a real pain - not quite UNIX enough for me. Then I made some RPMs and new debs.

So onwards to more releases. I've already got some patches received for win32 support for raptor and I wanted to move it on to the latest autotools, which seems to be working OK. For redland I'm ready to rip out more code such as the old repat parser, a bit obsolete.

I also just realised that I've implemented SAX2 for C inside raptor, since libxml and expat only ever had SAX1-style interfaces - no XML Namespaces. I should think what to do with this, maybe talk to DV.


At last, got another release of raptor out the door after about 5 months and lots of changes (new RDF datatyping, collections, URI updates, internal improvements, bug fixes). It took a while till it had the features I wanted to add, the changes I wanted to make internally and after all that, to make it stable and working.

Releasing software is such a pain, especially in the free software / open source world, since it is your reputation in particular that is being demonstrated - "show me the code!" - and the source code you make available to the world is what counts.

Of course, talking about reputation on advogato is just asking for trouble. :)


A short update on various code refactorings of my raptor RDF parser over the last month or so. I have been pulling apart a 140K C source file into chunks by functionality. This was needed for a bunch of reasons, mostly because it had evolved rather than been designed, and was getting embarrassing to look at.

The result is that the API is smaller and more flexible and I can soon pull out the redland URI dependencies, so that the same raptor library can be used standalone in applications while working efficiently in redland.

Apart from moving the same code around I also got time to improve the XML error handling so that it can deal with XML it doesn't understand, so it now handles all the libxml XML tests without falling over. And killing a bunch of other bugs. The bug list is actually getting shorter.


A few weeks since the last hacking update on my redland RDF library. I've long wanted the (C) library to be able to return error messages and so on to the higher language interfaces (perl, python, java, tcl, ruby, ...). I've now got my head around the C interfaces to callbacks for perl and python, so that any errors now end up in native perl and python subroutines / functions / exceptions. This means that the CGI demo programs that used to just chuck errors to stderr can return them to the web interface.
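
On the C side, the general shape of this kind of hook is a function pointer plus an opaque user-data pointer. The following is a minimal sketch under that assumption; the names (world, world_set_error_handler, world_error, error_record) are hypothetical and are not redland's actual API. A language binding would register its own glue function, which then calls the perl or python routine or raises an exception; here a plain C handler that records the message stands in for that glue.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names; not redland's actual API. */
typedef void (*message_handler)(void *user_data, const char *message);

typedef struct {
  message_handler error_handler;  /* NULL means no handler registered */
  void *error_user_data;          /* opaque pointer owned by the binding */
} world;

static void world_set_error_handler(world *w, message_handler handler,
                                    void *user_data) {
  assert(w != NULL);
  w->error_handler = handler;
  w->error_user_data = user_data;
}

/* Called inside the library wherever it used to write to stderr. */
static void world_error(world *w, const char *format, ...) {
  char message[256];
  va_list args;

  va_start(args, format);
  vsnprintf(message, sizeof(message), format, args);
  va_end(args);

  if (w->error_handler)
    w->error_handler(w->error_user_data, message);
  else
    fprintf(stderr, "error: %s\n", message);
}

/* Stand-in for binding glue: records the last message and a count. */
typedef struct {
  char text[256];
  int count;
} error_record;

static void error_record_handler(void *user_data, const char *message) {
  error_record *rec = user_data;
  strncpy(rec->text, message, sizeof(rec->text) - 1);
  rec->text[sizeof(rec->text) - 1] = '\0';
  rec->count++;
}
```

With no handler registered the message still goes to stderr, so existing command-line use keeps working.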


Recently I've mostly been banging on my raptor RDF parser and writing test cases to find the bugs that have been known about but not diagnosed. I also took the opportunity to try to split out some code from the very large main source file. Tricky doing this with evolved code.


I've been reading Linux kernel changes related to updating for C99 Designated Initializers (also called named struct initialisers). Does anyone know why these are a good idea and why you'd want to change / break your code to support these?

They work like this. A struct with two fields:

struct type_foo bar = {
  val1,
  val2
};

is now (or can now be) written as

struct type_foo bar = {
  .field1 = val1,
  .field2 = val2
};

and I was just getting used to ignoring K&R C support ;)
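
For reference, a small compilable sketch of the two forms side by side. The struct layout (two ints) and the function names are made up for illustration and have nothing to do with the kernel changes themselves. What C99 does guarantee is that designated members can be listed in any order and that any member left out is zero-initialised, so the designated form keeps working if fields are later reordered or added.

```c
/* Hypothetical layout standing in for type_foo. */
struct type_foo {
  int field1;
  int field2;
};

/* Positional form: values must follow the declaration order. */
static struct type_foo make_positional(int val1, int val2) {
  struct type_foo bar = { val1, val2 };
  return bar;
}

/* Designated form: order is free, each value names its field. */
static struct type_foo make_designated(int val1, int val2) {
  struct type_foo bar = {
    .field2 = val2,
    .field1 = val1
  };
  return bar;
}

/* Members that are not named are zero-initialised. */
static struct type_foo make_partial(int val2) {
  struct type_foo bar = { .field2 = val2 };
  return bar;
}
```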

3 Aug 2002 (updated 3 Aug 2002 at 20:30 UTC)

I've mostly been working on the redland iterator changes needed for contexts that edd wanted. The current state is that 'make check' works fine, except for a memory leak, which I'm still hunting. I tried valgrind, but it gave false info, and working that out took yet more time.

After I get this fixed, I'll have to consider changing streams to match this new model. However, I'm a little worried since it may cause too much user-level change. Iterators are pretty internal and hardly leak out for non-C.

The changes for streams, like those for iterators, are as follows. The two methods on a stream are presently is_end and get_next. is_end will remain the same, but get_next splits into get and next. It may be possible to hide this outside C. The general change is from:

while !stream.is_end { s=stream.get_next; blah blah }

to:

while !stream.is_end { s=stream.get; blah blah; stream.next() }

where the 's' is a shared pointer in C but could be a copy outside. After stream.next() the pointer moves and isn't valid.

(actually this makes things more efficient: there is less copying inside redland and things should work faster)
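
In C, the new loop shape might look something like the sketch below. The names (stream, stream_is_end, stream_get, stream_next) are hypothetical rather than redland's real functions, and the stream here just walks an array of strings. The point is that get hands back a pointer shared with the stream, with no copy made, and the caller should treat it as dead once next has been called.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stream over an array of strings; not redland's real API. */
typedef struct {
  const char **items;
  size_t count;
  size_t pos;
} stream;

static int stream_is_end(const stream *s) {
  return s->pos >= s->count;
}

/* Returns a pointer shared with the stream (no copy is made), or NULL
 * at the end. Callers should not use it after stream_next(). */
static const char *stream_get(const stream *s) {
  return stream_is_end(s) ? NULL : s->items[s->pos];
}

/* Moves to the following item; does nothing at the end. */
static void stream_next(stream *s) {
  if (!stream_is_end(s))
    s->pos++;
}

/* Example consumer using the is_end / get / next loop. */
static size_t stream_total_length(stream *s) {
  size_t total = 0;
  while (!stream_is_end(s)) {
    const char *item = stream_get(s);  /* shared, only valid until next */
    total += strlen(item);
    stream_next(s);
  }
  return total;
}
```

A non-C binding could still offer a single get_next by copying the item inside its wrapper before calling next.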
