Older blog entries for dajobe (starting at number 13)

Redland and Raptor Bughunt

All praise valgrind! It finally managed to find where redland was crashing when used with PHP and one class. It was a combination of PHP and SWIG conspiring to do the wrong thing with a null string and a NULL object pointer, caused by a configuration error on my part.

This debugging was made much harder by threads, those annoying things that seem to be used more and more by useful shared libraries and that cause debugging nightmares. Does anyone understand how to get gdb to do the right thing with them? I certainly don't. Mostly I face a dead stack trace until the planets align and it lets me set a breakpoint in a shared library that isn't loaded yet, run the code and stop at the breakpoint. Bah!

Anyway, onwards. I added some defensive code to try to catch this problem if it happens again, and while updating the debug code I found C99's __func__ pseudo-variable (strictly, a predefined identifier), which is handy and can replace a lot of hand-coded function-name strings.
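
For anyone who hasn't met it: __func__ behaves like a static const char array holding the enclosing function's name. It is not a macro, so the preprocessor can't paste it into a string literal, but it can be passed to printf-style functions. Here is a minimal sketch of the kind of debug reporting it makes easier; the macro and function names (DEBUG_FORMAT, checked_string_length) are hypothetical and not taken from redland.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical debug macro: __func__ names the enclosing function, so no
 * hand-typed name string is needed. __FILE__ and __LINE__ are ordinary
 * predefined macros. */
#define DEBUG_FORMAT(buf, size, msg) \
  snprintf((buf), (size), "%s:%d:%s: %s", __FILE__, __LINE__, __func__, (msg))

/* Hypothetical defensive check: rather than dereferencing a NULL pointer
 * handed in from a language binding, record where it happened and fail. */
int checked_string_length(const char *str, char *errbuf, size_t errsize)
{
  if(!str) {
    DEBUG_FORMAT(errbuf, errsize, "NULL string passed in");
    return -1;
  }
  return (int)strlen(str);
}
```

Calling checked_string_length(NULL, buf, sizeof buf) returns -1 and leaves a message in buf naming the file, line and function where the NULL turned up.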

Things seem to be building OK now, and the new MySQL backend for Redland is looking solid, so it's nearly time for another release. It always seems to take a week to do that properly, rather than slinging the code out the door untested.

Planet RDF

Last week I helped build Planet RDF based on existing code my friends had, with me mostly providing the additional hacking to fix the mess that is HTML in RSS and the glue to make the thing update. It's looking good.

1 Nov 2003 (updated 1 Nov 2003 at 15:30 UTC)

Cairo graphics debian packaging

In a fit of enthusiasm last weekend, I made Debian packages (debs) of the Cairo project sources from CVS in order to get it building for Mono. Cairo is a vector graphics library for cross-device output intended to be similar to PDF 1.4; it was once called Xr.

I mentioned this to the Cairo people last week and, ta da, I'm now maintaining them and have CVS access. The debs for the Cairo snapshots are presently hosted on freedesktop.org until they move to the main site after some server reorganising. So if you add this area to /etc/apt/sources.list with

deb http://freedesktop.org/~cairo/debian/ ./

you can get a one-line install of the Cairo libraries without dependency hell.


Popping back up the stack to my original goal: Mono is now building for me from CVS. I can't say I've tested its use of Cairo much, but Monodoc using GTK# is working. Getting Mono to build from CVS in the first place is yet another story...

hacking life

And finally, while the #cairo channel on Freenode was discussing the hacker glider logo apparel (proceeds to the EFF), I came up with the hacking life slogan for it. I think that works pretty well. The logos were, of course, made using Cairo.

20 Aug 2003 (updated 20 Aug 2003 at 19:48 UTC)

Redland and Debian Packaging

Phew! My RDF/XML parser raptor (a C library that depends on libxml and cURL) has been in Debian sid/unstable for a few months now, so it was time to attempt the big beast, redland. That's my main RDF system, and as well as the C library it has six other language interfaces (perl, python, ruby, java, tcl and php), and I'm working on CLI/C#.

This list has felt a bit daunting to deal with; however, after the raptor experience I was confident the C library part would at least be straightforward. Over the last 4 days I've added some of the languages slowly, while studying the mysteries of the Debian perl and python policies. On top of that, I've been waiting for sid/unstable to be buildable so I can use pbuilder to check that I have all the correct dependencies, and pulling apart various existing packages to see how they did things.

So the current state is that I've got the C (and -dev), perl, python and ruby Debian packages building without error in pbuilder, "lintian clean" and working once installed. The next step is to get them into sid/unstable once my very busy sponsor edd has time to check them.

Raptor - a little tune up

I've been doing a bit of raptor tuning to bring down the CPU and memory usage on large files. I had always been wary of premature optimisation, but knew that things could be improved a lot if I did some profiling and cut down the big problems. On a 550,000-triple RDF/XML file, the time went from 172.8s for Raptor 0.9.8 as released to 7.3s with the CVS sources, over 23x faster. The improvement was mostly due to:

  • Far fewer strlen() calls on strings whose length I already had elsewhere
  • Removal of many short-lifetime malloc()/free() pairs (thanks to dmalloc)
  • Using a set for rdf:ID checking, rather than a list.
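
Here is a rough sketch of two of those changes together, assuming nothing about raptor's real internals: a hash set for rdf:ID values already seen (RDF/XML does not allow the same rdf:ID value twice relative to the same base URI), whose add function takes an explicit length so callers that already know it never call strlen(). The type and function names are hypothetical, and the hash is an FNV-1a-style string hash chosen for brevity.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical set of rdf:ID strings, keyed on (pointer, length). */
typedef struct id_entry {
  char *str;
  size_t len;
  struct id_entry *next;   /* separate chaining within a bucket */
} id_entry;

typedef struct {
  id_entry **buckets;
  size_t nbuckets;
} id_set;

/* FNV-1a-style hash over exactly len bytes: no strlen() needed. */
static size_t id_hash(const char *s, size_t len)
{
  size_t h = (size_t)2166136261u;
  for(size_t i = 0; i < len; i++) {
    h ^= (unsigned char)s[i];
    h *= (size_t)16777619u;
  }
  return h;
}

/* nbuckets must be non-zero. */
id_set *id_set_new(size_t nbuckets)
{
  id_set *set = malloc(sizeof(*set));
  if(!set)
    return NULL;
  set->buckets = calloc(nbuckets, sizeof(id_entry *));
  if(!set->buckets) {
    free(set);
    return NULL;
  }
  set->nbuckets = nbuckets;
  return set;
}

/* Returns 1 if added, 0 if already present (a duplicate rdf:ID),
 * -1 if memory ran out. */
int id_set_add(id_set *set, const char *str, size_t len)
{
  size_t b = id_hash(str, len) % set->nbuckets;
  for(id_entry *e = set->buckets[b]; e; e = e->next)
    if(e->len == len && !memcmp(e->str, str, len))
      return 0;

  id_entry *e = malloc(sizeof(*e));
  if(!e)
    return -1;
  e->str = malloc(len + 1);
  if(!e->str) {
    free(e);
    return -1;
  }
  memcpy(e->str, str, len);
  e->str[len] = '\0';
  e->len = len;
  e->next = set->buckets[b];
  set->buckets[b] = e;
  return 1;
}

void id_set_free(id_set *set)
{
  for(size_t i = 0; i < set->nbuckets; i++) {
    id_entry *e = set->buckets[i];
    while(e) {
      id_entry *next = e->next;
      free(e->str);
      free(e);
      e = next;
    }
  }
  free(set->buckets);
  free(set);
}
```

With a list, each duplicate check walks every ID seen so far, so a file with n IDs costs O(n²) comparisons overall; the hash set keeps each check roughly constant time.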

18 Mar 2003 (updated 18 Mar 2003 at 23:31 UTC)

Raptor and web libraries

Unlike Java, Perl, Python and all those higher-level languages, C makes something like retrieving a web page a lot more work. There are no stdurl or stdweb libraries around that you can assume are always available. Since raptor is a parser for an XML language, libxml is one likely candidate, and it has a tiny HTTP implementation sufficient for GET. There is also the de facto portable web library, libcURL, so I made that configurable too, plus the W3C libwww, which is common but rather large. So, problem solved.

Or so I thought. It turns out that all of those APIs except the W3C libwww are push: they take the thread of control from the caller and return data to it via callbacks. However, I wanted the more I/O-stream-like pull style, i.e. the user application does while(...) { get stuff; do stuff }. You can wrap a push API around a pull one quite easily and efficiently, but not the other way around: you need to store all the pushed content and then deliver it pull by pull. So I'm going to have to live with that, provide both, and warn users that the pull interface will suck up memory.
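
To make the asymmetry concrete, here is a minimal sketch with hypothetical pull_fn/push_fn interfaces (not the libxml, libcurl or libwww APIs). Push on top of pull needs one fixed buffer and a loop; pull on top of push, without threads, has to keep everything the push source delivers until the application asks for it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical pull interface: copy up to size bytes into buf and return
 * how many were copied; 0 means end of stream. */
typedef size_t (*pull_fn)(void *source, char *buf, size_t size);

/* Hypothetical push interface: the producer calls this once per chunk. */
typedef void (*push_fn)(void *user_data, const char *chunk, size_t len);

/* Push on top of pull: one fixed buffer and a loop is all it takes. */
void push_from_pull(pull_fn pull, void *source, push_fn push, void *user_data)
{
  char buf[4096];
  size_t n;
  while((n = pull(source, buf, sizeof(buf))) > 0)
    push(user_data, buf, n);
}

/* Pull on top of push: the producer keeps control until it has finished,
 * so every chunk must be stored until the application asks for it. */
typedef struct {
  char *data;
  size_t len;     /* bytes stored so far */
  size_t cap;     /* bytes allocated */
  size_t pos;     /* next byte to hand out */
  int failed;     /* set if an allocation failed */
} pull_buffer;

/* A push_fn that appends each chunk, growing the buffer as needed. */
void pull_buffer_store(void *user_data, const char *chunk, size_t len)
{
  pull_buffer *pb = user_data;
  if(pb->failed || len == 0)
    return;
  if(pb->len + len > pb->cap) {
    size_t cap = pb->cap ? pb->cap : 4096;
    while(cap < pb->len + len)
      cap *= 2;
    char *data = realloc(pb->data, cap);
    if(!data) {
      pb->failed = 1;
      return;
    }
    pb->data = data;
    pb->cap = cap;
  }
  memcpy(pb->data + pb->len, chunk, len);
  pb->len += len;
}

/* A pull_fn that hands out the stored bytes in whatever sizes are asked for. */
size_t pull_buffer_read(void *source, char *buf, size_t size)
{
  pull_buffer *pb = source;
  size_t left = pb->len - pb->pos;
  size_t n = size < left ? size : left;
  if(n == 0)
    return 0;
  memcpy(buf, pb->data + pb->pos, n);
  pb->pos += n;
  return n;
}
```

pull_buffer_store has the push_fn signature and pull_buffer_read has the pull_fn signature, so the two adapters compose. The catch is that a pull_buffer grows to the size of the whole document, which is exactly the memory cost of offering pull over a push library.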

27 Feb 2003 (updated 27 Feb 2003 at 00:48 UTC)

FOAFBot lives, nearly

I had another go at getting edd's FOAFBot running with the updated Redland 0.9.12 API, and it now seems to crash less after I hand-fixed some bugs that it exposed in the latest version. That's a good thing, since it means those bugs got found and fixed. Of course, that still leaves the question of why it runs but doesn't work.

16 Feb 2003 (updated 16 Feb 2003 at 20:03 UTC)

Redland and Raptor

What day is it? Oh yeah, I released some software last week: new versions of my redland and raptor libraries, the general RDF library and the RDF parser respectively. It took only nine months since the last release of Redland, with lots of bug fixing and testing. Plus all the pain of automake, autoconf, uploading to SourceForge, writing Freshmeat news, announcements and release notes, and checking that it works across Solaris, Linux, FreeBSD and OS X. The last of those was a real pain, not quite UNIX enough for me. Then I made some RPMs and new debs.

So, onwards to more releases. I've already received some patches for win32 support in raptor, and I wanted to move it on to the latest autotools, which seems to be working OK. For redland, I'm ready to rip out more code, such as the old repat parser, which is a bit obsolete.

I also just realised that I've implemented SAX2 for C inside raptor, since libxml and expat only ever had SAX1-style interfaces (no XML Namespaces). I should think about what to do with this; maybe talk to DV.
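
The main thing SAX2 adds over SAX1 is namespace processing: instead of a raw name like rdf:Description, the start-element callback gets the namespace URI and the local name. Here is a minimal sketch of the bookkeeping that requires, with hypothetical names and a fixed-size declaration stack. It is not raptor's code; it handles element names only and ignores details such as the implicitly bound xml prefix and the rule that unprefixed attributes have no namespace.

```c
#include <string.h>

#define MAX_NS 64

/* One namespace declaration, remembered with the element depth it
 * appeared at so it can be dropped when that element ends. */
typedef struct {
  const char *prefix;  /* "" for the default namespace (xmlns="...") */
  const char *uri;
  int depth;
} ns_decl;

typedef struct {
  ns_decl decls[MAX_NS];
  int count;
} ns_stack;

/* Record an xmlns or xmlns:prefix attribute seen on an element at depth. */
int ns_push(ns_stack *ns, const char *prefix, const char *uri, int depth)
{
  if(ns->count == MAX_NS)
    return -1;
  ns->decls[ns->count].prefix = prefix;
  ns->decls[ns->count].uri = uri;
  ns->decls[ns->count].depth = depth;
  ns->count++;
  return 0;
}

/* On the end tag of an element at depth, drop its declarations. */
void ns_pop(ns_stack *ns, int depth)
{
  while(ns->count > 0 && ns->decls[ns->count - 1].depth >= depth)
    ns->count--;
}

/* Split a qualified element name into prefix and local part and look the
 * prefix up from the innermost declaration outwards. Returns the namespace
 * URI, or NULL if the prefix is not declared; *local gets the local part. */
const char *ns_resolve(const ns_stack *ns, const char *qname, const char **local)
{
  const char *colon = strchr(qname, ':');
  size_t prefix_len = colon ? (size_t)(colon - qname) : 0;
  *local = colon ? colon + 1 : qname;
  for(int i = ns->count - 1; i >= 0; i--) {
    const char *p = ns->decls[i].prefix;
    if(strlen(p) == prefix_len && !strncmp(p, qname, prefix_len))
      return ns->decls[i].uri;
  }
  return NULL;
}
```

A SAX1 start-element handler would first push any xmlns attributes at the current depth, then call ns_resolve on the element name; the matching end-element handler calls ns_pop with the same depth.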


At last, I got another release of raptor out the door after about 5 months and lots of changes (new RDF datatyping, collections, URI updates, internal improvements, bug fixes). It took a while until it had the features I wanted to add and the internal changes I wanted to make, and then, after all that, to get it stable and working.

Releasing software is such a pain, especially in the free software / open source world, because it is your reputation that is on show ("show me the code!"), and the source code you make available to the world is what counts.

Of course, talking about reputation on advogato is just asking for trouble. :)


A short update on various code refactorings of my raptor RDF parser over the last month or so. I have been pulling apart a 140K C source file into chunks by functionality. This was needed for a bunch of reasons, mostly because the code had been evolved rather than designed and was getting embarrassing to look at.

The result is that the API is smaller and more flexible, and I can soon pull out the redland URI dependencies so that the same raptor library can be used standalone in applications while still working efficiently inside redland.
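
One common way to cut a dependency like that is to route all URI handling through a small table of function pointers that the host library can replace. This is a sketch of the idea under that assumption, with hypothetical names; it is not raptor's or redland's actual API.

```c
#include <stdlib.h>
#include <string.h>

typedef void *uri_ref;  /* opaque to the parser */

/* Hypothetical URI handler table: the parser only ever calls these. */
typedef struct {
  uri_ref (*new_uri)(void *context, const char *uri_string);
  void (*free_uri)(void *context, uri_ref uri);
  const char *(*as_string)(void *context, uri_ref uri);
} uri_handler;

/* Standalone default: a URI is just a heap copy of its string. */
static uri_ref default_new_uri(void *context, const char *uri_string)
{
  (void)context;
  size_t len = strlen(uri_string);
  char *copy = malloc(len + 1);
  if(copy)
    memcpy(copy, uri_string, len + 1);
  return copy;
}

static void default_free_uri(void *context, uri_ref uri)
{
  (void)context;
  free(uri);
}

static const char *default_as_string(void *context, uri_ref uri)
{
  (void)context;
  return (const char *)uri;
}

static const uri_handler default_handler = {
  default_new_uri, default_free_uri, default_as_string
};

static const uri_handler *current_handler = &default_handler;
static void *current_context = NULL;

/* A host library would call this once at startup; NULL restores the default. */
void parser_set_uri_handler(const uri_handler *handler, void *context)
{
  current_handler = handler ? handler : &default_handler;
  current_context = handler ? context : NULL;
}

/* All parser code goes through these wrappers. */
uri_ref parser_new_uri(const char *s)
{
  return current_handler->new_uri(current_context, s);
}

void parser_free_uri(uri_ref uri)
{
  current_handler->free_uri(current_context, uri);
}

const char *parser_uri_as_string(uri_ref uri)
{
  return current_handler->as_string(current_context, uri);
}
```

Standalone applications never call parser_set_uri_handler and get plain string URIs; a host could pass in handlers that create its own URI objects, so nothing needs converting at the boundary.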

Apart from moving code around, I also found time to improve the XML error handling so that it can deal with XML it doesn't understand; it now gets through all the libxml XML tests without falling over. I also killed a bunch of other bugs, and the bug list is actually getting shorter.


It's been a few weeks since my last hacking update on my redland RDF library. I have long wanted the C library to be able to return error messages and the like to the higher-level language interfaces (perl, python, java, tcl, ruby, ...). I've now got my head around the C interfaces for callbacks in perl and python, so any errors are now delivered to native perl subroutines and python functions, or raised as exceptions. This means the CGI demo programs that used to just chuck errors to stderr can now return them to the web interface.
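
The usual C pattern for this is a handler function pointer plus an opaque user_data pointer that the library hands back on every call, so a binding's C glue can keep a reference to the script-level handler in user_data. Here is a minimal sketch with hypothetical names (world_set_error_handler and friends are not the redland API).

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback type: user_data is whatever the caller registered. */
typedef void (*error_handler)(void *user_data, const char *message);

typedef struct {
  error_handler handler;
  void *user_data;
} library_world;

void world_set_error_handler(library_world *world, error_handler handler,
                             void *user_data)
{
  world->handler = handler;
  world->user_data = user_data;
}

/* Internal: format the message and pass it to the registered handler;
 * with no handler installed, fall back to stderr as before. */
void world_error(library_world *world, const char *format, ...)
{
  char message[512];
  va_list args;
  va_start(args, format);
  vsnprintf(message, sizeof(message), format, args);
  va_end(args);
  if(world->handler)
    world->handler(world->user_data, message);
  else
    fprintf(stderr, "%s\n", message);
}

/* Example handler, as a CGI demo might use: keep the last error so it can
 * be shown on the web page instead of being lost on stderr. */
typedef struct {
  char last_error[512];
  int count;
} error_log;

void collect_error(void *user_data, const char *message)
{
  error_log *log = user_data;
  strncpy(log->last_error, message, sizeof(log->last_error) - 1);
  log->last_error[sizeof(log->last_error) - 1] = '\0';
  log->count++;
}
```

In a perl or python binding, the registered handler would instead be a small C shim that calls back into the interpreter; with collect_error, a CGI program can print the error_log's last_error into its page.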


Most recently I've been banging on my raptor RDF parser and writing test cases to find the bugs that have been known about but not diagnosed. I also took the opportunity to try to split some code out of the very large main source file, which is tricky to do with evolved code.
