Older blog entries for DV (starting at number 209)

5 Sep 2005 (updated 5 Sep 2005 at 12:17 UTC) »

libxml2-2.6.21 and libxslt-1.1.15

Finally made a new set of releases. I have been chasing bugs for the last few weeks, so people should really update in general. As a result I ended up closing 182 bugs in GNOME bugzilla (manually, because the "handle multiple bugs at once" form does not allow changing bugs to the CLOSED state, so I lost one hour clicking through the bug forms :-( ). Anyway, it's good to have the libxml2 and libxslt bug lists trimmed down to something reasonable, and good to have the releases out in time for the new GNOME release.

Upcoming events

Don't forget to register for the GNOME Summit if you are coming, and tell us what you want to work on. This clearly won't be a talk-driven conference, but setting topics of work and discussion in advance will make us more productive.

My talk on Xen at FUDCon is scheduled for Thursday 6th October. I will try to spend a bit of time at the GNOME booth too, before and after, on that day.

Disaster recovery

Without getting down into the political side, I'm still surprised it takes such a long time for countries affected by a disaster to request or even accept international aid. Seeing the US take a full week before accepting the various kinds of help offered worldwide is a bit shocking. It's not like they didn't know they needed it, so is that a logistics problem? But when the Red Cross, an international non-governmental agency, appears to be the most efficient workforce on the scene for 5 full days, and even the state government officials officially recognize it, this really means the money going to governmental disaster planning or handling should be donated directly to those who know and care about handling those. It also means the government is incompetent to handle those. The point is that this would not surprise us coming from a less developed country (Thailand's post-tsunami recovery effort looks far more organized in retrospect than the US response to Katrina), but it is shocking for a nation very eager to point out to others how they think a country should be run...

26 Aug 2005 (updated 26 Aug 2005 at 12:56 UTC) »

the future of gamin

John has submitted a patch to have gnome-vfs bypass FAM/gamin and access inotify directly, and people have been asking what my opinion was. Basically it's just fine, there are only 2 concerns. First, OS portability: the gamin/FAM back-end for notification should be maintained as a way to keep working on older systems, BSD, Mac OS X, Solaris, etc. Second, the switch between inotify and gamin should be done at run-time: gamin/FAM is legacy and will continue to be around, and by switching at run-time we don't introduce one more binary incompatibility in the platform. It's hard enough already for ISVs shipping on Linux.
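A minimal sketch of what a run-time switch could look like, written in Python rather than the C of gnome-vfs. The function names are hypothetical, the probe simply asks the C library for `inotify_init` and checks that the kernel hands back a descriptor, and the "gamin" result only stands in for whatever FAM/gamin client code would be used as the fallback.

```python
import ctypes
import ctypes.util
import os


def inotify_available():
    """Probe the running system: does inotify_init() give us a descriptor?"""
    try:
        libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
        fd = libc.inotify_init()
    except (OSError, AttributeError, TypeError):
        # No usable C library, or no inotify_init symbol (BSD, Mac OS X, ...).
        return False
    if fd < 0:
        # The symbol exists but the running kernel refused the call.
        return False
    os.close(fd)
    return True


def pick_backend(probe=inotify_available):
    """Choose the notification back-end at start-up, not at build time."""
    return "inotify" if probe() else "gamin"


if __name__ == "__main__":
    print("file notification back-end:", pick_backend())
```

Because the decision is taken when the program starts, the same binary can run on a kernel with inotify and on a system that only has gamin/FAM, which is the point about not adding a binary incompatibility.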

But it should be clear too that I find FAM a very ugly, limited and completely underspecified (if specified at all!) API, that it should die as promptly as possible, and that its only virtue was to be slightly less broken and less specific than the various default kernel APIs found on various OSes, especially dnotify! Die FAM, die! The ideal situation would be to have a sane POSIX-standardized notification API, and just rely on a kernel implementation based on syscalls, but I would not hold my breath! The best future would be for gamin to become a legacy and useless piece of code.


I hacked furiously on libxml2 again this week, first trying to address as many reported bugs as possible before the next release (probably at the end of next week). I also found a way to reduce the library's use of the memory allocator, which can lead to very significant speedups in some cases: the parsing speed for the database file I use for profiling rose from 21 MBytes/s to 25 MBytes/s on 32 bits, and I would expect greater improvements on 64-bit systems. Kasimier seems to have a lot of work, so progress on XML Schemas is slowing down. I should use that time to finish the schematron implementation and add an interface to validate DTDs at the SAX level, like the interface for SAX and XSD added in July; this should be close to trivial with the existing code.


I will be out of reach until Tuesday evening, as I was called to clean up my mother's land: there is an alarming number of wildfires around in the South of France and preventive action is urgently needed. I also discovered that there are frequent Ryanair flights from there to London, which I will use when going to FUDCon3 where I will be speaking about Xen again. I will also go to the GNOME Summit at MIT in Cambridge MA the following week-end, 8-11 October.

libxml2 breakage

Sorry I broke CVS head for a day or so, the error didn't show up in my checkout because I compile statically :-\ . Currently going through the libxml2 bugzilla list trying to kill as many bug reports as possible, one of them was a real parser bug!


While I sit on IRC all day long, I didn't use IM until now. I now have Gaim. I did a review and a small implementation of Jabber a few years ago, and I'm very happy about the boost it will receive.

Journey in regexp land

I have been rather quiet in the last 10 days, first due to an extended week-end and then because I hit a relatively hard problem at the level of the regexps used in libxml2. Basically it's all XML Schemas' fault: Kasimier nearly completed the support, except for the redefine feature, which allows a schema to subset the content model of a type exported by an imported schema. Kasimier will as usual handle the nasty part of making sense of the spec and I will give him the basic tools to make this work. Which means I need to provide ways to check that a content model is a valid derivation of another, which in regexp terms can be summarized by: does regexp R accept all strings generated by regexp r?

And that is a rock, a hard one.

After reading quite a bit, the first finding is that my existing automata + counters modelling of regexps, which we use for content model validation, is really not a good model to try to solve this (though it's good for validating instances). So back to the literature: various papers, most of them relatively recent, especially the paper Michael Sperberg-McQueen presented at Extreme Markup earlier this month on using Brzozowski derivatives for the task. It looks fine, except it will explode when using large counter ranges. My selected approach is to do the derivation at the algebraic level instead of going step by step over all possible input strings, and to fall back to injecting token by token only when no progress can be made purely on tree constructs. After a short week of frenetic testing and refinement I now have something which seems to work relatively nicely. I just pushed it into libxml2, adding less than 8kB to the compiled size of xmlregexp, added a first set of regression tests, and added support to the testRegexp command line tool:

paphio:~/XML -> ./testRegexp --expr '(a*, ((b, c, d){0,5}, e{0,1}){0,4}, f)' '(a{1,100}, b, (c, d, b){2,3}, c, d, e)'
Testing expr (a*, ((b, c, d){0,5}, e{0,1}){0,4}, f):
Subset parsed as: ((((a , b) , ((c , d) , b){2,3}) , c) , d) , e
Resulting derivation: (((b , c) , d){0,5} , e?){0,3} , f
Ops: 0 nodes, 55 cons
paphio:~/XML -> ./testRegexp --expr '(a|b),(a|c){0,100}' 'a{0,100},(a|c)'
Testing expr (a|b),(a|c){0,100}:
Subset parsed as: a{0,100} , (a | c)
Resulting nillable derivation: empty
Ops: 0 nodes, 11 cons
paphio:~/XML -> ./testRegexp --expr '(a|b){3,*}' '(a,b)+'
Testing expr (a|b){3,*}:
Subset parsed as: (a , b)+
Resulting derivation: (a | b)+
Ops: 0 nodes, 8 cons
paphio:~/XML -> ./testRegexp --expr '(a|b),(a|c){0,99}' 'a{0,100},(a|c)'
Testing expr (a|b),(a|c){0,99}:
Subset parsed as: a{0,100} , (a | c)
Resulting derivation: forbidden
Ops: 0 nodes, 9 cons

The key is to try to keep sub-linear performance. I really expect redefines to be used to restrict content models from unbounded sets to bounded and reordered ones, for example (a|b)* into (a,b){1,100000}, to avoid consumers of services being DoS'ed. If you explode when validating, this is just worse; it is a big problem, as pointed out recently in a large thread on xml-dev. Hence testRegexp logs the number of Cons, i.e. how many times an intermediate expression node was generated (one of Brzozowski's results is that this set is finite, but the goal is to keep it small :-).
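For readers who have not met Brzozowski derivatives, here is a small Python sketch of the textbook, token by token inclusion check. It is not the algebraic approach described above and not libxml2's code: every name in it is made up for illustration, and counters are expanded by brute force in `repeat()`, which is exactly the explosion that large ranges cause.

```python
NONE = ("none",)   # matches nothing
EPS = ("eps",)     # matches only the empty string


def sym(a):
    return ("sym", a)


def seq(r, s):
    if r == NONE or s == NONE:
        return NONE
    if r == EPS:
        return s
    return r if s == EPS else ("seq", r, s)


def alt(r, s):
    # Flatten into a set so alternation is associative, commutative and
    # idempotent; this is what keeps the set of derivatives finite.
    items = set()
    for x in (r, s):
        if x == NONE:
            continue
        if x[0] == "alt":
            items |= x[1]
        else:
            items.add(x)
    if not items:
        return NONE
    return next(iter(items)) if len(items) == 1 else ("alt", frozenset(items))


def star(r):
    if r in (NONE, EPS):
        return EPS
    return r if r[0] == "star" else ("star", r)


def repeat(r, lo, hi):
    """Expand r{lo,hi} by brute force (this is what blows up on big ranges)."""
    out = EPS
    for _ in range(hi - lo):
        out = seq(alt(r, EPS), out)
    for _ in range(lo):
        out = seq(r, out)
    return out


def nullable(r):
    """True if r accepts the empty string."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("none", "sym"):
        return False
    if tag == "seq":
        return nullable(r[1]) and nullable(r[2])
    return any(nullable(x) for x in r[1])   # alt


def deriv(r, a):
    """What is left of r after consuming the token a."""
    tag = r[0]
    if tag in ("none", "eps"):
        return NONE
    if tag == "sym":
        return EPS if r[1] == a else NONE
    if tag == "seq":
        head = seq(deriv(r[1], a), r[2])
        return alt(head, deriv(r[2], a)) if nullable(r[1]) else head
    if tag == "alt":
        out = NONE
        for x in r[1]:
            out = alt(out, deriv(x, a))
        return out
    return seq(deriv(r[1], a), r)            # star


def includes(sup, sub, alphabet):
    """True if every string accepted by sub is also accepted by sup."""
    seen, todo = set(), [(sub, sup)]
    while todo:
        pair = todo.pop()
        if pair in seen:
            continue
        seen.add(pair)
        small, big = pair
        if small == NONE:
            continue              # sub cannot accept anything further
        if nullable(small) and not nullable(big):
            return False          # a string of sub is rejected by sup
        todo.extend((deriv(small, a), deriv(big, a)) for a in alphabet)
    return True


if __name__ == "__main__":
    a, b = sym("a"), sym("b")
    wide = star(alt(a, b))               # (a|b)*
    narrow = repeat(seq(a, b), 1, 3)     # (a,b){1,3}
    print(includes(wide, narrow, "ab"))  # is the restriction valid?
    print(includes(narrow, wide, "ab"))  # and the other way round?
```

The size of `seen` plays roughly the role of the Cons count: it stays small here, but the expression built by `repeat()` grows linearly with the counter range before any checking even starts.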

Future work on this is to fix one potential problem left, apply it to Kasimier's code when it's there, extend it to allow the full set of operators needed by Relax-NG, and maybe rewrite the RNG validator on top of it. I'm not sure I will use it for validation in Schemas itself (apart from the Schemas compilation of course), as I prefer good old automata to mutating trees during the validation phase.

Test suite

Very impressed by yesterday's SVG test suite results from Uraeus. Looks excellent, congrats! Now can you automate the process of finding defects in the output? I started having a headache approximately 2/3rds of the way through the scanning process :-)

9 Aug 2005 (updated 9 Aug 2005 at 22:13 UTC) »

a new gamin release 0.1.5

One of the strange feelings of becoming an old fart is seeing the youngsters come over your code and replace your slow, patient process with a frenetic trance, though with less control. That's basically what's happening to me on gamin, as John McCutchan is going over gamin's code for inotify. Hence yet another release, and don't ask for 0.1.4, it just disappeared due to last-minute CVS updates.

libxml2 and Schemas

On the other hand I am regaining control over libxml2 Schemas development: Kasimier Buchcik slowed down a bit, so I went back and fixed a number of core issues in the libxml2 automata and regexps code. The last 2 bug reports have been from data security companies using Schemas, interesting :-) . I haven't finished the schematron code yet; there are a couple of issues I need to fix first, and I would prefer to get a test suite based on the ISO draft standard syntax. I'm still looking for something like that...

sunset over the mountains

Sometimes I wonder why I am in Grenoble, away from most events, airports, and from beaches and coral reefs :-), but it usually takes just a climb in the surrounding mountains to reassert my love for that area. Pictures of the sunset over Chartreuse on Sunday don't fully do justice to the amazing view (nor to the wind or the cold...)

Javascrapt mess

I tried to make a small Javascript slideshow to help scan my pictures. Okay, I now understand why "web programming" is such a pitiful disaster: global variables from the same block of code unreachable at invocation time, a perfectly broken kludge for timers losing all context. I'm sorry for the armies of web developers worldwide; some "designers" should be hanged without much formal process...

DesktopConf and OLS

I'm a bit exhausted after nearly a week of conferences. It is very good to (re)connect face to face with other people, and the set of talks has been quite good too (today for example has been completely focused on Xen and virtualization), but a week in a row is a bit too much. And we still have a Fedora BOF at 9pm!


I still managed to do some coding between the conferences: I fixed some of the libxml2 automata/regexps limitations that blocked Kasimier on XML Schemas. I also started implementing the Schematron validation draft ISO standard. It's both relatively simple and powerful, and it's a good complement to XSD or Relax-NG, especially for integrity constraints at the document level, but it can also be a validation framework that is easier to use for people who are not XML gurus. Since it's mostly based on XPath, the code on top of existing functionality should be relatively small, which is good too.
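To give an idea of why a rule-based validator on top of a path engine stays small, here is a toy Python sketch in the spirit of Schematron. It is only a stand-in: real Schematron assertions are full XPath boolean expressions, while the limited path subset of `xml.etree.ElementTree` used here can only check that a relative path selects something. The rule table, the element names and `validate()` are all invented for the example.

```python
import xml.etree.ElementTree as ET

# Each rule: (context path, test path relative to the context, message).
# The assertion holds when the test path selects at least one node.
RULES = [
    (".//order", "item", "an order must contain at least one item"),
    (".//item", "price", "an item must carry a price"),
]


def validate(root, rules=RULES):
    """Return a list of (element tag, message) for every failed assertion."""
    failures = []
    for context, test, message in rules:
        for node in root.findall(context):
            if node.find(test) is None:
                failures.append((node.tag, message))
    return failures


if __name__ == "__main__":
    doc = ET.fromstring(
        "<orders>"
        "<order><item><price>3</price></item></order>"
        "<order/>"
        "</orders>"
    )
    for tag, message in validate(doc):
        print(tag + ": " + message)
```

The whole validator is one loop over contexts and one lookup per assertion; all the real work is delegated to the path engine, which is why the amount of new code needed can stay small.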

Quotes of the day

Damien while honeymooning: "Jonita will probably kill me. I found the bug and a fix ..."

Rik van Riel: "Do you want a VM that is consistently slow, or one that is occasionally fast ? ;)"

Yeah it's Bastille Day and I'm easily amused today :-)


Following inotify being merged in Linus' tree, I had to make a new release of gamin quickly: they changed the kernel API for it again, oh, and they promised they would not change it ... again too, but it's fine, I'm quite happy, thanks rlove and Co.! Gamin-0.1.2 includes the support for the new kernel API thanks to McCutchan. It's available as testing updates on Fedora Core 3 and 4, but a kernel with inotify may take a little while.

However it is clear that dnotify is now legacy; further work will be on the inotify back-end, well, once I have a kernel for it :-)


The 2.6.20 release seems to be a good one, no loud complaints, good! The good point of an official holiday is that I can go chase libxml2 bugs without thinking I should do something else. Today was relatively productive: I applied a number of patches received since the 2.6.20 release (it's interesting to see how each new release tends to generate momentum, and you get newcomers and new fixes/patches even if unrelated to the release itself) and fixed some of the bugs which were popping up in the NIST XML Schemas regression tests:

## NIST test suite for Schemas version NIST2004-01-14
Ran 23170 tests (3953 schemata), no errors

Note for thomasv, "make tests" for libxml2 and libxslt have had a "make valgrind" target associated (running way slower) for more than a year ;-)

We still have a number of XSD tests failing however, some in the Sun part of the W3C regression suite but most of them in the Microsoft part. The problem is to understand what is actually happening and who is right; unfortunately the spec is very, very hard to understand and it is not always clear-cut.

DesktopCon and OLS

I'm leaving for Ottawa tomorrow but I will be arriving Sunday evening.

Tour de France ... from the 18th floor

Le Tour de France happened to start from Grenoble today, and passed just below my flat. A tad disappointing: there are way, way more cars than bicycles, crap advertising, cheap goodies thrown to people waiting on the side, noise, ads, and for 30 seconds the actual sport event. People end up seeing way more ads for sausages, cars, watches, candies and banks than cycling in action... I guess it's the fate of all very popular sport events, it's way more about advertising and business than sport, same for the Olympics :-) . Anyway I took 3 pictures from the balcony, but if you're really a bicycle fan, watch it on TV!

Afterthought on Havoc's entry about Metacity hacking

Seems to me that Havoc's main point was that after the code is mature enough, fixing remaining bugs would just increase the risk of larger bugs being added in the process. I think one can control those risks by adding regression tests for most common uses of the software, and if I understand correctly people are working on GUI regression testing tools for Gnome, so to me the best way to revisit that problem is to start accumulating GUI behaviour regression tests for the desktop.

Legal Spam

So who else got that legal SPAM :

Greetings $project Maintainer:

We currently use the $project package and we are quite pleased with it,
but our legal council has recommended that we discontinue its use. ...

And then 2 Word documents with lists of files (generated files, test suites, etc.)
missing a Copyright label. Apparently I'm not the only one, so if you got this
don't worry, you are not alone. My answer was basically that such
issues should be made public if such legal changes were needed for an
Open Source project. I'm wondering how they are going to react :-)


Getting that release out has been harder than usual, but this is normal considering:

  • That the last release was at the beginning of April, though I usually release every month
  • The large amount of changes and testing which affected packaging

Technically the big progress is on the XSD Schemas support (streaming and big improvements in conformance); there are also revamped testing tools purely based on C, to be able to test fully on Windows, under valgrind or on embedded systems, and some DOM "import" like functions for the tree APIs.

Fun with rpmfind.net

The machine had been super stable since I upgraded it last summer, with an uptime around 200 days when one of the drives died. The system survived, but I still had to reboot and needed to increase drive space: the Fedora and Red Hat mirrors especially were starting to outgrow the largest 120GB drives I could plug into the venerable 3ware card. So clearly it was time to start adding SATA, and since 200GB drives are now the cheapest per capacity I bought a couple. That's where things started to get nasty: the Promise TX4200 wasn't supported by RHEL3 or 4, and I didn't trust the drivers coming from the vendor either, so I bought a cheapo SIL-based SATA card, which worked to some extent on RHEL3, but depending on the kernel version I got either crashes or non-DMA transfers and kernel errors. It was clearly time for a software upgrade too, and since the box doesn't have a CD drive (it's all disks, 11 drives in that box), the upgrade was a little funky (rsync to a new drive at home, local upgrade to RHEL 4, and then swapping in the new root drive later on-site). Anyway it works very well now: no partition is full anymore, there is now more than a terabyte of storage, twice as much RAM, and I hope I will get to a 200+ days uptime again! Knocking on wood!
