Older blog entries for DV (starting at number 207)

libxml2 breakage

Sorry I broke CVS head for a day or so, the error didn't show up in my checkout because I compile statically :-\ . Currently going through the libxml2 bugzilla list trying to kill as many bug reports as possible, one of them was a real parser bug !


While I sit on IRC all day long, I didn't use IM until now. I now have Gaim. I did a review and a small implementation of Jabber a few years ago, so I'm very happy about the boost it will receive.

Journey in regexp land

I have been rather quiet in the last 10 days, first due to an extended week-end and then because I hit a relatively hard problem at the level of the regexps used in libxml2. Basically it's all XML Schemas' fault: Kasimier nearly completed the support, except for the redefine feature allowing a schema to subset the content model of a type exported by an imported schema. Kasimier will as usual handle the nasty part of making sense of the spec and I will give him the basic tools to make this work. Which means I need to provide ways to check that a content model is a valid derivation of another, which in regexp terms can be summarized by: does regexp R accept all strings generated by regexp r?

And that is a rock, a hard one.

After reading quite a bit, the first conclusion is that my existing automata + counters modelling of regexps that we use for content model validation is really not a good model to try to solve this (though it's good for validating instances). So back to the literature, various papers, most of them relatively recent, especially the paper from Michael Sperberg-McQueen at Extreme Markup earlier this month on using Brzozowski derivatives for the task. It looks fine except it will explode when using large counter ranges. My selected approach is to do the derivation at the algebraic level instead of going step by step over all possible input strings, and to fall back to injecting token by token only when no progress can be made purely on tree constructs. After a small week of frenetic testing and refinement I now have something which seems to work relatively nicely. I just pushed it into libxml2, adding less than 8kB of code to the xmlregexp compiled size, added a first set of tests to the regression testing and support in the testRegexp command line tool:

paphio:~/XML -> ./testRegexp --expr '(a*, ((b, c, d){0,5}, e{0,1}){0,4}, f)' '(a{1,100}, b, (c, d, b){2,3}, c, d, e)'
Testing expr (a*, ((b, c, d){0,5}, e{0,1}){0,4}, f):
Subset parsed as: ((((a , b) , ((c , d) , b){2,3}) , c) , d) , e
Resulting derivation: (((b , c) , d){0,5} , e?){0,3} , f
Ops: 0 nodes, 55 cons
paphio:~/XML -> ./testRegexp --expr '(a|b),(a|c){0,100}' 'a{0,100},(a|c)'
Testing expr (a|b),(a|c){0,100}:
Subset parsed as: a{0,100} , (a | c)
Resulting nillable derivation: empty
Ops: 0 nodes, 11 cons
paphio:~/XML -> ./testRegexp --expr '(a|b){3,*}' '(a,b)+'
Testing expr (a|b){3,*}:
Subset parsed as: (a , b)+
Resulting derivation: (a | b)+
Ops: 0 nodes, 8 cons
paphio:~/XML -> ./testRegexp --expr '(a|b),(a|c){0,99}' 'a{0,100},(a|c)'
Testing expr (a|b),(a|c){0,99}:
Subset parsed as: a{0,100} , (a | c)
Resulting derivation: forbidden
Ops: 0 nodes, 9 cons

The key is to try to keep sub-linear performance. I really expect redefines to be used to restrict content models from unbounded sets to bounded and reordered ones, for example (a|b)* into (a,b){1,100000}, to avoid consumers of services being DoS'ed. If you explode when validating, this is just worse; this is a big problem, as pointed out recently in a large thread on xml-dev. Hence testRegexp logs the number of Cons, i.e. how many times an intermediate expression node was generated (one of Brzozowski's results is that this set is finite, but the goal is to keep it small :-).
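For readers who have not met Brzozowski derivatives, here is a minimal Python sketch of the textbook token-by-token inclusion check. It is not the libxml2 code, which works at the algebraic level as described above, and every name in it (`sym`, `seq`, `alt`, `rep`, `deriv`, `is_subset`) is invented for the illustration. It returns the verdict plus the number of expression pairs explored, which is only a rough analogue of the cons count.

```python
from functools import lru_cache

EMPTY = ("empty",)  # matches nothing
EPS = ("eps",)      # matches only the empty sequence

def sym(a):
    return ("sym", a)

def seq(l, r):
    if EMPTY in (l, r):
        return EMPTY
    if l == EPS:
        return r
    return l if r == EPS else ("seq", l, r)

def alt(l, r):
    # Flatten, dedupe and sort so that equal choices get one canonical form.
    items = set()
    for e in (l, r):
        if e[0] == "alt":
            items.update(e[1])
        elif e != EMPTY:
            items.add(e)
    if len(items) < 2:
        return next(iter(items), EMPTY)
    return ("alt", tuple(sorted(items, key=repr)))

def rep(e, lo, hi):
    # Counted repetition e{lo,hi}; hi=None means unbounded.
    if hi == 0 or e == EPS:
        return EPS
    if e == EMPTY:
        return EPS if lo == 0 else EMPTY
    return e if (lo, hi) == (1, 1) else ("rep", e, lo, hi)

def nullable(e):
    # True when the expression accepts the empty sequence.
    tag = e[0]
    if tag == "eps":
        return True
    if tag in ("empty", "sym"):
        return False
    if tag == "seq":
        return nullable(e[1]) and nullable(e[2])
    if tag == "alt":
        return any(nullable(x) for x in e[1])
    return e[2] == 0 or nullable(e[1])  # rep

@lru_cache(maxsize=None)
def deriv(e, a):
    # What remains to be matched once token `a` has been consumed.
    tag = e[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "sym":
        return EPS if e[1] == a else EMPTY
    if tag == "seq":
        d = seq(deriv(e[1], a), e[2])
        return alt(d, deriv(e[2], a)) if nullable(e[1]) else d
    if tag == "alt":
        out = EMPTY
        for x in e[1]:
            out = alt(out, deriv(x, a))
        return out
    _, body, lo, hi = e
    rest = rep(body, max(lo - 1, 0), None if hi is None else hi - 1)
    return seq(deriv(body, a), rest)

def is_subset(sub, sup, alphabet):
    # Derive both expressions by the same tokens; fail as soon as `sub`
    # accepts a sequence that `sup` does not.
    seen, todo = set(), [(sub, sup)]
    while todo:
        pair = todo.pop()
        if pair in seen:
            continue
        seen.add(pair)
        s, r = pair
        if s == EMPTY:
            continue  # nothing left to check on this branch
        if nullable(s) and not nullable(r):
            return False, len(seen)
        todo.extend((deriv(s, a), deriv(r, a)) for a in alphabet)
    return True, len(seen)

a, b = sym("a"), sym("b")
any_ab = rep(alt(a, b), 0, None)   # (a|b)*
bounded = rep(seq(a, b), 1, 50)    # (a,b){1,50}
print(is_subset(bounded, any_ab, "ab"))     # restriction is accepted
print(is_subset(any_ab, bounded, "ab")[0])  # the reverse is not
```

With this naive version each decrement of a counter produces a new expression, so the number of explored pairs grows with the counter bound. That is the explosion on large counter ranges mentioned above.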

Future work on this is to fix one potential problem left, apply it to Kasimier's code when it's there, extend it to allow the full set of operators needed by Relax-NG and maybe rewrite the RNG validator on top of it. Not sure I will use it for validation in Schemas itself (apart from the Schemas compilation of course), as I prefer good old automata to mutating trees during the validation phase.

Test suite

Very impressed by yesterday's SVG test suite results from Uraeus. Looks excellent, congrats ! Now can you automate the process of finding defects in the output ? I started having a headache approximately two thirds of the way through the scanning process :-)

9 Aug 2005 (updated 9 Aug 2005 at 22:13 UTC)

a new gamin release 0.1.15

One of the strange feelings of becoming an old fart is seeing the youngsters come over your code and replace your slow patient process with a frenetic trance, though with less control. That's basically what's happening to me on gamin as John McCutchan is going over gamin's code for inotify. Hence yet another release, and don't ask for 0.1.4, it just disappeared due to last-minute CVS updates.

libxml2 and Schemas

On the other hand I am regaining control over libxml2 Schemas development. Kasimier Buchcik slowed down a bit so I went back and fixed a number of core issues in the libxml2 automata and regexps code. The last 2 bug reports have been from data security companies using Schemas, interesting :-) . I haven't finished the Schematron code yet, there are a couple of issues I need to fix first and I would prefer to get a test suite based on the ISO draft standard syntax, I'm still looking for something like that ...

sunset over the mountains

Sometimes I wonder why I am in Grenoble, away from most events, airports, and from beaches and coral reefs :-), but it usually takes a climb in the surrounding mountains to reassert my love for that area. Pictures of the sunset over Chartreuse on Sunday don't fully do justice to the amazing view (nor to the wind or the cold...)

Javascrapt mess

I tried to make a small Javascript slideshow to help scanning my pictures. Okay, I now understand why "web programming" is such a pitiful disaster: global variables from the same block of code unreachable at invocation time, a perfect broken kludge for timers losing all context. I'm sorry for the armies of web developers worldwide, some "designers" should be hung without much formal process...

DesktopConf and OLS

I'm a bit exhausted after nearly a week of conferences. Very good to (re)connect face to face with other people, the set of talks has been quite good too, today for example has been completely focused on Xen and virtualization, but a week in a row is a bit too much. And we still have a Fedora BOF at 9pm !


I still managed to do some coding between the conferences, I fixed some of the libxml2 automata/regexps limitations that blocked Kasimier on XML Schemas. I also started implementing the Schematron validation draft ISO standard. It's both relatively simple and powerful, it's a good complement to XSD or Relax-NG especially for integrity constraints at the document level, but it can also be a validation framework easier to use for people who are not XML gurus. Since it's mostly based on XPath, the code on top of existing functionalities should be relatively small in terms of code size, which is good too.
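To give an idea of what a document-level integrity constraint looks like, here is a much reduced Python sketch of the Schematron idea: select context nodes with a path, run an assertion on each, collect a message when it fails. Real Schematron tests are full XPath boolean expressions; the standard library's ElementTree only implements a small XPath subset, so the test is a Python callable here. The sample document, the rule table and the names `validate` and `ref_has_target` are all made up for the illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<order>
  <product id="p1"/>
  <product id="p2"/>
  <item ref="p1"/>
  <item ref="p9"/>
</order>"""

def ref_has_target(root, node):
    # Integrity constraint: the item's ref must match the id of some product.
    return root.find(".//product[@id='%s']" % node.get("ref")) is not None

# Hypothetical rule table: (context path, test callable, message template).
RULES = [(".//item", ref_has_target, "item ref '%s' does not match any product id")]

def validate(root, rules):
    errors = []
    for context, test, message in rules:
        for node in root.iterfind(context):  # every node matching the context
            if not test(root, node):         # a failed assertion yields a message
                errors.append(message % node.get("ref"))
    return errors

errors = validate(ET.fromstring(DOC), RULES)
print(errors)
```

This kind of cross-reference check is awkward to express as a content model, which is why a rule-based layer sits well next to XSD or Relax-NG.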

Quotes of the day

Damien while honeymooning: "Jonita will probably kill me. I found the bug and a fix ..."

Rik van Riel: "Do you want a VM that is consistently slow, or one that is occasionally fast ? ;)"

Yeah it's Bastille Day and I'm easily amused today :-)


Following inotify being merged in Linus' tree, I had to make a new release of gamin quickly: they changed the kernel API for it again, oh and they promised they would not change it ... again too, but it's fine, I'm quite happy, thanks rlove and Co. ! Gamin-0.1.2 includes the support for the new kernel API thanks to McCutchan, it's available as testing updates on Fedora Core 3 and 4, but a kernel with inotify may take a little while.

However it is clear that dnotify is now legacy, further work will be on the inotify back-end, well once I have a kernel for it :-)


2.6.20 release seems a good one, no loud complaints, good ! The good point of an official holiday is that I can go chase libxml2 bugs without thinking I should do something else. Today was relatively productive, I applied a number of patches received after the 2.6.20 release (it's interesting to see how each new release tends to generate momentum and you get newcomers and new fixes/patches even if unrelated to the release itself) and fixed some of the bugs which were popping up in the NIST XML Schemas regression tests:

## NIST test suite for Schemas version NIST2004-01-14
Ran 23170 tests (3953 schemata), no errors

Note for thomasv, "make tests" for libxml2 and libxslt have had a "make valgrind" target associated (running way slower) for more than a year ;-)

We still have a number of XSD tests failing however, some in the Sun part of the W3C regression suite but most of them in the Microsoft part. The problem is to understand what is actually happening and who is right; unfortunately the spec is very very hard to understand, it is not always clear cut.

DesktopCon and OLS

I'm leaving for Ottawa tomorrow but I will be arriving Sunday evening.

Tour de France ... from the 18th floor

Le Tour de France happened to start from Grenoble today, and pass just below my flat. A tad bit disappointing: there are way way more cars than bicycles, crap advertising, throwing of cheap goodies to people waiting on the side, noise, ads, and for 30 seconds the actual sport event. People end up seeing way more ads for sausages, cars, watches, candies, banks than cycling in action... I guess it's the fate of all very popular sport events, it's way more about advertising and business than sport, same for the Olympics :-) . Anyway I took 3 pictures from the balcony, but if you're really a bicycle fan, watch it on TV !

Afterthought on Havoc's entry about Metacity hacking

Seems to me that Havoc's main point was that after the code is mature enough, fixing remaining bugs would just increase the risk of larger bugs being added in the process. I think one can control those risks by adding regression tests for most common uses of the software, and if I understand correctly people are working on GUI regression testing tools for Gnome, so to me the best way to revisit that problem is to start accumulating GUI behaviour regression tests for the desktop.

Legal Spam

So who else got that legal SPAM :

Greetings $project Maintainer:

We currently use the $project package and we are quite pleased with it,
but our legal council has recommended that we discontinue its use. ...

And then 2 Word documents with lists of files, generated, test suites, etc...
missing a Copyright label. Apparently I'm not the only one, so if you got this
don't worry, you are not the only one. My answer was basically that such
issues should be made public if such legal changes were needed to an
Open Source project, I'm wondering how they are gonna react :-)


Getting that release out had been harder than usual, but this is normal considering:

  • That the last release was at the beginning of April though I usually release every month
  • The large amount of changes and testing which affected packaging

Technically the big progress is on the XSD Schemas support (streaming and big improvements in the conformance), there are also revamped testing tools purely based on C to be able to test fully on Windows, under valgrind or on embedded systems, and some DOM "import" like functions for the tree APIs.

Fun with rpmfind.net

The machine had been super stable since I upgraded it last summer, uptime around 200 days when one of the drives died. The system survived but I still had to reboot and needed to increase drive space, especially as the Fedora and Red Hat mirrors were starting to outgrow the largest 120GB drives I could plug on the venerable 3ware card. So clearly it was time to start adding SATA, and since 200GB drives are now the cheapest per capacity I bought a couple. That's where things started to be nasty: the Promise TX4200 wasn't supported by RHEL3 or 4, I didn't trust the drivers coming from the vendor either, so I bought a cheapo SIL based SATA card, which worked to some extent on RHEL3 but depending on the kernel version I got either crashes or non-DMA transfers and kernel errors. It was clearly time for a software upgrade too, and since the box doesn't have a CD drive (it's all disks, 11 drives in that box), the upgrade was a little funky (rsync to a new drive at home, local upgrade to RHEL 4, and then swapping the new root drive later on-site). Anyway it works very well now, no partition is full anymore, there is now more than a terabyte of storage, twice as much RAM, and I hope I will get to a 200 days+ uptime again ! Knocking on wood !

Software Patents in Europe

At least the crap at the EC level got blocked, thanks a million to all who worked on this, including the FFII and our representative (my MEP is Rocard, I'm fairly happy to have voted for him !). As Calum pointed out it is nice to see Sun Microsystems lining up with us on this, they are still members of EICTA but dissenting, excellent ! As for the block on rpmfind.net I simply removed it 2 hours after the results of the vote ;-) (rpmfind.net was down earlier today, I changed memory, added 2 SATA drives and it broke during the night apparently, the good point is that I now have a 200GB drive for Fedora). Now it's time for me to celebrate !

3 Jul 2005 (updated 3 Jul 2005 at 14:23 UTC)

Software Patents in Europe

I feel threatened by this. Frustrated too, as IBM, Apple, Sun, HP, Nokia ... who are paying for the pro-patents lobbyists in Brussels also use libxml2 and libxslt heavily one way or another, I feel backstabbed by the entities I made a big gift to. I am starting to look at ways to protect myself, and since the only handle I have in this fight is my Copyright ownership for a large part of the code I may change the License to protect myself against patent infringement lawsuits in the future. I'm not sure adding a clause to the MIT License I use now makes sense, I have been pointed by one of the other listed authors of libxml2/libxslt to the Academic Free License from Lawrence E. Rosen and it looks very interesting. I hope I won't have to go through this, it will cost me a bit, and it will cost way more to everybody using the software to recertify it for their use :-( .

Taking things personally

I have been trying to analyze why I react so personally to things related to my libxml2 or libxslt and to a lesser extent to other parts of my GNOME work. My conclusion at this point is that I invested an awful lot of my life, not just work time, in that code base and project. If I die tomorrow that would be the only significant thing left of my stay around, this is sad in a sense, but it can also explain why I'm so sensitive about this.

Pink Floyd

I happened by chance to see their performance at Live8 on TV yesterday, it was amazing, I could not hold back a few tears, I was moved and could not resist, incredibly powerful ...
