19 Jun 2006 DV   » (Master)

On broken XML processing with libxml2

The XML spec is very clear about this:

Once a fatal error is detected, however, the processor MUST NOT continue normal processing (i.e., it MUST NOT continue to pass character data and information about the document's logical structure to the application in the normal way).

That's the default behaviour of libxml2, but to help recover occasionally corrupted data, I provided an XML_PARSE_RECOVER parser flag that allows correcting some trivial errors, for example when using xmllint --recover. The problem now is that people have started using this flag for default processing, and that's something I have said many times I didn't want to happen. If blog generators can't output correct, well-formed XML, then they need to be fixed (and that's true for all other XML generators). The intent of the drastic rule in the XML spec is crystal clear: to avoid data corruption and data loss by forcing non-conformant behaviour to be fixed at the source, not at the consumer level.
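To make the rule concrete, here is a minimal sketch of what strict consumption looks like. It is only an illustration: it uses Python's standard xml.etree.ElementTree parser (expat-based) rather than libxml2, and the function name parse_strict and the sample feeds are made up for the example.

```python
import xml.etree.ElementTree as ET


def parse_strict(data):
    """Return the root element, or None if the input is not well-formed.

    Hypothetical helper for illustration; not part of libxml2 or any feed reader.
    """
    try:
        return ET.fromstring(data)
    except ET.ParseError as err:
        # Fatal error: stop and pass nothing on to the application.
        print(f"rejected: {err}")
        return None


good_feed = "<feed><entry>hello</entry></feed>"
bad_feed = "<feed><entry>hello</feed>"  # mismatched end tag

root = parse_strict(good_feed)
print(root.tag if root is not None else "no document")

root = parse_strict(bad_feed)
print(root.tag if root is not None else "no document")
```

The point of the design is that the broken feed yields no tree at all, so the consumer has nothing to silently patch up and the error goes back to whoever generated the document.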

I am going to give a hard time to those who abuse this feature of libxml2: while I will not disable the option completely, I will make sure it is not used repeatedly. I'm still looking for the best way to do that. Feedback on this issue is welcome, preferably on the mailing list.
