22 Feb 2009 cdfrey   » (Journeyer)

apenwarr has written an excellent rebuttal to my original rebuttal. I'd like to clarify where my viewpoint differs.

I think the crux of the argument boils down to these two statements:

    Strict receiver-side validation doesn't actually improve interoperability, ever.

and

    If you didn't catch it, the precise error in cdfrey's argument is this: You don't create a new file format by parsing wrong. You create a new file format by producing wrong.

I obviously disagree with both of these statements, but I understand how you could think they were true.

On the surface, parsing doesn't seem to create a new format, but even in Avery's own example, the majority of browsers, by accepting an incorrectly quoted option, have indeed created a new format. It isn't a documented format. It is actually an anti-documented format, because the spec says it is wrong. But anyone writing a browser today could not merely follow the specs and produce a functional browser; they would have to copy the behaviour of every other browser in existence as well. Just ask the developers of the now long-defunct, but exciting, Project Mnemonic.

Now, obviously this makes it easier for Average Joe to write his own webpage, and it probably did help advance the popularity of HTML and the rise of the web. But there is a de facto HTML standard out there because parsers were not strict enough. I don't know how you can deny that. (Part of the problem was that browsers were developed alongside the spec, and that contributed a lot too. The poor spec didn't stand a chance.) :-)

And strict receiver-side validation does improve interoperability. Can you imagine if the average C++ compiler allowed a relaxed syntax? Code that "works" when compiled on one platform would suddenly fail to compile on another. I admit that this is already a problem, to a smaller extent than with HTML, but a difference between compilers in how they validate code against the C++ language spec is seen as a bug in the compiler, and rightly so.

This raises an interesting Option 4, to add to Avery's list: let the parser be forgiving with unambiguous syntax, but warn loudly.

This would be a huge improvement to what we have today. We need more web browsers that report the number of HTML errors in a page, by default, in the status bar. And it should be hard to disable, so that a site's non-compliance is widely seen and scorned. (And yes, some of my web pages would be scorned too.)

I believe Opera has a feature like this, if memory serves.
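To make Option 4 a little more concrete, here is a minimal Python sketch of what "forgiving but loud" might look like for the quoting problem in Avery's example. It is not taken from any real browser or from Avery's parser; the names parse_attributes and ParseError are hypothetical, and it only handles name=value pairs (no bare boolean attributes). An unquoted value like width=100 is unambiguous, so it is accepted with a warning; an unterminated quote is genuinely ambiguous, so it is still rejected.

```python
import re

# Hypothetical sketch of "Option 4": accept an unambiguous deviation
# (an unquoted attribute value) but record a warning for each one.
# Ambiguous input, such as an unterminated quote, is still an error.
ATTR_RE = re.compile(
    r"""\s*([A-Za-z_:][-A-Za-z0-9_:.]*)"""          # attribute name
    r"""\s*=\s*"""                                  # equals sign
    r"""(?:"([^"]*)"|'([^']*)'|([^\s"'=<>`]+))"""   # "double", 'single', or bare value
)


class ParseError(ValueError):
    """Raised when the input cannot be parsed without guessing."""


def parse_attributes(text):
    """Parse name=value pairs; return (attrs, warnings)."""
    attrs = {}
    warnings = []
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        m = ATTR_RE.match(text, pos)
        if m is None:
            # Nothing unambiguous to accept here: fail instead of guessing.
            raise ParseError(f"cannot parse attributes at offset {pos}: {text[pos:]!r}")
        name, dq, sq, bare = m.groups()
        if bare is not None:
            # Forgiving: the value is unambiguous, so accept it...
            value = bare
            # ...but loud: remember that the input was not compliant.
            warnings.append(f"unquoted value for attribute {name!r}")
        else:
            value = dq if dq is not None else sq
        attrs[name] = value
        pos = m.end()
    return attrs, warnings


if __name__ == "__main__":
    attrs, warnings = parse_attributes('width=100 title="hello world"')
    print(attrs)
    # A browser could show a running total like this in its status bar.
    print(f"{len(warnings)} warning(s)")
    for w in warnings:
        print("warning:", w)
```

A browser taking this approach could sum the warnings over a whole page and show the total by default, which is exactly the kind of visible non-compliance counter I have in mind.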

As for the side note claiming that parsing is not the problem, but the rendering is: my argument was based on XML as a data exchange format, and less as a way to display content in a browser. For example, the opensync project uses XML for data interchange and plugin config. These formats are defined in strict schemas, which are tested and used via libraries like libxml2, which I assume falls into Avery's Big XML Parser category. If these schemas were not correct, I would consider it a bug detrimental to future interoperability, and something that should be fixed.

The web itself is already so goofy that trying to apply XML to it now is like nailing jello to the wall. So in that respect, I can understand Avery's pain. It just bothered me that someone was boasting about an incomplete parser and claiming it interoperates with XML better than the big libraries. It seemed to me that he was discounting the long-range goals at work in XML in order to avoid some short-term pain. Of course, in the real world of making the customer happy, such shortcuts are often needed, but they belong hidden away in that closet of programming hacks, not published as an example on the web. That sounds harsher than I mean it, but I like those long-range goals, and XML is a solid technical achievement in its own right, even if rather cumbersome. Hey, I'm a C++ programmer... I like strict syntax. :-) I believe strict syntax promotes accuracy, and that accuracy helps you down the road when projects get larger and more complex.

In some ways, you could say that the inaccuracy found all over the web results in an unstable foundation that is generally holding the web back from greater things.

I hope it is also clear that I'm not a fan of XML. (Referring to it as a monstrosity was probably a good hint.) It has its place, but I think a lot can be accomplished by drastically simpler documented formats, and I'm quite willing to hack up my own simple file format if I think it is appropriate. I just don't call it XML.

I have some thoughts percolating in my head about Postel's Law, but that will have to wait for another post.
