apenwarr has written an excellent rebuttal to my original
rebuttal. I'd like to clarify where my viewpoint differs.
I think the crux of the argument boils down to these two
statements:
"Strict receiver-side validation doesn't actually improve
interoperability, ever."
and
"If you didn't catch it, the precise error in cdfrey's
argument is this: You don't create a new file format by
parsing wrong. You create a new file format by producing
wrong."
I obviously disagree with both of these statements, but I
understand how you could think they were true.
On the surface, parsing doesn't seem to create a new format,
but even in Avery's own example, the majority of browsers
accepting an incorrectly quoted option have indeed created a
new format. It isn't a documented format. It is actually
an anti-documented format, because the spec says it is
wrong. But anyone writing a browser today would not be able
to merely follow the specs and produce a functional
browser... they would have to follow the
behaviour of every other browser in existence as well. Just
ask the developers of the now long-defunct, but exciting,
Project Mnemonic.
Now, obviously this makes it easier for Average Joe to write
his own webpage, and it probably did help advance the popularity
of HTML and the rise of the web. But there is a de facto
HTML standard out there because parsers were not strict
enough. I don't know how you can deny that. (Part of the
problem was that browsers were developed alongside the spec,
so that contributed a lot too. The poor spec didn't stand a
chance.) :-)
And strict receiver-side validation does improve
interoperability. Can you imagine if the average C++
compiler allowed a relaxed syntax? Suddenly, code that
"works" on one platform would not compile on another. I
admit that this is already a problem, though to a smaller
extent than with HTML, but differences between compilers in
how they validate the C++ language spec are seen as compiler
bugs, and rightly so.
This raises an interesting Option 4, to add to Avery's list:
let the parser be forgiving with unambiguous syntax, but
warn loudly.
This would be a huge improvement to what we have today. We
need more web browsers that report the number of HTML errors
in a page, by default, in the status bar. And it should be
hard to disable, so that a site's non-compliance is widely
seen and scorned. (And yes, some of my web pages would be
scorned too.)
I believe Opera has a feature like this, if memory serves.
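To make Option 4 concrete, here is a minimal sketch in Python.
It is only an illustration: the attribute syntax, the
parse_attributes name, and the warning wording are all made up
for this example and are not taken from any browser or spec.
Unquoted values are unambiguous, so they are accepted with a
warning. An unterminated quote is ambiguous, so it is rejected.

```python
import re

# Hypothetical attribute syntax for this sketch:
#   name="value"   conforming
#   name='value'   conforming
#   name=value     non-conforming but unambiguous: accept and warn
_ATTR = re.compile(
    r"""\s*([A-Za-z_][\w-]*)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))"""
)


def parse_attributes(text):
    """Return (attributes, warnings) for a string of attributes.

    Unquoted values are accepted but recorded as warnings.
    Anything that cannot be read unambiguously raises ValueError.
    """
    attrs = {}
    warnings = []
    pos = 0
    while pos < len(text):
        if text[pos:].strip() == "":
            break  # only trailing whitespace is left
        match = _ATTR.match(text, pos)
        if match is None:
            raise ValueError(
                f"ambiguous input at offset {pos}: {text[pos:]!r}"
            )
        name, double_quoted, single_quoted, bare = match.groups()
        if bare is not None:
            warnings.append(
                f"attribute {name!r}: value {bare!r} is not quoted"
            )
            value = bare
        elif double_quoted is not None:
            value = double_quoted
        else:
            value = single_quoted
        attrs[name] = value
        pos = match.end()
    return attrs, warnings


if __name__ == "__main__":
    attrs, warnings = parse_attributes('width=100 alt="a cat"')
    # The "status bar": always show the error count.
    print(f"{len(warnings)} error(s) in input")
    for line in warnings:
        print("  warning:", line)
```

The point of returning the warnings alongside the result,
rather than logging them somewhere optional, is that the caller
cannot get the parsed data without also receiving the error
count.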
As for the side note claiming that parsing is not the
problem, but the rendering is, my argument was based on XML
as a data exchange format, and less as a way to display
content in a browser. For example, the opensync project
uses XML for data interchange, and plugin config. These
formats are defined in strict schemas, which are tested
and used via libraries like libxml2, which I assume falls
into Avery's Big XML Parser category. If these schemas were
not correct, I would consider it a bug detrimental to future
interoperability, and something that should be fixed.
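As a rough illustration of what strict receiver-side checking
looks like for a data exchange format, here is a small Python
sketch. To be clear about the assumptions: the <plugin> layout
and its two fields are hypothetical and are not opensync's real
schema, and Python's standard xml.etree.ElementTree stands in
for libxml2. A real project would more likely validate against
a proper schema language than hand-code the checks like this.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema for this sketch: a <plugin> element that
# contains exactly one <name> and one <path>, and nothing else.
REQUIRED_FIELDS = ("name", "path")


def load_plugin_config(xml_text):
    """Parse a plugin config strictly; raise ValueError on any deviation."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Not well-formed: refuse it instead of guessing.
        raise ValueError(f"not well-formed XML: {exc}") from exc

    if root.tag != "plugin":
        raise ValueError(f"expected <plugin>, got <{root.tag}>")

    config = {}
    for child in root:
        if child.tag not in REQUIRED_FIELDS:
            raise ValueError(f"unexpected element <{child.tag}>")
        if child.tag in config:
            raise ValueError(f"duplicate element <{child.tag}>")
        config[child.tag] = (child.text or "").strip()

    missing = [field for field in REQUIRED_FIELDS if field not in config]
    if missing:
        raise ValueError("missing element(s): " + ", ".join(missing))
    return config
```

A document that fails any of these checks never reaches the
rest of the program, so a producer that emits the wrong thing
finds out immediately rather than after other receivers have
quietly learned to tolerate it.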
The web itself is already so goofy that trying to apply XML
to it now is like nailing jello to the wall. So in
that respect, I can understand Avery's pain. It just
bothered me that someone was boasting about an incomplete
parser and claiming they interoperate with XML better than
the big libraries. It seemed to me that he was discounting
the long-range goals at work in XML in order to avoid some
short-term pain. Of course, in the real world of making the
customer happy, such shortcuts are often needed, but it's
something to hide away in that closet of programming hacks,
not to publish as an example on the web. That sounds harsher
than I mean it, but I like those long-range goals, and XML
is a solid technical achievement in its own right, even if
rather cumbersome. Hey, I'm a C++ programmer... I like
strict syntax. :-) I believe strict syntax promotes
accuracy, and that accuracy helps you down the road when
projects get larger and more complex.
In some ways, you could say that the inaccuracy found all
over the web results in an unstable foundation that is
generally holding the web back from greater things.
I hope it is also clear that I'm not a fan of XML.
(Referring to it as a monstrosity was probably a good hint.)
It has its place, but I think a lot can be accomplished by
drastically simpler documented formats, and I'm quite
willing to hack up my own simple file format if I think it
is appropriate. I just don't call it XML.
I have some thoughts percolating in my head about Postel's
Law, but that will have to wait for another post.