24 Apr 2003 simonstl   » (Master)

I just got back from NYC, where I presented on the new Microsoft Office XML stuff. Much of the presentation was demo, so the slides are hardly a complete picture, but there's a rough outline there. I'd be curious to hear if anyone's interested in this stuff - there's sort of a "free love" opportunity here, if not free beer or free speech.

The conversations after the presentation were also interesting, and seemed to reflect some of the stories about XML that trouble me the most - the notion that we can agree on vocabularies and interop will come automatically with that agreement.

There are a lot of problems with this vision, but perhaps the most dangerous aspect of those problems is that they only tend to emerge as the scale of the work - measured in the number of users and the scope of the vocabulary - increases. In small cases, it's pretty easy to put together some basic stuff quickly and make it work. Configuration files that belong to one programmer and one program are a classic case. Moving from there, files that move within a small circle of people are easy to deal with. Sometimes these experiments bear fruit that works well - HTML, for instance - but they pretty much always encounter difficulties as the scope or audience grows.

The answer in the SGML community, and more generally in the XML and computing communities, has been to form committees of people who know markup issues and the relevant information and hope that their consensus reflects the reality of the information problems and how to solve them.

Committees, especially successful ones, are often reflections of the problems they have to solve. Who's invited? How big is the committee? Which view of a given subject is the right one? Even a single transaction can look very different from different perspectives. What kind of data is involved, and how is it communicated? How much can a single group of people accomplish when their information world is in constant flux?

Some committees accomplish a lot; others accomplish very little. Some stay in touch with a wider audience, and others put up barriers - sometimes to avoid information overload, sometimes to avoid criticism. Versioning is a constant problem for specifications, as the world moves on, and XML's intrinsic promise of 'extensibility' is infuriating for people who want to control extensions.

Schemas don't help this much, except to formalize solutions and give computers a chance of comprehending them. The formalisms provide a vocabulary in which developers and committees can express their intentions, but there's nothing intrinsic about a schema - whether it be a DTD, XML Schema, RELAX NG, Schematron, or even RDF - that pins down meaning in any immutable or unquestionable sense.
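A rough sketch of that point, in Python. Everything here is invented for illustration: the `order` and `price` element names, the two sample documents, and the `conforms` helper, which is only a hand-rolled stand-in for what a DTD or RELAX NG validator would check. Both documents pass the structural check, yet the check has no way of knowing that one sender meant dollars and the other meant cents.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents with identical structure. The senders are
# assumed to mean different things by <price>: whole dollars in DOC_A,
# cents in DOC_B. Nothing in the markup says so.
DOC_A = "<order><item>widget</item><price>10</price></order>"
DOC_B = "<order><item>widget</item><price>1000</price></order>"


def conforms(xml_text):
    """Stand-in for schema validation (not a real validator).

    Accepts an <order> whose children are exactly <item> then <price>,
    where <price> contains only digits.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "order":
        return False
    if [child.tag for child in root] != ["item", "price"]:
        return False
    price_text = root.findtext("price") or ""
    return price_text.strip().isdigit()


# Both documents are structurally acceptable; the difference in meaning
# is invisible to the check.
print(conforms(DOC_A), conforms(DOC_B))
```

The check can reject a document that is missing an element, but agreeing on what `price` means still has to happen between people.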

Some people seem intent on reaching for the semantic sky, pinning down vocabularies with labels like butterflies. Some of their results are quite beautiful, at least to fellow butterfly collectors, but live butterflies tend to flutter around a lot.

XML isn't going to solve the problems of people who want to pin down the meanings of information stored in computers. It has demonstrated that a consistent syntax for labeling and structuring information is useful for some people and some tasks, but that's about the limit. For some of us, that's too much already. For others of us (myself included), that's enough - going much beyond that seems to cost more than it's worth.

Cook up your own stuff, and get used to consuming what other people offer you, even if it isn't exactly in the form you wanted or expected. People have long been better at this kind of work than computers, but it's time to start accepting chaos rather than trying constantly to control it. That might even mean strengthening the role of humans in information processing again - strange to some, useful to others.
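One way to read "consuming what other people offer you" in code is a reader that takes what it can find and ignores the rest, rather than rejecting anything that doesn't match an agreed vocabulary. The sketch below is only that, a sketch: the `read_title` helper and the set of names it accepts (`title`, `headline`, `name`) are assumptions made up for this example, not anyone's standard.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def read_title(xml_text):
    """Return the first title-like text in the document, or None.

    The accepted element names are a guess for this sketch; a real
    consumer would grow this list as new documents turn up.
    """
    wanted = {"title", "headline", "name"}
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        text = (elem.text or "").strip()
        if local_name(elem.tag).lower() in wanted and text:
            return text
    return None


# Unknown elements, a namespace, and odd capitalization are all tolerated.
doc = '<d xmlns="urn:x"><meta><Headline> Hi </Headline></meta><extra/></d>'
print(read_title(doc))
```

When the reader finds nothing it recognizes, it returns None and leaves the decision to whoever called it, which is often a human.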
