Older blog entries for simonstl (starting at number 44)

Is civility harmful?

Mark Baker has an interesting response to a piece by Elliotte Rusty Harold on Web Services. Mark, despite calling himself a "Tech Curmudgeon", takes issue with Elliotte's use of the word "idiots", and asks "So please folks, try to keep it civil. Comments such as this one only serve to alienate, which is the last thing we need."

I'm not a particularly friendly or polite person, and didn't see Elliotte's comments as anywhere close to out of line. Still, I think I agree with Mark that "Comments such as this one only serve to alienate". Where I part company is that I think alienation is important, and that pretending we like each other more than we really do is likely to produce muddy compromises at best.

In the case of Web Services and XML, it's become clearer and clearer over time that these technologies are barely related and frequently in conflict. Web Services happens to use XML, but uses it quite badly, from the perspective of many XML people. As Elliotte says, "Web Services violate the fundamental design of XML". Not only that, but the ambitions of Web Services have been a driving force behind some rather toxic specifications, notably W3C XML Schema. I don't mind saying that Web Services is poisoning XML, turning what was once a simplification into a major new set of complications.

Does that alienate people? Yes! It should. I'd love to get the Web Services folks to rethink their foundations. Failing that, making clear that there are serious points of friction seems like the best course of action - and being civil has little to contribute to that. Forking is not a risk here - it's an opportunity.

Of course, I also found this rant well worth reading, though it's completely over the top. Expecting progress to come in neatly-wrapped boxes with thank-you notes attached seems like far too much to ask - and counter-productive, to boot.

Every now and then, something new and interesting surfaces in the world of URIs. The notion of a "probabilistic web" is both different and well worth considering. Maybe there is something to all those lines about light appearing in the darkest hour.

Eventually it becomes clear that any effort to discuss URLs or URIs is pointless.

Uniform Resource Identifiers are the strangest religion I've encountered on the Web. People use them in all kinds of largely incompatible ways, but somehow we're supposed to believe that since the URL part works and survives things like cache issues, these magical abstractions will solve all our identification problems.

Meanwhile, no one can give me a straight answer on how to identify a representation with a URI reference. The short answer, of course, is that you can't, though the fragment identifier part is amusingly representation-dependent and it seems like a representation must in some sense be a resource itself... but you'd better stay away from that hack of a DOS-like file extension or fall into sin.

The Web as a huge set of Platonic Forms would be hysterically funny if it weren't so thoroughly sad.

I came to XML for hypertext, and I'm still trying to get there. XLink/XPointer is just not that helpful, but its supporters seem pretty well convinced. Tim Bray, for instance, said:

I reviewed the XLink spec, and I thought about how I'd go about designing markup for multi-ended and out-of-band links, and I thought XLink presented a pretty compelling design for how you'd do those things.

I think disagreement should be accompanied by examples: "here's a better way to do a multi-ended/out-of-band/metadata-loaded hyperlink, and here's why it's better."

Here's phase 1 of that discussion - an initial proposal. The next phases include a more comprehensive example, a formal processing model, and an implementation.

And if that proposal seems like a lot of work to you for linking, don't worry! Micah Dubinko's been working on a much simpler set of linking constructs for in-line linking called SkunkLink. Given the chance, I think SkunkLink will take care of most common linking issues and let those of us interested in stranger stuff focus on difficult questions more cleanly.

After three years of slow change, I've finally gotten around to updating my Outsider's Guide to the W3C. Dealing with the W3C isn't always fun, but knowing how the process works seems important even for those of us unwilling to take vows of silence.

Ah, the pleasant delights of content negotiation.

Taken seriously, content negotiation is powerful stuff, but most W3C specs seem to either ignore it or go out of their way to avoid it. At first I thought that meant there was something wrong with the W3C specs, but now I just think it means there are powerful integration issues that no one's been willing to sort out - and maybe jettisoning conneg would be a lot easier for most of the parties.

At the same time, though, putting multiple types of content behind a single URI has gotten much, much easier lately, partly thanks to XML and XSLT and partly thanks to the strong foundation provided by Apache itself and frameworks like Cocoon.

Time to turn off my brain for a bit and enjoy the new year as it arrives.

Another good quote from Gustav Stickley that I think works as well for Open Source as for "Home Training in Cabinetwork". It's good to see that stuff from 1905 still has relevance nearly a century later.

The instinct of doing things is a common one, and can be made a source of pleasure, healthy discipline, and usefulness, even when the work is taken up as recreation...

When one has made with his own hands any object of use or ornament there is a sense of personal pride and satisfaction in the result, that no expenditure of money can buy, and this very fact serves to dignify the task and stamp it with individuality. [1]

If you'd rather look forward than backward, you might explore my ORN blog on XML for 2003.

[1] Bavaro, Joseph and Mossman, Thomas. The Furniture of Gustav Stickley (Linden Publishing: 1996), p.89, citing March 1905 issue of The Craftsman.

When I'm not tinkering with computers, I'm frequently working in my basement shop. My tools-to-talent ratio is still badly skewed to tools, but I figure I have 30 more years to make it all work the way I want. In the meantime, I'm making jigsaw puzzles, turning pens, and building some basic cabinets.

My aspiration for next year is a remodeling project, focused on the living room of my 1929 house. We're hoping to preserve its style while replacing the leaky windows, thin sheetrock, and badly finished trim. Oh, and insulation too. It's going to keep the basic Craftsman style that's already there and throughout the house, and I'm going to be building a fair amount of furniture to go with it.

While researching early Stickley furniture (which I'm planning to make, not buy - Stickley is now pretty pricey), I found this nice bit of information about Stickley's perspective in the early 20th century:

[Stickley] emphasized the motive behind the act, the need for the object, as determining the joy of its construction. "It is what we do ourselves, of our own impelling, that is of value to us. Never do a thing unless something definite justifies it... Let your design grow out of necessity."

"It is written, 'In the sweat of thy brow...' but it was never written, 'In the breaking of thine heart shalt thou eat bread.'" Stickley used this quote from Ruskin to illustrate his own feeling that joy in labor is a necessity without which the product of one's labor is an empty reward. Without it, the worker passes the hours trading physical effort for monetary gain, looking to the future for a time when prosperity will release him from this bondage.

Stickley felt trading labor for material gain led to products of an impermanent nature, both physically and aesthetically.[1]

I know Stickley was writing about furniture making, but it sure has echoes of open source to me. Craftsmanship seems to hold as important a role in software development as in furniture making, and the shared nature of software may make it even more important.

[1] Bavaro, Joseph and Mossman, Thomas. The Furniture of Gustav Stickley (Linden Publishing: 1996), p.35

19 Dec 2002 (updated 19 Dec 2002 at 17:50 UTC)
Content negotiations

If there's a slowly ticking time bomb in the architecture of the Web, I suspect the best candidate is URI references.

Uniform Resource Identifiers are plagued with a circular philosophy constrained only lightly with a standard syntax, but at least plain old URIs are really only identifiers, with no strong bonds between the structure of URIs and the resources they identify. This loose connection (sometimes valued as opacity) and the circularity have made it possible for URI supporters to brush away all kinds of objections to their scheme-based scheme for years, and URIs themselves seem safely inert.

URIs have given developers a lot of freedom to create flexible systems. One of the coolest features this permits is content negotiation. Because the resource is separated from any single representation, it's possible, for instance, to visit "http://example.com/" and get back a result in any format under the sun, depending on what your browser is configured to ask for.

Typically, people expect HTML, but it could also plausibly return SVG, SMIL, Flash, RDF, a JPEG, or whatever. MIME Media types are the largest part of this negotiation, but there are also possibilities for negotiating language, character set, and anything else you can describe easily in a header. This isn't arcane functionality supported only by a privileged few - it's built into pretty much every Web server out there now, and it's not that hard to configure. (The browser side is messier, but I can call that an interface problem.)
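For anyone who hasn't watched the negotiation happen, here's a rough sketch in Python of the server side for media types only. It's a simplification under my own assumptions: the function names (parse_accept, matches, negotiate) are made up for illustration, ties are broken by header order, and it ignores the finer precedence rules that real servers apply, along with language and character set.

```python
def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs.append((media_type, q))
    # sorted() is stable, so header order breaks ties between equal q values
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def matches(pattern, media_type):
    """True if an Accept pattern such as 'text/*' covers a concrete media type."""
    if pattern == "*/*":
        return True
    p_type, _, p_sub = pattern.partition("/")
    m_type, _, m_sub = media_type.partition("/")
    return p_type == m_type and p_sub in ("*", m_sub)


def negotiate(accept_header, available):
    """Pick the representation to send for one resource, or None if nothing fits."""
    for pattern, q in parse_accept(accept_header):
        if q <= 0:
            continue
        for media_type in available:
            if matches(pattern, media_type):
                return media_type
    return None


# One resource, two representations behind the same URI.
available = ["text/html", "image/svg+xml"]
print(negotiate("image/svg+xml, text/html;q=0.8", available))
print(negotiate("text/html", available))
```

The same URI yields SVG for the first client and HTML for the second, which is exactly the flexibility described above.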

So where does my supposed "ticking time bomb" appear? It's not in the URI itself, but rather in a key set of features that extend URIs, most particularly in the fragment identifier portion of URI references. Section 4.1 of RFC 2396 states:

When a URI reference is used to perform a retrieval action on the identified resource, the optional fragment identifier, separated from the URI by a crosshatch ("#") character, consists of additional reference information to be interpreted by the user agent after the retrieval action has been successfully completed....

The semantics of a fragment identifier is a property of the data resulting from a retrieval action, regardless of the type of URI used in the reference. Therefore, the format and interpretation of fragment identifiers is dependent on the media type [RFC2046] of the retrieval result. The character restrictions described in Section 2 for URI also apply to the fragment in a URI-reference. Individual media types may define additional restrictions or structure within the fragment for specifying different types of "partial views" that can be identified within that media type.

A fragment identifier is only meaningful when a URI reference is intended for retrieval and the result of that retrieval is a document for which the identified fragment is consistently defined.

Unpacking all that produces a fairly simple processing model. Fragment identifiers are separate from the information sent to the server during retrieval. That retrieval uses the URI portion of the URI reference, and gets back a representation of the resource. Once the user agent has this representation, it applies the fragment identifier to the representation and the application (hopefully) does something interesting with the fragment or fragments returned.
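That model is small enough to sketch in a few lines of Python. The only real library call here is urllib.parse.urldefrag, which splits a URI reference at the "#". The retrieve and apply_fragment callables are hypothetical stand-ins I've invented for the network fetch and for whatever the user agent does with a fragment.

```python
from urllib.parse import urldefrag


def dereference(uri_reference, retrieve, apply_fragment):
    """Retrieve by URI first, then apply the fragment on the client side.

    retrieve(uri) must return a (media_type, body) pair, and
    apply_fragment(media_type, body, fragment) interprets the fragment.
    Both are hypothetical callables supplied by the caller.
    """
    uri, fragment = urldefrag(uri_reference)
    media_type, body = retrieve(uri)  # the server only ever sees the URI
    if not fragment:
        return body
    return apply_fragment(media_type, body, fragment)


# A fake retrieval that records what was actually requested.
requested = []


def fake_retrieve(uri):
    requested.append(uri)
    return "text/plain", "line one\nline two"


def fake_apply(media_type, body, fragment):
    return (media_type, fragment)


result = dereference("http://example.com/doc#part2", fake_retrieve, fake_apply)
print(requested)
print(result)
```

The fragment never reaches fake_retrieve, and the media type only becomes known after retrieval has finished.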

The problem with that process is the gap between the retrieval process and fragment identifier processing. The retrieval process is (very cool) subject to content negotiation, but there's no mechanism for fragment identifiers to communicate their expectations for that negotiation. As RFC 2396 makes explicit, "the format and interpretation of fragment identifiers is dependent on the media type of the retrieval result". If the media type returned from the URI is different from the media type the creator of the URI reference expected, then fragment identifier processing will quite likely fail.
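Here's a small Python sketch of how that failure plays out. Everything in it is an assumption for illustration: the RESOLVERS table, the function names, and the crude regular expression standing in for real HTML fragment processing. Only text/html has a resolver registered, so a reference written with HTML in mind breaks as soon as negotiation hands back something else.

```python
import re


def html_fragment(body, fragment):
    """Crude stand-in for HTML fragment handling: find the tag carrying the id."""
    pattern = r'<[A-Za-z][^>]*\bid="' + re.escape(fragment) + r'"[^>]*>'
    match = re.search(pattern, body)
    return match.group(0) if match else None


# Fragment semantics are looked up by the media type of what came back.
RESOLVERS = {"text/html": html_fragment}


def apply_fragment(media_type, body, fragment):
    """Interpret a fragment according to the media type of the retrieval result."""
    resolver = RESOLVERS.get(media_type)
    if resolver is None:
        raise LookupError("no fragment syntax known for " + media_type)
    return resolver(body, fragment)


# The author of "http://example.com/#intro" expected HTML...
print(apply_fragment("text/html", '<p id="intro">Hi</p>', "intro"))

# ...but negotiation returned a JPEG, and the fragment has nothing to say.
try:
    apply_fragment("image/jpeg", b"\xff\xd8", "intro")
except LookupError as err:
    print("fragment processing failed:", err)
```

Nothing in the URI reference itself lets its creator say "I meant the HTML version", which is the gap described above.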

Roy Fielding got me thinking about this with a post that pretty much blasted XPointer. Given that Fielding is one of the authors of RFC 2396, and is in fact starting up a revision process, his opinion clearly matters - but the processing model described above seems to have driven him into a fit of conservatism about the nature of fragment identifiers. In a follow-up message he writes:

However, URI and fragment identifiers are not media type specific, and in fact do not allow media type concerns to be interleaved with identification.

ID is a reasonable solution, but one that existed prior to XPointer. Other ways of identifying content independent of media type include search terms, paragraph text, and regular expressions.

I worry that this first paragraph is largely wrong, given RFC 2396 and existing MIME Media Type registration practices (which explicitly define fragment identifiers for given media types). Under both, fragment identifiers quite certainly do mix media type concerns with identification. (URIs quite clearly do not mix media types and identification.)

The second paragraph is perhaps the most important, however, as it suggests at least one route out of this problem. It's pretty much what we've done for HTML, for instance. Unfortunately, IDs are a fairly messy issue in XML (see this algorithm for figuring out what's an ID). While text and regular expressions are both great ideas (I'm working on Internet-Drafts for XPointer schemes which do those), they also don't work so well with things like SVG, where identifying a particular view of a drawing may be more important. Reconciling a conservatism that produces consistent results with a more liberal approach that takes greater advantage of the diversity of media types will take a long while.

(I did post an Internet-Draft that supports different media types through fallthrough, but it's far from clear that it's a useful approach.)

There's a lot of work yet to be done here. My current conclusion is that URI references should be considered abbreviations, and that developers who want more control than the current framework for processing URI references can provide should start thinking hard about these problems. The XPointer Framework, which will probably be the flashpoint for discussions in this area to come, has a lot of useful ideas in it. We need to sort out whether URI reference fragment identifiers are the right home for all of those ideas, and how best to integrate those ideas with the Web.
