Older blog entries for simonstl (starting at number 42)

Eventually it becomes clear that any effort to discuss URLs or URIs is pointless.

Uniform Resource Identifiers are the strangest religion I've encountered on the Web. People use them in all kinds of largely incompatible ways, but somehow we're supposed to believe that since the URL part works and survives things like cache issues, these magical abstractions will solve all our identification problems.

Meanwhile, no one can give me a straight answer on how to identify a representation with a URI reference. The short answer, of course, is that you can't, though the fragment identifier part is amusingly representation-dependent and it seems like a representation must in some sense be a resource itself... but you'd better stay away from that hack of a DOS-like file extension or fall into sin.

The Web as a huge set of Platonic Forms would be hysterically funny if it weren't so thoroughly sad.

I came to XML for hypertext, and I'm still trying to get there. XLink/XPointer is just not that helpful, but its supporters seem pretty well convinced. Tim Bray, for instance, said:

I reviewed the XLink spec, and I thought about how I'd go about designing markup for multi-ended and out-of-band links, and I thought XLink presented a pretty compelling design for how you'd do those things.

I think disagreement should be accompanied by examples: "here's a better way to do a multi-ended/out-of-band/metadata-loaded hyperlink, and here's why it's better."

Here's phase 1 of that discussion - an initial proposal. The next phases include a more comprehensive example, a formal processing model, and an implementation.

And if that proposal seems like a lot of work to you for linking, don't worry! Micah Dubinko's been working on a much simpler set of linking constructs for in-line linking called SkunkLink. Given the chance, I think SkunkLink will take care of most common linking issues and let those of us interested in stranger stuff focus on difficult questions more cleanly.

After three years of slow change, I've finally gotten around to updating my Outsider's Guide to the W3C. Dealing with the W3C isn't always fun, but knowing how the process works seems important even for those of us unwilling to take vows of silence.

Ah, the pleasant delights of content negotiation.

Taken seriously, content negotiation is powerful stuff, but most W3C specs seem to either ignore it or go out of their way to avoid it. At first I thought that meant there was something wrong with the W3C specs, but now I just think it means there are powerful integration issues that no one's been willing to sort out - and maybe jettisoning conneg would be a lot easier for most of the parties.

At the same time, though, putting multiple types of content behind a single URI has gotten much, much easier lately, partly thanks to XML and XSLT and partly thanks to the strong foundation provided by Apache itself and frameworks like Cocoon.

Time to turn off my brain for a bit and enjoy the new year as it arrives.

Another good quote from Gustav Stickley that I think works as well for Open Source as for "Home Training in Cabinetwork". It's good to see that stuff from 1905 still has relevance nearly a century later.

The instinct of doing things is a common one, and can be made a source of pleasure, healthy discipline, and usefulness, even when the work is taken up as recreation...

When one has made with his own hands any object of use or ornament there is a sense of personal pride and satisfaction in the result, that no expenditure of money can buy, and this very fact serves to dignify the task and stamp it with individuality. [1]

If you'd rather look forward than backward, you might explore my ORN blog on XML for 2003.

[1] Bavaro, Joseph and Mossman, Thomas. The Furniture of Gustav Stickley (Linden Publishing: 1996), p.89, citing March 1905 issue of The Craftsman.

When I'm not tinkering with computers, I'm frequently working in my basement shop. My tools-to-talent ratio is still badly skewed to tools, but I figure I have 30 more years to make it all work the way I want. In the meantime, I'm making jigsaw puzzles, turning pens, and building some basic cabinets.

My aspiration for next year is a remodeling project, focused on the living room of my 1929 house. We're hoping to preserve its style while replacing the leaky windows, thin sheetrock, and badly finished trim. Oh, and insulation too. It's going to keep the basic Craftsman style that's already there and throughout the house, and I'm going to be building a fair amount of furniture to go with it.

While researching early Stickley furniture (which I'm planning to make, not buy - Stickley is now pretty pricey), I found this nice bit of information about Stickley's perspective in the early 20th century:

[Stickley] emphasized the motive behind the act, the need for the object, as determining the joy of its construction. "It is what we do ourselves, of our own impelling, that is of value to us. Never do a thing unless something definite justifies it... Let your design grow out of necessity."

"It is written, 'In the sweat of thy brow...' but it was never written, 'In the breaking of thine heart shalt thou eat bread.'" Stickley used this quote from Ruskin to illustrate his own feeling that joy in labor is a necessity without which the product of one's labor is an empty reward. Without it, the worker passes the hours trading physical effort for monetary gain, looking to the future for a time when prosperity will release him from this bondage.

Stickley felt trading labor for material gain led to products of an impermanent nature, both physically and aesthetically.[1]

I know Stickley was writing about furniture making, but it sure has echoes of open source to me. Craftsmanship seems to hold as important a role in software development as in furniture making, and the shared nature of software may make it even more important.

[1] Bavaro, Joseph and Mossman, Thomas. The Furniture of Gustav Stickley (Linden Publishing: 1996), p.35

19 Dec 2002 (updated 19 Dec 2002 at 17:50 UTC)
Content negotiations

If there's a slowly ticking time bomb in the architecture of the Web, I suspect the best candidate is URI references.

Uniform Resource Identifiers are plagued with a circular philosophy constrained only lightly by a standard syntax, but at least plain old URIs are really only identifiers, with no strong bonds between the structure of URIs and the resources they identify. This loose connection (sometimes valued as opacity) and the circularity have made it possible for URI supporters to brush away all kinds of objections to their scheme-based scheme for years, and URIs themselves seem safely inert.

URIs have given developers a lot of freedom to create flexible systems. One of the coolest features this permits is content negotiation. Because the resource is separated from any single representation, it's possible, for instance, to visit "http://example.com/" and get back a result in any format under the sun, depending on what your browser is configured to ask for.

Typically, people expect HTML, but it could also plausibly return SVG, SMIL, Flash, RDF, a JPEG, or whatever. MIME media types are the largest part of this negotiation, but there are also possibilities for negotiating language, character set, and anything else you can describe easily in a header. This isn't arcane functionality supported only by a privileged few - it's built into pretty much every Web server out there now, and it's not that hard to configure. (The browser side is messier, but I can call that an interface problem.)
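To make the server side of that a little more concrete, here's a minimal Python sketch of picking a representation from an Accept header. It's an illustration under stated assumptions, not how Apache or any particular server does it: the list of available media types is hypothetical, media-type parameters other than q are ignored, and the most specific matching media range supplies the quality value, roughly following HTTP/1.1's rules.

```python
def parse_accept(header):
    """Parse an Accept header into (media_range, q) pairs.

    Simplified: parameters other than q are ignored.
    """
    ranges = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_range = pieces[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        ranges.append((media_range, q))
    return ranges


def matches(media_range, media_type):
    """True if a media range such as */*, text/* or text/html covers media_type."""
    if media_range == "*/*":
        return True
    rtype, _, rsub = media_range.partition("/")
    mtype, _, msub = media_type.partition("/")
    return rtype == mtype and (rsub == "*" or rsub == msub)


def specificity(media_range):
    """0 for */*, 1 for type/*, 2 for a full type/subtype."""
    if media_range == "*/*":
        return 0
    if media_range.endswith("/*"):
        return 1
    return 2


def negotiate(accept_header, available):
    """Return the available media type the client prefers most, or None."""
    ranges = parse_accept(accept_header)
    best, best_q = None, 0.0
    for media_type in available:
        matching = [(specificity(r), q) for r, q in ranges if matches(r, media_type)]
        if not matching:
            continue
        _, q = max(matching)  # the most specific matching range wins
        if q > best_q:        # q=0 means "not acceptable"; ties keep server order
            best, best_q = media_type, q
    return best


print(negotiate("text/html;q=0.9, image/svg+xml", ["text/html", "image/svg+xml"]))  # image/svg+xml
print(negotiate("text/*;q=0.5, */*;q=0.1", ["image/jpeg", "text/html"]))  # text/html
```

The same URI can hand back SVG to one client and HTML to another, which is exactly the flexibility (and, as it turns out, the trouble) discussed below.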

So where does my supposed "ticking time bomb" appear? It's not in the URI itself, but rather in a key set of features that extend URIs, most particularly in the fragment identifier portion of URI references. Section 4.1 of RFC 2396 states:

When a URI reference is used to perform a retrieval action on the identified resource, the optional fragment identifier, separated from the URI by a crosshatch ("#") character, consists of additional reference information to be interpreted by the user agent after the retrieval action has been successfully completed....

The semantics of a fragment identifier is a property of the data resulting from a retrieval action, regardless of the type of URI used in the reference. Therefore, the format and interpretation of fragment identifiers is dependent on the media type [RFC2046] of the retrieval result. The character restrictions described in Section 2 for URI also apply to the fragment in a URI-reference. Individual media types may define additional restrictions or structure within the fragment for specifying different types of "partial views" that can be identified within that media type.

A fragment identifier is only meaningful when a URI reference is intended for retrieval and the result of that retrieval is a document for which the identified fragment is consistently defined.

Unpacking all that produces a fairly simple processing model. Fragment identifiers are kept separate from the information sent to the server during retrieval. Retrieval uses only the URI portion of the URI reference and gets back a representation of the resource. Once the user agent has that representation, it applies the fragment identifier to it, and the application (hopefully) does something interesting with the fragment or fragments returned.
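That processing model fits in a few lines of Python. In this sketch, `fetch` and `apply_fragment` are hypothetical stand-ins for retrieval and for the user agent's media-type-specific fragment handling; the one real library call is `urllib.parse.urldefrag`, which splits a reference at the "#".

```python
from urllib.parse import urldefrag


def dereference(uri_reference, fetch, apply_fragment):
    """Sketch of the RFC 2396 processing model for a URI reference.

    fetch(uri) -> representation stands in for retrieval (e.g. an HTTP GET).
    apply_fragment(representation, fragment) stands in for the user agent's
    fragment processing. Both are hypothetical hooks supplied by the caller.
    """
    uri, fragment = urldefrag(uri_reference)  # split at the "#"
    representation = fetch(uri)               # only the URI goes to the server
    if not fragment:
        return representation                 # no fragment: the whole thing
    return apply_fragment(representation, fragment)  # interpreted after retrieval


# Usage with a fake fetch that records what would have been requested.
requested = []

def fake_fetch(uri):
    requested.append(uri)
    return "alpha beta gamma"

result = dereference("http://example.com/doc#beta", fake_fetch,
                     lambda text, frag: frag if frag in text.split() else None)
print(requested)  # ['http://example.com/doc']
print(result)     # beta
```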

The problem with that process is the gap between retrieval and fragment identifier processing. Retrieval is subject to content negotiation (which is very cool), but there's no mechanism for fragment identifiers to communicate their expectations for that negotiation. As RFC 2396 makes explicit, "the format and interpretation of fragment identifiers is dependent on the media type of the retrieval result". If the media type returned from the URI is different from the media type the creator of the URI reference expected, then fragment identifier processing will quite likely fail.
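Here's a Python sketch of how that gap bites. The resource, its two representations, and the resolver table are all made up for illustration: the same URI reference works when negotiation returns XHTML, where an id can be looked up, and quietly fails when it returns text/plain, for which this sketch registers no fragment semantics at all.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

# Two hypothetical representations of one resource, chosen by negotiation.
REPRESENTATIONS = {
    "application/xhtml+xml": '<html><body><p id="intro">Hello</p></body></html>',
    "text/plain": "Hello",
}


def fetch(uri, accept):
    """Stand-in for a negotiated GET: return the first acceptable type."""
    for media_type in accept:
        if media_type in REPRESENTATIONS:
            return media_type, REPRESENTATIONS[media_type]
    raise LookupError("406 Not Acceptable")


def xhtml_fragment(body, fragment):
    """Find the element whose id attribute equals the fragment identifier."""
    root = ET.fromstring(body)
    for elem in root.iter():
        if elem.get("id") == fragment:
            return elem
    return None


# Fragment semantics are tied to media types; text/plain has none here.
FRAGMENT_RESOLVERS = {"application/xhtml+xml": xhtml_fragment}


def resolve(uri_reference, accept):
    uri, fragment = urldefrag(uri_reference)
    media_type, body = fetch(uri, accept)  # the fragment never reaches the server
    resolver = FRAGMENT_RESOLVERS.get(media_type)
    if resolver is None:
        return None  # the author's fragment means nothing for this media type
    return resolver(body, fragment)


ref = "http://example.com/page#intro"
print(resolve(ref, ["application/xhtml+xml"]).text)  # Hello
print(resolve(ref, ["text/plain"]))                  # None
```

Nothing in the reference itself tells the server that "#intro" only makes sense for the XHTML version, so the negotiation has no way to honor it.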

Roy Fielding got me thinking about this with a post that pretty much blasted XPointer. Given that Fielding is one of the authors of RFC 2396, and is in fact starting up a revision process, his opinion clearly matters - but the processing model described above seems to have driven him into a fit of conservatism about the nature of fragment identifiers. In a follow-up message he writes:

However, URI and fragment identifiers are not media type specific, and in fact do not allow media type concerns to be interleaved with identification.

ID is a reasonable solution, but one that existed prior to XPointer. Other ways of identifying content independent of media type include search terms, paragraph text, and regular expressions.

I worry that this first paragraph is largely wrong given RFC 2396 and existing MIME Media Type registration practices (which explicitly define fragment identifiers for given media types), and because of that, fragment identifiers quite certainly do mix media type concerns with identification. (URIs quite clearly do not mix media types and identification.)

The second paragraph is perhaps the most important, however, as it suggests at least one route out of this problem. It's pretty much what we've done for HTML, for instance. Unfortunately, IDs are a fairly messy issue in XML (see this algorithm for figuring out what's an ID). While text and regular expressions are both great ideas (I'm working on Internet-Drafts for XPointer schemes that do just that), they also don't work so well with things like SVG, where identifying a particular view of a drawing may be more important. Reconciling a conservatism that produces consistent results with a more liberal approach that takes greater advantage of the diversity of media types will take a long while.
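For a feel of why text and regular expressions travel well across some media types but not others, here's a Python sketch that matches a regular expression against the character content of a representation. The function names and the crude HTML-to-text step are assumptions for illustration, not the syntax of any XPointer scheme, and the sketch simply has nothing to offer for SVG.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect character data, discarding tags (a crude HTML-to-text step)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def text_content(media_type, body):
    """Reduce a textual representation to its character content."""
    if media_type == "text/html":
        parser = _TextExtractor()
        parser.feed(body)
        parser.close()  # flush any buffered character data
        return "".join(parser.chunks)
    if media_type.startswith("text/"):
        return body
    raise ValueError(f"no text content defined for {media_type}")


def regex_locate(media_type, body, pattern):
    """Locate a fragment by regular expression over the text content."""
    match = re.search(pattern, text_content(media_type, body))
    return match.group(0) if match else None


html = "<p>Content negotiation is <em>powerful</em> stuff.</p>"
plain = "Content negotiation is powerful stuff."
pattern = r"negotiation is \w+"
print(regex_locate("text/html", html, pattern))    # negotiation is powerful
print(regex_locate("text/plain", plain, pattern))  # negotiation is powerful
```

The same pattern finds the same text in both representations, but a call with image/svg+xml raises ValueError: a regular expression can't say "this view of the drawing."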

(I did post an Internet-Draft that supports different media types through fallthrough, but it's far from clear that it's a useful approach.)
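The general idea of fallthrough can be sketched generically: try a list of pointers in order and take the first one that both makes sense for the media type actually returned and identifies something. The scheme names (`id`, `line`) and the handler table below are hypothetical, and this is not the syntax or processing model of that Internet-Draft; it's closest in spirit to the XPointer Framework's left-to-right evaluation of pointer parts.

```python
import re


def html_id(body, data):
    """Hypothetical 'id' scheme for text/html: crude text after the tagged start tag."""
    match = re.search(r'id="%s"[^>]*>([^<]*)' % re.escape(data), body)
    return match.group(1) if match else None


def plain_line(body, data):
    """Hypothetical 'line' scheme for text/plain: 1-based line number."""
    lines = body.splitlines()
    n = int(data)
    return lines[n - 1] if 1 <= n <= len(lines) else None


# (media_type, scheme) -> handler; unsupported combinations are simply absent.
HANDLERS = {
    ("text/html", "id"): html_id,
    ("text/plain", "line"): plain_line,
}


def resolve_with_fallthrough(media_type, body, pointers, handlers):
    """Try each (scheme, data) pointer in order; return the first hit or None."""
    for scheme, data in pointers:
        handler = handlers.get((media_type, scheme))
        if handler is None:
            continue  # this scheme means nothing for this media type
        result = handler(body, data)
        if result is not None:
            return result  # the first successful pointer wins
    return None


pointers = [("id", "intro"), ("line", "1")]
print(resolve_with_fallthrough("text/html", '<p id="intro">Hello</p>', pointers, HANDLERS))  # Hello
print(resolve_with_fallthrough("text/plain", "Hello\nWorld", pointers, HANDLERS))  # Hello
```

The URI reference's author has to anticipate every media type negotiation might produce, which is part of why it's unclear this approach scales.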

There's a lot of work yet to be done here. My current conclusion is that URI references should be considered abbreviations, and that developers who want more control than the current framework for processing URI references can provide should start thinking hard about these problems. The XPointer Framework, which will probably be the flashpoint for discussions in this area to come, has a lot of useful ideas in it. We need to sort out whether URI reference fragment identifiers are the right home for all of those ideas, and how best to integrate those ideas with the Web.

Open Source, Open Data

Eric van der Vlist has posted a piece on Microsoft's Office 11 announcements at XML 2002. I see that it's been noted in an Advogato diary already, so maybe I won't get shot at for suggesting people take a look, but this is some very interesting work, well worth examining whatever your feelings about its creators. Eric does a nice job of contemplating the consequences for better and worse.

I gave a presentation at the Open Source Conference two years ago called "Open Source, Open Data: What XML has to offer Open Source". At the time, .NET (and the now-quiet "Hailstorm") was the main Microsoft data opening, but I think a lot of it applies as much or more to the Office story. There should be a lot more "free love" out there in the near future, though that's certainly different from "free speech" in the Open Source sense.

Markup and code

I've had some strange discussions lately, both at XML 2002 and outside, where people seem to think that programming code matters and markup is just an accident, something that really doesn't matter much. Programs are the ones creating and interpreting all that markup, right? So why don't the markup people shut up, roll over, and let the programmers do the real work?

It gets pretty depressing sometimes. There are certainly people out there who grasp the difference between "data" and "code" and understand why the constraints on the two and their respective practices are very different, right?

Getting the details right in your markup should mean that writing the code to process it will be easy, whatever the environment. Instead, a lot of people seem to look at how they write code and assume that what's easy for them is easy for everyone else, so they let their code assumptions flow into their markup design - making their markup easy for them but not necessarily for anyone else.

Fortunately, there are other reasons out there to be optimistic about the future of markup.
