If there's a slowly ticking time bomb in the architecture of the Web, I suspect the best candidate is URI references.
Uniform Resource Identifiers are plagued with a circular philosophy constrained only lightly by a standard syntax, but at least plain old URIs are really only identifiers, with no strong bonds between the structure of URIs and the resources they identify. This loose connection (sometimes valued as opacity) and the circularity have made it possible for URI supporters to brush away all kinds of objections to their scheme-based scheme for years, and URIs themselves seem safely inert.
URIs have given developers a lot of freedom to create flexible systems. One of the coolest features this permits is content negotiation. Because the resource is separated from any single representation, it's possible, for instance, to visit "http://example.com/" and get back a result in any format under the sun, depending on what your browser is configured to ask for.
Typically, people expect HTML, but the server could also plausibly return SVG, SMIL, Flash, RDF, a JPEG, or whatever. MIME media types are the largest part of this negotiation, but it's also possible to negotiate language, character set, and anything else you can describe easily in a header. This isn't arcane functionality supported only by a privileged few - it's built into pretty much every Web server out there now, and it's not that hard to configure. (The browser side is messier, but I can call that an interface problem.)
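To make the negotiation concrete, here is a minimal sketch in Python of how a server might pick a representation from an Accept header. The function names (parse_accept, matches, negotiate) and the list of available types are my own assumptions for illustration, and the matching is deliberately simplified: it honors q-values and wildcards but ignores the specificity rules that real servers apply.

```python
def parse_accept(header):
    """Parse an Accept header into (media_range, q) pairs, highest q first."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        q = 1.0  # a missing q parameter means full preference
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                q = float(value)
        prefs.append((media_range, q))
    # sorted() is stable, so equal q-values keep their header order
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def matches(media_range, media_type):
    """Return True if a media range such as image/* covers media_type."""
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return media_type.startswith(media_range[:-1])
    return media_range == media_type


def negotiate(accept_header, available):
    """Pick the first available media type the client prefers, or None."""
    for media_range, q in parse_accept(accept_header):
        if q == 0:
            continue  # q=0 means "not acceptable"
        for media_type in available:
            if matches(media_range, media_type):
                return media_type
    return None


# Hypothetical representations of the resource at http://example.com/
AVAILABLE = ["text/html", "image/svg+xml", "application/rdf+xml"]
```

With that list, negotiate("image/svg+xml, text/html;q=0.8", AVAILABLE) selects the SVG representation, while a plain "text/html" request gets HTML from the very same URI.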
So where does my supposed "ticking time bomb" appear? It's not in the URI itself, but rather in a key set of features that extend URIs, most particularly in the fragment identifier portion of URI references. Section 4.1 of RFC 2396 states:
When a URI reference is used to perform a retrieval action on the identified resource, the optional fragment identifier, separated from the URI by a crosshatch ("#") character, consists of additional reference information to be interpreted by the user agent after the retrieval action has been successfully completed....
The semantics of a fragment identifier is a property of the data resulting from a retrieval action, regardless of the type of URI used in the reference. Therefore, the format and interpretation of fragment identifiers is dependent on the media type [RFC2046] of the retrieval result. The character restrictions described in Section 2 for URI also apply to the fragment in a URI-reference. Individual media types may define additional restrictions or structure within the fragment for specifying different types of "partial views" that can be identified within that media type.
A fragment identifier is only meaningful when a URI reference is intended for retrieval and the result of that retrieval is a document for which the identified fragment is consistently defined.
Unpacking all that produces a fairly simple processing model. Fragment identifiers are separate from the information sent to the server during retrieval. That retrieval uses only the URI portion of the URI reference, and gets back a representation of the resource. Once the user agent has this representation, it applies the fragment identifier to that representation, and the application (hopefully) does something interesting with the fragment or fragments returned.
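That processing model can be sketched in a few lines of Python. The split uses the standard library's urllib.parse.urldefrag; everything else here (the dereference function and its retrieve and apply_fragment callables) is a hypothetical stand-in for whatever a real user agent does, not anything RFC 2396 defines.

```python
from urllib.parse import urldefrag


def split_reference(uri_reference):
    """Split a URI reference into (uri, fragment); fragment is None if absent."""
    uri, fragment = urldefrag(uri_reference)
    return uri, (fragment or None)


def dereference(uri_reference, retrieve, apply_fragment):
    """Retrieve first, then interpret the fragment against the result.

    retrieve(uri) is assumed to return (media_type, body).
    apply_fragment(media_type, body, fragment) is assumed to return
    whatever part or view of the body the fragment identifies.
    """
    uri, fragment = split_reference(uri_reference)
    # Only the URI goes to the server; the fragment stays with the client.
    media_type, body = retrieve(uri)
    if fragment is None:
        return body
    # The fragment is applied only after retrieval has completed.
    return apply_fragment(media_type, body, fragment)
```

Note that retrieve never sees the fragment, so the server has no way to take it into account when choosing what to send back.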
The problem with that process is the gap between the retrieval process and fragment identifier processing. The retrieval process is subject to content negotiation (very cool), but there's no mechanism for fragment identifiers to communicate their expectations for that negotiation. As RFC 2396 makes explicit, "the format and interpretation of fragment identifiers is dependent on the media type of the retrieval result". If the media type returned from the URI is different from the media type the creator of the URI reference expected, then fragment identifier processing will quite likely fail.
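Here is a rough Python sketch of that failure. The two resolvers are crude assumptions of mine, not real implementations: the HTML one looks for a matching id or name attribute, and the SVG one accepts either an svgView(...) view specification or a matching id. The sample documents are invented for the example.

```python
import re


def resolve_html(body, fragment):
    """HTML: the fragment names an anchor or an element id."""
    pattern = r'<[^>]*\b(?:id|name)="%s"' % re.escape(fragment)
    return re.search(pattern, body) is not None


def resolve_svg(body, fragment):
    """SVG: the fragment is a view specification or an element id."""
    if fragment.startswith("svgView(") and fragment.endswith(")"):
        return True  # a real viewer would parse and apply the view here
    return re.search(r'\bid="%s"' % re.escape(fragment), body) is not None


# Fragment interpretation is keyed on the media type that came back.
RESOLVERS = {"text/html": resolve_html, "image/svg+xml": resolve_svg}


def fragment_resolves(media_type, body, fragment):
    """Return True if the fragment means something in this representation."""
    resolver = RESOLVERS.get(media_type)
    if resolver is None:
        return False  # no fragment syntax known for this media type
    return resolver(body, fragment)


# Two hypothetical representations of one negotiated resource.
REPRESENTATIONS = {
    "image/svg+xml": '<svg xmlns="http://www.w3.org/2000/svg"><rect id="box"/></svg>',
    "text/html": '<html><body><h1 id="intro">Sales chart</h1></body></html>',
}
```

A reference ending in #svgView(viewBox(0,0,100,100)) resolves when the SVG representation comes back, and quietly fails when negotiation hands over the HTML one instead. Nothing in the reference itself says which representation its author had in mind.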
Roy Fielding got me thinking about this with a post that pretty much blasted XPointer. Given that Fielding is one of the authors of RFC 2396, and is in fact starting up a revision process, his opinion clearly matters - but the processing model described above seems to have driven him into a fit of conservatism about the nature of fragment identifiers. In a follow-up message he writes:
However, URI and fragment identifiers are not media type specific, and in fact do not allow media type concerns to be interleaved with identification.
ID is a reasonable solution, but one that existed prior to XPointer. Other ways of identifying content independent of media type include search terms, paragraph text, and regular expressions.
I worry that the first paragraph is largely wrong. Given RFC 2396 and existing MIME media type registration practices (which explicitly define fragment identifiers for given media types), fragment identifiers quite certainly do mix media type concerns with identification. (URIs themselves quite clearly do not mix media types and identification.)
The second paragraph is perhaps the most important, however, as it suggests at least one route out of this problem. It's pretty much what we've done for HTML, for instance. Unfortunately, IDs are a fairly messy issue in XML (see this algorithm for figuring out what's an ID). While text and regular expressions are both great ideas (I'm working on Internet-Drafts for XPointer schemes which do those), they also don't work so well with things like SVG, where identifying a particular view of a drawing may be more important. Reconciling a conservatism that produces consistent results with a more liberal approach that takes greater advantage of the diversity of media types will take a long while.
(I did post an Internet-Draft that supports different media types through fallthrough, but it's far from clear that it's a useful approach.)
There's a lot of work yet to be done here. My current conclusion is that URI references should be considered abbreviations, and that developers who want more control than the current framework for processing URI references can provide should start thinking hard about these problems. The XPointer Framework, which will probably be the flashpoint for future discussions in this area, has a lot of useful ideas in it. We need to sort out whether URI reference fragment identifiers are the right home for all of those ideas, and how best to integrate those ideas with the Web.