SGML for the Web
[I wasn't around for the creation of XML 1.0 - in fact, I've
never been on the inside of W3C work, and prefer to keep it
that way for a lot of reasons I'll explain here someday.
Some explanation of where XML came from seems like a good
idea, though. Let me know if you have corrections.]
I first encountered XML in 1997, a few
drafts into the process. The first draft I read
described what became XML as "Part I. Syntax" - the other
two parts were Style (eventually XSL, XSLT, XSL-FO, and
XPath) and Linking (eventually XLink and XPointer). XML
would be much like a simplified SGML,
XSL much like a simplified DSSSL,
and XLink much like a simplified HyTime.
Together these three specifications were going to:
"enable generic SGML to be served,
received, and processed on the Web in the way that is now
possible with HTML. XML has been designed for ease of
implementation and for interoperability with both SGML and
HTML."
That description survives even today in the latest edition of
XML 1.0. In a lot of ways, XML has been a raging success,
but it hasn't exactly lived up to that dream. For the
moment, I'm just going to explore what that scenario might
have looked like.
Publishing XML documents to the Web is pretty simple,
basically like publishing HTML. You upload documents to a
server or you generate them at user request, and a browser,
crawler, agent, or some other application requests the
information.
That part's easy. The tough part comes from XML being both
infinitely more flexible than HTML and kind of dumber. HTML
browsers know what the HTML vocabulary is: how to identify a
headline, a link, an image, a paragraph. A browser working
with an arbitrary XML vocabulary has no built-in way of
knowing any of that. That leaves developers two choices.
They can either convert all
their XML to HTML (and to other formats, so it's not a total
waste of time) or they can find other ways to express how to
style and link their documents.
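To make the first choice concrete, here's a minimal sketch of
converting a made-up XML vocabulary to HTML using Python's
standard library. The element names (article, headline, para,
pointer), the target attribute, and the TAG_MAP table are all
assumptions invented for illustration, not part of any real
vocabulary, and a real conversion would more likely be written
in XSLT.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from an invented XML vocabulary to HTML element names.
TAG_MAP = {"article": "div", "headline": "h1", "para": "p", "pointer": "a"}


def to_html(elem):
    """Recursively copy an element tree, renaming tags according to TAG_MAP."""
    # Unknown elements fall back to <span> so their text is not lost.
    html = ET.Element(TAG_MAP.get(elem.tag, "span"))
    # The invented 'pointer' element carries its destination in 'target'.
    if elem.tag == "pointer" and "target" in elem.attrib:
        html.set("href", elem.attrib["target"])
    html.text = elem.text
    html.tail = elem.tail
    for child in elem:
        html.append(to_html(child))
    return html


SOURCE = (
    "<article><headline>SGML for the Web</headline>"
    '<para>See <pointer target="http://www.w3.org/">the W3C</pointer>.</para>'
    "</article>"
)

html_root = to_html(ET.fromstring(SOURCE))
html_text = ET.tostring(html_root, encoding="unicode", method="html")
print(html_text)
```

The point is only that the knowledge a browser has built in for
HTML (this is a headline, that is a link) has to be supplied
from outside for XML, here as a lookup table.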
On the styling side, developers have a couple of
loosely related choices. They can use XSL Transformations
(XSLT), part of the Extensible Stylesheet Language (XSL),
to generate XSL Formatting Objects (XSL-FO), which then get
rendered to PDF, PostScript, TeX, RTF, or some other
printable format. They can also use XSLT to transform their
XML into HTML.
Less intrusive but sometimes less powerful is Cascading
Style Sheets (CSS), a technology originally developed for
HTML that happens to work very well for XML. Instead of
transforming the document, CSS just lets you describe how to
present it.
On the linking side, the XML Linking Language
(XLink) provides some powerful - well, compared to HTML
anyway - hypertext linking capabilities. XLink lets you
specify links "out-of-line", even outside of the documents
they link, as well as inline, HTML-style. XLink's
companion standard, XPointer, gives
hypertext developers a set of tools for things like
establishing links between ranges of text in a document.
XLink's been described by its creators as "hypertext ready
for the challenges of the 1970's", but it's an interesting
set of tools. It seemed to ease my frustrations with HTML
without lurching into the complexity of HyTime.
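As a rough illustration of the inline, HTML-style case only,
here's a sketch that collects XLink simple links from a
document. The XLink namespace URI and the xlink:type and
xlink:href attributes come from the XLink 1.0 Recommendation;
the doc, cite, and note elements and the find_simple_links
helper are invented for the example. Out-of-line (extended)
links and XPointer are not handled.

```python
import xml.etree.ElementTree as ET

# Namespace URI defined by the XLink 1.0 Recommendation.
XLINK_NS = "http://www.w3.org/1999/xlink"


def find_simple_links(root):
    """Return (element name, href) pairs for every XLink simple link."""
    links = []
    for elem in root.iter():
        # ElementTree exposes namespaced attributes as '{uri}local'.
        link_type = elem.get(f"{{{XLINK_NS}}}type")
        href = elem.get(f"{{{XLINK_NS}}}href")
        if link_type == "simple" and href is not None:
            links.append((elem.tag, href))
    return links


# Any element can be a link; 'cite' is just a made-up example.
SOURCE = """<doc xmlns:xlink="http://www.w3.org/1999/xlink">
  <cite xlink:type="simple" xlink:href="http://www.w3.org/TR/xlink/">XLink</cite>
  <note>No link here.</note>
</doc>"""

links = find_simple_links(ET.fromstring(SOURCE))
print(links)
```

Unlike HTML, where only a handful of elements can link, the
linking behavior here lives in attributes, so any element in
any vocabulary can carry it.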
XSL and XLink both took much longer than expected to get
done, though pieces of XSL (XSLT and XPath) got out sooner.
XSL reached Recommendation status last October, and XLink
last June. XPointer is still in Candidate Recommendation,
with seemingly dim prospects. I'd built a Java SAX
filter for working with XLink, but pretty much abandoned
it as the slow-moving spec kept changing in confusing ways
on its way through the process. There are also some
interactions
between XLink and styling to sort out.
So how's the combination of Extensible Markup Language
(XML), Extensible Stylesheet Language (XSL), and XML Linking
Language (XLL, now XLink) done on the Web? Uh, yeah. I did
a series of articles for XML.com on XML in browsers
(Mozilla, Opera, Internet Explorer, and a summary table)
back in 2000, and things haven't changed that much.
XML does have some
promise, but in rather different directions than expected.