This is the fifth in a series of (mostly) weekly essays. This week, Advogato subjects us to a brief tutorial on standards.
Standards, more than anything else, make up the texture of our computing environment. They tend to be longer lived than just about any other computer-related artifact. Yet, many standards are of relatively poor quality, while others are unfriendly to free software. Good standards, on the other hand, make life livable, or at least make stuff work. This essay attempts to analyze what makes good and bad standards, from the viewpoint of free software developers. But then, when's the last time Advogato analyzed anything from a point of view other than free software developers?
In the early days of solid-state electronics, transistors were expensive, and wires were effectively free. Now, in the days of 0.18 micron chips, transistors are virtually free (they just happen when two different doping layers cross), while the wires are expensive - not only do they take up most of a chip's real estate, but because of capacitance and other effects, signal propagation delay is actually superlinear in the wire's length.
Similarly, when Wirth published "Algorithms + Data Structures = Programs" in 1976, it was a reasonable formula. But these days, many large systems are built with hardly any interesting algorithms or data structures at all. Almost all of the remaining "dark matter" can be counted as integration work. While algorithms haven't become insignificant yet, integration is increasingly where the hard problems, and the work, of system-building happens.
In my view, a standard is a document, or perhaps uncodified set of practices, with the purpose of reducing the cost of integration. A good standard reduces the integration cost significantly. A bad standard doesn't reduce the cost, or perhaps doesn't even facilitate making the integration happen. A standard can cover a protocol, an API, a programming language, a file format, object request brokering, hardware, or any mix of these.
Free software authors do particularly well with standards: a good standard gives people a clear target to shoot for. (As an aside, this "clear target" theory was well elucidated by Vinod Valloppillil in his infamous Halloween Documents, though he used the more pejorative analogy "chasing tail lights". There are plenty of examples of open source projects that are ahead of proprietary programs in implementing new standards.) Conversely, a lack of standards hurts free software disproportionately. In my opinion, the lack of a really usable open source word processor has more to do with the lack of a standard word processing format than anything else.
Advogato is eager to get to the meat of this essay, which is a set of criteria for evaluating a standard. Each is presented as a potential pitfall, with examples of standards that manage to avoid the pitfall, or other bad standards that fall directly in.
One of the most common flaws in a standard is to leave enough loose ends unspecified that implementors have plenty of leeway to screw up integration.
I'm going to use JPEG as a running example here, because it illustrates a number of issues, and because of the significant role that free software played in the ultimate success of the standard. I'll just quote from the JPEG FAQ:
Strictly speaking, JPEG refers only to a family of compression algorithms; it does not refer to a specific image file format. The JPEG committee was prevented from defining a file format by turf wars within the international standards organizations.
Since we can't actually exchange images with anyone else unless we agree on a common file format, this leaves us with a problem. In the absence of official standards, a number of JPEG program writers have just gone off to "do their own thing", and as a result their programs aren't compatible with anyone else's.
JFIF has emerged as the de-facto standard on Internet, and is what is most commonly meant by "a JPEG file". Most JFIF readers are also capable of handling some not-quite-JFIF-legal variant formats.
Indeed, in the early days, there were a number of proprietary, incompatible file formats all conforming to the JPEG standard. Thus, the JFIF work (coordinated by C-Cube) rescued JPEG from being badly underspecified, and in fact is one of the better image formats available today.
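To make concrete what JFIF adds on top of the compression standard, here is a minimal sketch of how a reader might recognize "a JPEG file" in the usual sense. It assumes the common JFIF layout, in which the start-of-image marker is followed immediately by an APP0 segment carrying the identifier "JFIF"; the function name is made up for illustration, and real readers are usually more forgiving than this.

```python
import struct

SOI = b"\xff\xd8"   # JPEG start-of-image marker
APP0 = b"\xff\xe0"  # application segment 0, where JFIF puts its header


def looks_like_jfif(data: bytes) -> bool:
    """Return True if data begins with SOI followed by a JFIF APP0 segment.

    Hypothetical helper: it checks only the leading bytes, not the whole file.
    """
    if len(data) < 11 or data[0:2] != SOI or data[2:4] != APP0:
        return False
    # Segment length is big-endian and counts its own two bytes.
    (segment_length,) = struct.unpack(">H", data[4:6])
    # The payload starts with the five-byte identifier "JFIF\0".
    return segment_length >= 16 and data[6:11] == b"JFIF\x00"
```

A stream that uses the same compression but a different wrapper fails this check, which is exactly the incompatibility the FAQ describes.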
The dual problem to underspecification is overspecification, i.e. overly constraining implementation details. A very simple example of overspecification is in the termios API for controlling serial ports (actually part of POSIX.1). This API has a fixed set of serial speeds, ranging from 50 to 38400. Since serial ports have gotten a lot faster, implementations have been forced to resort to nonstandard kludges to access these higher speeds.
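The shape of the problem is easy to see in code. The sketch below uses Python's `termios` module (available on Unix-like systems only) and restricts itself to the speeds that POSIX.1 names; the tuple and the helper function are illustrative names, not part of any standard API.

```python
import termios

# Line speeds named by POSIX.1 termios; anything faster is a platform extension.
POSIX_BAUD_RATES = (0, 50, 75, 110, 134, 150, 200, 300, 600,
                    1200, 1800, 2400, 4800, 9600, 19200, 38400)


def posix_speed_constant(baud: int) -> int:
    """Map a numeric baud rate to its termios B* constant, POSIX rates only.

    Hypothetical helper for illustration.
    """
    if baud not in POSIX_BAUD_RATES:
        raise ValueError(f"{baud} baud has no POSIX-defined speed constant")
    return getattr(termios, f"B{baud}")
```

Because the speed is an enumerated constant rather than a number, a rate such as 115200 simply has no portable spelling; platforms that support it do so through their own additional constants.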
A truly remarkable example of avoiding overspecification is the TCP/IP protocol suite. These standards were written almost twenty years ago, and are still completely valid today, in spite of the rather stunning advances in networking hardware since then. These protocols run over a very diverse range of hardware, including modems, Ethernet, T1 lines, ATM, a number of fiber standards, and, of course, carrier pigeons. These core Internet standards (many of which were written or edited by the late Jon Postel) are in many ways what standards should aspire to be.
A very common problem of standards committees is limited scope of the resulting standards. In fairness to the committee, getting a standard through the process is usually very difficult work. Expanding the scope may well make it impossible. Nonetheless, it is a real problem.
I'll pick on POSIX here, for not standardizing Internet APIs, even though POSIX contains most everything else you need for a Unix environment. Instead, the Berkeley sockets API has made itself the de facto standard, even surviving the transition to the Win32 platform.
Microsoft showed how useless an unfriendly implementation of a standard can be with their POSIX compatibility module for NT. Microsoft was motivated by various government requirements that purchased computer systems be POSIX compatible. The actual implementation was so minimal that I don't know of anyone who actually used it.
Unix systems these days all provide both POSIX compatibility for the non-network stuff, and Berkeley sockets for the networking. However, the situation is still far from perfect. A particularly irritating example is the lack of a standard for nonblocking or thread-safe gethostbyname() functions, even though there is a POSIX standard for threads. Almost all Unices provide such a function, but it's not standard, so it's very difficult to write portable programs that do nonblocking host lookups.
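A common workaround is to hand the blocking call to a worker thread. The following is a minimal sketch of that idea in Python, whose `socket.gethostbyname` wrapper can be called from multiple threads; the pool size and the `lookup_async` name are arbitrary choices for illustration. Note that this sidesteps the portability problem rather than solving it: a C program gets no such help from the standard.

```python
import socket
from concurrent.futures import Future, ThreadPoolExecutor

# A small pool of worker threads that absorb the blocking resolver calls.
_resolver_pool = ThreadPoolExecutor(max_workers=4)


def lookup_async(hostname: str) -> Future:
    """Start a host lookup and return immediately with a Future for the address.

    Hypothetical helper: the caller polls or waits on the Future later.
    """
    return _resolver_pool.submit(socket.gethostbyname, hostname)
```

The caller can keep servicing other work and collect the address later with `future.result(timeout=...)`, or attach a callback with `future.add_done_callback(...)`.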
A very serious problem with many standards is poor implementation. Standards bodies generally think of their work as ending when the spec is published, but the success or failure of a standard is often critically dependent on good implementation. In fact, it's quite possible for good implementation to create a standard even when no formal document exists (this is a de facto standard).
A good example of a poorly implemented standard is the W3C's Cascading Style Sheets activity. Both Microsoft and Netscape rushed to get CSS implementations in their browsers, but they sucked. Things are getting better, but it will be a while before authors can rely on the CSS standard to adequately control the presentation of their web pages.
In my opinion, a great deal of the responsibility for poorly implemented standards does rest on the standards committee and their final product (the standards document). Probably the biggest culprit is complexity. A simple standard is usually possible to implement well.
The spina bifida of standards is to provide lots of optional features and extension mechanisms. The problem is that if two implementations support different sets of options, there may be no interoperability at all. Some standards (such as PNG) are careful to preserve interoperability even when extended, while others don't.
A good example of the latter is TIFF. Many aspects of the image coding are left optional, including the compression format. Thus, it's quite common for one system to write a TIFF file, and another to not be able to read it, even though they both support the TIFF standard.
Incidentally, JPEG also suffers from too many options, many of which are not widely implemented. However, the Independent JPEG Group has, mostly on the strength of their implementation quality, created a de facto standard that in practice ensures that JPEG files are some of the most reliably interchangeable around.
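PNG's care about extensions, mentioned above, comes down to a small rule: the case of the first letter of a chunk's four-letter type says whether a decoder may safely ignore the chunk. Here is a minimal sketch of that rule. The set of known chunks is limited to PNG's four critical chunks for brevity, and the function names are illustrative.

```python
# The critical chunks every PNG decoder must understand.
KNOWN_CHUNKS = {b"IHDR", b"PLTE", b"IDAT", b"IEND"}


def is_ancillary(chunk_type: bytes) -> bool:
    """Bit 5 of the first byte (a lowercase letter) marks a chunk as ancillary."""
    return bool(chunk_type[0] & 0x20)


def handle_chunk(chunk_type: bytes) -> str:
    """Decide what a decoder does with a chunk: decode it, skip it, or give up."""
    if chunk_type in KNOWN_CHUNKS:
        return "decode"
    if is_ancillary(chunk_type):
        # Unknown but optional: the image can still be displayed without it.
        return "skip"
    # Unknown and critical: the image cannot be rendered correctly.
    raise ValueError(f"unknown critical chunk {chunk_type!r}")
```

An older decoder that meets a newer ancillary chunk therefore still shows the picture, while an unknown critical chunk produces a clear failure rather than a silently wrong image.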
Open-endedness is similar to underspecification, but is usually motivated differently, usually by a desire to make the spec "extensible." Peter Gutmann's writeup on X.509 is an excellent cautionary tale about the dangers of unfettered extensibility.
A very closely related issue is conformance. Sometimes there are valid reasons for standards to contain optional features (i.e., as a way of avoiding overspecification). Proper attention given to conformance often helps reduce the problems. A good example of conformance done right (in my opinion) is ANSI C.
It's not really fair to pick on an unfinished spec, but the latest working draft of the SVG spec has conformance problems too juicy for Advogato to pass up. Examples: the chapter on styling describes how to use XSLT, but the conformance section doesn't require implementations to support it, nor does it explicitly forbid implementations from generating it. Similarly, SVG provides its own font format because the committee was unable to agree on an existing font standard, yet the conformance guidelines do not explicitly require SVG fonts to be embedded. These will no doubt be fixed, but are illuminating examples nonetheless.
One way to make a standard ineffective in reducing integration costs is to constantly change it. A case in point is Java, which is evolving rapidly enough that free implementations generally aren't able to keep up. Another case in point is the Microsoft Word file format. Both of these have obviously hurt free software implementations.
The Win32 API is a classic example of a poorly documented and ambiguous API. As such, it has caused the Wine project to take considerably longer than most people hoped.
There are a few issues that don't necessarily imply bad standards, but are nonetheless not good signs.
Including lots of other standards by reference means that the overall standard inherits all the flaws of its children. This is mostly a problem when unfinished standards are involved; the fact that TCP references IP, for example, should not be considered cause for alarm.
The other problem with inclusion by reference is that it is very difficult to resist the upgrade treadmill.
Standards bodies can easily become addicted to the upgrade treadmill, specifically the idea that any flaws in the present standard can get fixed in the next rev. There's nothing wrong with using implementation experience with an early version of the standard to refine later versions, but this should not be used as a crutch.
The other common problem with constantly-incrementing version numbers is that they usually come with a huge increase in complexity. This is basically the same problem as the "moving target".
Any bad standard is an issue for free software, but some characteristics are especially painful.
Probably the most important of these is a standard that requires the use of encumbered intellectual property. Again, JPEG is a sterling example. The ISO JPEG standard specifies arithmetic coding, which gets better compression than the widely used Huffman coding, but is covered by several patents. In this case, the IJG again saved the day, by eliminating arithmetic coding from the de facto standard.
Many other intellectual property problems have a less happy ending. A very well known example is the GIF file format. This also highlights the problem of "intellectual property ambush", i.e., the owner of the intellectual property silently allowing the standard to become established, then asserting the intellectual property rights afterwards. In Advogato's opinion, free software authors should be particularly vigilant about intellectual property ambush. The case of GIF shows how easily the problems could have been avoided had people been worried about the patented algorithms. There are numerous examples today in which there is a (probably valid) patent which is not currently being asserted, perhaps most notably TrueType.
Basically, what keeps proprietary software companies in business is a proprietary advantage over their competitors. A good standard lowers the barriers to entry for competition, and thus possibly erodes the bottom line.
Thus, most of the "bad standard" techniques discussed above have been used competitively to enhance market position. These issues have been discussed in quite a bit of detail in the essay, The decommoditization of protocols.
Good standards are vital for the health of free software. It's much more fun to work with a good standard than a bad one. Thus, free software developers should be aware of the issues that make bad standards. Those of us who feel brave can also lobby for good standards, and actively fight bad ones.
Further, it is probably important to start creating our own standards. I seriously doubt that a good word processor file format will just miraculously happen otherwise. A good example of a standard emerging from the free software community is Vorbis, a next generation patent-free audio codec.
Unfortunately, most standards-making is not much fun, but it probably doesn't have to be that way. I think it makes sense to get creative about how to make standards without the long, frustrating meetings, pointless bickering, politics, and related problems afflicting most standards bodies. Advogato hopes to be able to foster and support this important work.