21 Jun 2002 raph   » (Master)

Complexity

Thanks to David McCusker for carte blanche to beat him up. As he no doubt knows, it's not exactly my style, though.

Of course it's not useful to decide whether something is "necessary" on the basis of computational equivalence. S and K are necessary, the rest is a matter of notational convenience. (I is for weenies).
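To make that concrete, here's a tiny sketch in Python (purely illustrative; the lambdas stand in for the combinators). I falls out for free as S K K:

```python
# Illustrative only: the S and K combinators, curried as Python lambdas.
S = lambda x: lambda y: lambda z: x(z)(y(z))
K = lambda x: lambda y: x

# I = S K K:  (S K K) z  ->  K z (K z)  ->  z
I = S(K)(K)

assert I(42) == 42
assert K("keep")("drop") == "keep"
```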

So let me try to state the question in more precise terms:

Is it possible to use a much simpler technique to accomplish nearly the same results, nearly as well?

Applied to the Mithril and SEDA/Java runtimes, the answer depends on how you count. The Java runtime is a horrifically complex beast. If you included it, then I would agree that Mithril is much simpler. But it's also more or less a solved problem (less, admittedly, if you restrict yourself to free software). If you count only the SEDA part, it's quite a bit simpler, because you have the GC daddy taking care of cleaning up all the memory.

I'm not arguing one way or another as to whether you should include the Java runtime. I think it's a good question.

The food chain

Now for some more general musings. The above emphasizes that software systems today are not isolated artifacts, but exist as part of a larger network. Dan Sugalski uses the term "food chain" to refer to this network in a recent interview. Coincidentally, Barabasi in his book "Linked" finds that food chain networks and the Web have similar scale-free topology.

Like the food chain, the software project network has one-way links. Mozilla depends on C++, but C++ does not depend on Mozilla. (Of course, as in real food chains, all kinds of complex interactions can occur once you start looking at second-order effects.) I think it's worth looking at the properties of these links in more detail. What follows is a rough cut. I'm sure I didn't think of everything.

In an "A depends on B" relationship, B is always more focussed on performance, robustness, and stability than A. If B is lacking in any of these, then the combined A+B system is also. (Fault tolerance would seem to be the exception that proves the rule.) At the lower levels of the food chain, you see an almost fanatical attention to these issues. CPU's come to mind, especially. The price of a 2.53GHz Pentium 4 is currently more than double that of a 2.2GHz one, for a 15% speed improvement at best. Yet, for many applications, performance is measured in terms of order of magnitude.

Compilers, runtimes, and OS kernels all similarly have an intense focus on performance and stability. Of course, the need for these things varies. Windows 98 is woefully inadequate to run, say, an Oracle database, yet is considered a perfectly fine platform for gaming.

The higher levels of the food chain start to be concerned with user needs, which are famously complex. If you had to meet those needs and optimize performance to the bone at the same time, the problem would become intractable. That's why the food chain exists. Filesystems are a lot closer to user needs than raw disk. Thus, applications rely on the filesystem to map their needs onto the disk with reasonable performance and robustness, and save a huge amount of complexity in the bargain. Once this relationship is established, quantitative work at the lower level (improving caching algorithms, say) has a positive effect on the entire system. Note, however, the downward pressure when the next level up is a database.
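Here's a rough, purely illustrative sketch of the complexity the filesystem absorbs (the path and device names are made up, and the raw-device half would need root):

```python
# Illustrative only; the paths and devices below are hypothetical.

# Through the filesystem: name a path and the kernel does the rest --
# block allocation, caching, readahead, and (with journaling) a story
# about what happens after a crash.
with open("/var/data/report.txt", "rb") as f:
    data = f.read()

# Against the raw disk, the application would be doing the filesystem's
# job itself: tracking which blocks hold which bytes, honoring alignment,
# and inventing its own consistency scheme.
BLOCK = 4096
with open("/dev/sdb", "rb") as dev:
    dev.seek(17 * BLOCK)        # you have to know the offset...
    raw = dev.read(3 * BLOCK)   # ...and the length, with no names to help
```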

Just as the food chain is critical for the health of an ecosystem, I believe the software project network is critical for the health of a software community. We are blessed with many good things at the lower levels: the Linux kernel, gcc, etc. They're not perfect, of course, but they don't prevent you from doing good work at the higher levels either.

A good example of dysfunction, I think, is X during the late '80s and early-to-mid '90s. Here, the lowest level (the X protocol and Xlib) was reasonably good, but the next levels up (Xt and Motif) were awful. As a result, X failed to get attention at the lower levels too, and lots of things that should have happened didn't (better font handling, a more advanced imaging model). Today there are numerous healthy projects at the next level up from X (Gtk+ and Qt are the most popular), and X itself is now seeing good quantitative improvement as well.

Now I'll make some people mad. In language communities, these relationships are especially important. Everything written in a language depends critically on the performance and stability of that language's implementation. Thus, there is an important role for a project that does hardcore quantitative performance work, combined with a process that ensures stability. If that role isn't strongly filled, I think the language community is in trouble. And here is what I see: in Python, this role isn't really being filled, but in Perl 6 it looks like it will be, by Parrot, the subject of the Sugalski interview above.

That said, the dependence on the language implementation for performance may not be as critical in the Python community as it is in others, because the CPython+C amalgam is a reasonable approach in many cases. But if Perl 6 and Parrot succeed, it will be appealing to write high-performance systems entirely in the high-level language. And there are some things happening in Python performance land as well.
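As a rough illustration of that amalgam (hypothetical code, just to make the point): the hot inner loop runs in C behind the standard library's zlib module, which is a C extension, while Python supplies the glue. The pure-Python CRC is there only for contrast.

```python
# Illustrative only: the CPython+C amalgam in miniature. The stdlib's zlib
# module is implemented in C, so the call below spends its time in compiled
# code; the pure-Python version is correct but orders of magnitude slower.
import zlib

def crc32_pure_python(data: bytes, crc: int = 0) -> int:
    """Bit-at-a-time CRC-32 in pure Python."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0xEDB88320 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

data = b"the food chain" * 1000

# Same answer either way; only the speed differs.
assert crc32_pure_python(data) == zlib.crc32(data)
```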

Food for thought, in any case.
