9 Sep 2002 raph

Ghostscript

A 7.30 release is pre-announced. This one contains the notorious DeviceN merge, but isn't the most stable release ever. If you're interested in DeviceN, you'll want it.

I'll be at Seybold SF over the next three days. I don't feel entirely prepared for it, but it's always a lot of fun to meet customers and users, and to see what else is going on in the industry. It'll be a bit strange being there on the 11th. This anniversary probably touches a deep chord for every American. I was a bit surprised to have a dream about it a few days ago. May everyone have a peaceful Wednesday.

Testing and proofs

In a recent phone conversation, Peter Deutsch quoted Dijkstra's famous saying, "testing can only show the presence of bugs, never their absence." Dijkstra was arguing for a more systematic approach to programming, in particular proofs of correctness. I agree, but here we are in 2002 with extremely poor tools for writing and deploying provably-correct programs. Thus, my response is: "without testing, you don't even have a way to detect the presence of bugs."

I'm now more convinced than ever that thorough testing is the most practical route to high quality software. We're doing a lot of regression testing on Ghostscript now, and increasing it still further, but these tests are still not what I'd call thorough.

I think that one of the best things from the Extreme Programming movement is the idea that testing is baked into the development process. I plan to follow this deeply for Fitz development. Among other things, I want to test for memory leaks (relatively easy if you just count mallocs and frees; see the sketch below), UMRs (uninitialized memory reads) and other runtime flaws (which means running tests under Valgrind), memory usage, speed, and of course consistent rendering. The latter is, I think, especially important when doing hardware accelerated rendering, such as with XRender. Testing that the results are consistent with unaccelerated rendering is quite straightforward.
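
Here's a minimal sketch of the malloc-counting idea in C. The names are invented for illustration; this isn't actual Fitz code.

    /* Leak-counting allocator: wrap malloc/free and keep a running
       count of live allocations.  A nonzero count at the end of a
       test run indicates a leak. */
    #include <stdio.h>
    #include <stdlib.h>

    static long live_allocs = 0;

    void *test_malloc(size_t size)
    {
        void *p = malloc(size);
        if (p != NULL)
            live_allocs++;
        return p;
    }

    void test_free(void *p)
    {
        if (p != NULL)
            live_allocs--;
        free(p);
    }

    /* Call at the end of each regression test. */
    void test_report_leaks(void)
    {
        if (live_allocs != 0)
            fprintf(stderr, "LEAK: %ld allocations outstanding\n",
                    live_allocs);
    }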

Ultimately, I would like to prove Fitz correct. In principle, this shouldn't be all that hard to do. John Harrison's thesis shows how to do serious work with reals in the HOL framework. It's not hard to imagine continuing this work into computational geometry concepts such as line segments, polygons, and Bezier curves.
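
The objects involved are at least easy to state formally. A cubic Bezier curve with control points P_0 through P_3, for example, is just

    B(t) = (1-t)^3 P_0 + 3(1-t)^2 t \, P_1 + 3(1-t) t^2 \, P_2 + t^3 P_3,  t \in [0,1]

and lemmas such as the convex hull property (the curve lies within the convex hull of its control points) are exactly the kind of stepping stones a rendering correctness proof would need.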

Even in the context of proving programs correct with respect to a formal specification, I think intensive testing will usually be required, if for no other reason than to validate the specification. I find flate (zlib) compression a useful example. One aspect of the spec is trivial: decompression composed with compression should be the identity function over streams. Yet, it's also important to show that the compression is compatible with existing implementations. I'd have almost no confidence in this just from eyeballing the formal spec. The only real way to do it is to test it. And, as I say, I consider flate compression to be one of the problems most amenable to a formal approach. It's impossible to imagine doing something like a Word-to-PDF converter without intensive testing.
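
The identity half of that spec translates directly into a test. Here's a sketch against zlib's actual compress/uncompress entry points; the fixed buffer sizes and test strings are arbitrary choices for illustration.

    /* Round-trip test: uncompress(compress(x)) == x.  Build with -lz.
       Compatibility with other implementations needs a separate test
       against streams they produced, which isn't shown here. */
    #include <assert.h>
    #include <string.h>
    #include <zlib.h>

    static void round_trip(const Bytef *src, uLong srclen)
    {
        Bytef comp[4096], decomp[4096];
        uLongf complen = sizeof(comp);
        uLongf decomplen = sizeof(decomp);

        assert(compress(comp, &complen, src, srclen) == Z_OK);
        assert(uncompress(decomp, &decomplen, comp, complen) == Z_OK);
        assert(decomplen == srclen);
        assert(memcmp(decomp, src, srclen) == 0);
    }

    int main(void)
    {
        round_trip((const Bytef *)"hello, hello, hello", 19);
        round_trip((const Bytef *)"", 0);
        return 0;
    }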

Conscious runtime design

Chris Ryland sent me an email pointing out that Python's runtime was consciously designed to meet certain goals, including runtime compatibility with C. Indeed, Python is one of my main inspirations.

I think it's possible to argue that Python got one thing wrong, which is its choice of memory freeing discipline. Python 1 is reference counted only, and Python 2 adds garbage collection. The result is more complex, and probably slower, than either discipline alone. Of course, for something like Python, reference counting alone has the serious disadvantage of leaking memory whenever programs create circular data structures.

Memory freeing discipline is one of the most difficult problems for runtime design. There is no one approach which is obviously best. In particular, reference counting is definitely imperfect. Aside from the problems with circular data structures, it's often not the fastest approach. Garbage collection is often faster, largely because you don't have the overhead of constantly bumping the reference count.

But reference counting has one killer advantage: it's easy to integrate into other runtimes. In particular, a refcounted runtime can be slave to a garbage collected runtime, knitted together through finalization calls. By contrast, knitting together two garbage collectors is quite hard. And, in doing a graphics rendering library rather than a full programming language, I can simply avoid creating circular structures by design. Thus, I've arrived at reference counting as the best choice.
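
As a concrete sketch (the fitz_* names here are invented, not an actual API): a refcounted object needs only three entry points, and a garbage-collected host language wires its finalizer to the drop call.

    /* Minimal refcounted object.  A GC'd host -- say, a Python
       wrapper type -- holds one reference and calls fitz_drop from
       its finalizer (tp_dealloc); the collector then drives
       reclamation of library objects for free. */
    #include <stdlib.h>

    typedef struct fitz_obj {
        int refcount;
        /* ... payload ... */
    } fitz_obj;

    fitz_obj *fitz_new(void)
    {
        fitz_obj *obj = malloc(sizeof(fitz_obj));
        if (obj != NULL)
            obj->refcount = 1;
        return obj;
    }

    fitz_obj *fitz_keep(fitz_obj *obj)
    {
        obj->refcount++;
        return obj;
    }

    void fitz_drop(fitz_obj *obj)
    {
        if (--obj->refcount == 0)
            free(obj);
    }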

I can think of one alternative, which is appealing but not practical given today's technology. Imagine a language that is designed explicitly to make runtime-compatible libraries. The compiler will then generate different code for the same source, depending on the target runtime. Thus, the generated code for Python 2 would have both reference counting and garbage collection tracing methods. The generated code for Ruby would have just garbage collection. Another target might assume the Boehm-Weiser collector, and yet another might just have reference counting. Maybe an Apache module target does allocations in pools. Obviously, writing source that can be translated into high quality code for all these runtimes would take some care, but I don't see it as impossible.
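
The C preprocessor gives a crude approximation of what such a compiler would emit. In this sketch (all names invented), the same source builds against either a refcounting discipline or the Boehm-Weiser collector:

    /* One source, two memory disciplines.  A runtime-portable
       language's compiler would generate these variants itself;
       here #ifdef stands in for it. */
    #ifdef RUNTIME_REFCOUNT
    #include <stdlib.h>
    #define OBJ_FIELDS      int refcount;
    #define OBJ_ALLOC(T)    calloc(1, sizeof(T))
    #define OBJ_RETAIN(o)   ((o)->refcount++)
    #define OBJ_RELEASE(o)  do { if (--(o)->refcount == 0) free(o); } while (0)
    #else /* Boehm-Weiser collector */
    #include <gc.h>
    #define OBJ_FIELDS      /* no per-object bookkeeping */
    #define OBJ_ALLOC(T)    GC_MALLOC(sizeof(T))
    #define OBJ_RETAIN(o)   ((void)(o))  /* the collector traces references */
    #define OBJ_RELEASE(o)  ((void)(o))
    #endif

    typedef struct point {
        OBJ_FIELDS
        double x, y;
    } point;

    point *make_point(double x, double y)
    {
        point *p = OBJ_ALLOC(point);
        if (p == NULL)
            return NULL;
        OBJ_RETAIN(p);  /* caller owns one reference; a no-op under GC */
        p->x = x;
        p->y = y;
        return p;
    }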

I think such an approach has lots of advantages, but for obvious reasons I don't want to put the design and implementation of such a language on the critical path for Fitz.

Psyco

Earlier, I wrote expressing skepticism that Psyco would achieve its goals. Happily, it looks like I'm being proved wrong.

In fact, if it does work (and it's beginning to look like it does), the Psyco approach has some definite advantages. Primarily, you don't need to complicate the language to add static annotations. Rather, if something is always the same type (or the same value, or whatever), it's detected dynamically.

The fact that people are seriously exploring different approaches to performance in very high level languages is marvelous.
