# Older blog entries for raph (starting at number 279)

On the relation between proofs and code

Much of the time when I'm looking at proof systems, I feel like a novice. There's so much highly technical stuff, and it's not easy for me to figure out which of it is even worth learning.

Even so, occasionally I see clear glimpses. Sometimes, these glimpses don't even seem to be written down anywhere else. That's what I'll try to communicate tonight.

Formal logical systems

Here's a very simplified summary of formal logical systems. I strongly encourage you to get familiar with a real system, but if not, this will have to do.

Each such system has a language in which you can make statements. In a proof checker, you want to accept a lot of these statements (corresponding to true mathematical statements), but reject all those that correspond to false mathematical statements. Of course, there will always be plenty of statements which you can neither prove nor disprove, but we won't worry about those here.

In any case, most formal systems work like this: there is a small number of axioms considered true without needing proof, and then a small number of inference rules, which take one or more existing statements considered true, and spit out a new one. A particularly important inference rule is the "modus ponens". If you've proved A and proved that A implies B, then the modus ponens gives you a proof of B. In many systems, the modus ponens is what gives proofs their branching tree structure, and is also the best way to take "large leaps", since the inference rules themselves tend to be quite primitive.

There's a lot of detail we'll skip over here. I'll mention the fact that many axioms are actually axiom schema, and the fact that you have to watch out for free vs. bound variables (or, in Metamath, distinct variable conditions).

The de Bruijn criterion

At least, the outline above is true for systems satisfying the "de Bruijn criterion", which Freek Wiedijk defines as having the correctness of the system as a whole depends on the correctness of a very small "kernel". There are also a bunch of other systems that do more stuff in the proof steps, such as computations and so on.

Failure to satisfy the de Bruijn criterion has a number of disadvantages. Many people talk about the philosophical issue of knowing whether your results are really correct. Instead, I'll stress the issue of translating proofs from one system to another. If your steps are kind of big and ad hoc, it's much harder to know how to do this.

You can see why people want to relax a bit from purely formal rigor, but in fact systems satisfying the de Bruijn principle tend to admit proofs just as concise as those that don't. See Freek's "Comparing Mathematical Provers" for evidence of this claim.

So I'm going to assume that the de Bruijn principle is a good idea. In a larger ecology of proof systems, those that don't satisfy it will need translation into one that does, but not the converse.

The Poincare principle

On the other hand, doing things like numeric calculations by hand is quite tedious. Freek considers systems that can prove the correctness of calculations automatically as satisfying the "Poincare principle".

This principle raises the question of exactly what kind of software you've got for doing these proofs. You can do an extremely simple program that just checks proofs. Such a program won't satisfy the Poincare principle. So-called "interactive theorem provers" such as HOL have lots of subroutines that crank through a potentially large numbers of primitive inference rules. These systems are inevitably more complex, but do satisfy the Poincare principle.

Even so, automatic calculation, while very good for saving time when trying to make proofs, is not good for clarity and transparency. In particular, it's very difficult to know when these subroutines ("tactics" in HOL-speak) will succeed and when they'll fail. In particular, a lot of the things you want to automate are undecidable in general.

Automation also increases the risk of "version skew". Assuming that the technology is steadily improving, then you can't use an older system to check a proof script made with a newer system, even if it's perfectly capable of following the primitive inference steps generated by the latter.

This version skew is just one aspect of proof scripts being bound too tightly to short-lived software, rather than based in eternal mathematical truth. In particular, the way scripts are done in HOL seem especially hackish to me, with a combination of commands which modify the global state, output of half-chewed proofs as ML programs in ASCII format, and interactions with a make system that, among other things, leaves a lot of x86 machine code lying around. It seems to work reasonably well in practice, but it can't possibly be the Right Way.

The way out

The way out seems obvious to me: separate the proof generation and proof checking tools. The checker can be very small and stable, while proof generator can ride the feature treadmill just like most other software.

Further, you can have a diversity of tools for proof generation. Some might look just like HOL. Others might have a GUI, or perhaps Web, front-end. You might well do small, specialized tools for generating proofs in particular domains.

The interface between the tools is a proof file format, strongly analogous to object code. Most people think of object code as an unpleasant but necessary step in the process to get programs executed efficiently, but I would rather like to sing its praises. In particular, object code is key in binding together components written in different programming langauges. I can easily see the same thing happening in proofs.

In my opinion, one of the strongest advantages is that you'd be able to explore new "source" languages for writing proofs, and still be able to bind with existing results. There's a lot of stuff that's been proved in mechanically-checkable form. Having to redo it all every time you want to play with a new proof language would be a crippling limitation.

Notes on computational complexity

The relationship between code and proof can now be seen as one of straightforward engineering tradeoffs, rather than a deeply philosophical inquiry into the nature of proof.

A big engineering question is: how long do these calculations take when done in a proof system. Harrison's thesis discusses this question fairly deeply. He put some engineering work into efficient representations (using bit sequences rather than iterated succession for the naturals, for example), so that computations would be reasonably efficient. Even so, they're enough slower so as not to be competitive with, say, a good bignum library.

In HOL, the cost of a primitive inference step is a certain amount of CPU time. In the system I'm talking about, it's also a certain amount of space on disk for the "proof object" file, and, probably more interestingly, bandwidth when you're moving these things around the Web.

Given that a good bignum library is a few orders of magnitude faster than doing calculation in an HOL system, there's an overwhelming temptation to just call the bignum library instead when you need calculations. That's how the de Bruijn criterion gets violated :)

But we can look at the question: just how much are we losing by forcing everything through primitive inference rules. Let's say you have a program which does calculations, and also a proof that the program gives the correct answers. How long does it take to "execute" each step in the program? In the case of HOL, there's an upper-bound answer of about eight orders of magnitude (based on the ARM model included in HOL). Of course, in practice you'll usually be able to simplify the proof a lot.

Similarly, the question of file size is deeply related to compressibility. Kolmogorov complexity is the obvious relevant theory here.

So shall we run arbitrary code to check proofs?

Given the obvious usefulness of code, it's tempting to put a virtual machine into the proof checker and proof object file format. A proof can include code when it also includes a proof that the code is correct. Such a proof will also imply that the code is safe, by the way. The checker will verify the proof of the code first, then assuming it passes, just run it. The output can be entered into the database of statements proved correct, just as if it had been spit out from an inference rule.

The consequence, I think, is a potentially significant performance boost, but not an increase in the logical power of the system. In all cases where you'd execute code, you could just interpret the virtual machine using inference rules instead.

It may be that this performance increase becomes worthwhile. Right now, it doesn't seem to be. In fact, proof checkers on modern machines seem to be almost ridiculously fast. Freek's Automath implementation checks the entire Grundlagen der Analysis in under a second (roughly double the time taken to interpret a single ARM instruction in HOL). Similarly, Metamath takes about a second to verify all of set.mm, including Hilbert spaces and other nontrivial results.

So I think code is not worth it until we start building a library of slow-running proofs, and that doing an efficient virtual machine once becomes a better investment than optimizing each of the proofs.

Whew, that was fun. We just had an ingenious virus implemented and deployed against this site. The hole is closed now, and apparently the only damage done was corrupting people's names.

The hole itself was very simple: strings entered in the "given name" and "surname" fields was passed through to the final rendered HTML with no quoting. The virus exploited this hole to embed "iframe" tags, which then caused a form submission changing the name fields and propagating the virus.

My guess is that a huge fraction of sites that take postings (web boards, guestbooks, etc.) have similar problems. I tried my best to code mod_virgule carefully, the code has been quite stable, and a number of people have been over it. There are just so many security gotchas that it's nearly inevitable you'll run across one or another.

Here's one that's gotten relatively little publicity: Netscape 4.X on Unix treats 0x8B and 0x9B the same as < and >. Thus, any site which posts user-submitted text and does not filter these characters is vulnerable to exactly the same kind of attack. Try it and see.

How would you make it secure?

There are two problems here. First, string quoting wasn't done properly. There is a systematic approach to the problem, but I haven't seen it used much (including the mod_virgule code): types. Specifically, you have one type for arbitrary strings, and other types for variously quoted strings (the web has a few such contexts, some of which are quite similar). The only way to cast between them is to do a quoting operation (or to verify that no quoting is needed).

But the deeper problem is that Web applications interact with Web browsers, which are capable of extremely rich and diverse behaviors (not entirely unlike execution of macros in documents, when it comes down to it). Reasoning about exactly what a browser might do in all possible contexts is difficult to the point of intractibility.

Do I think that Advogato is "secure" now? The odds are against it, even though I'd argue that the code quality and process are superior to most web applications out there. Yet, a lot of people seem to think that a Web browser is a good interface to do things like move around money.

(thanks to Bram for discussion on this topic)

Proofs

I got a nice note from Freek Wiedijk about proof systems (somebody had tipped him off that I was writing about them in my diary). Among other things, he pointed to mowgli, a math system explicitly designed to work on the Web. He and I agree that XML is not all that tasteful, although I do think that the Web emphasis is a good thing.

Freek is also a fan of Mizar. I agree, it's worth looking at. One of the things that turned me off initially is the lack of a free software implementation (the official implementation is almost free). Given the impressive library of proofs, and the other advantages of the system, including readability, I'd imagine that doing a new implementation would be a nice smallish project.

Taral brought up "presence" as an interesting problem in peer-to-peer networking, which doesn't seem to have gotten much attention. It's a concept from instant messaging, but I define it to mean notification of "join" and "leave" events.

Presence is a special case of the general problem of change notification. A robust, efficient technique for change notification would be incredibly useful. On the Web, most people do it by pinging, but with obviously poor latency and waste of bandwidth (and an unpleasant tradeoff between the two). For the most part, it works almost well enough, but people feel the limitations enough to propose crude new mechanisms such as RSS clouds.

One special challenge with presence is that the target node can't itself send a notification when it goes down. You have to either ping it yourself (with problems noted above) or rely on a witness. Then, you get into the dual problems of whether the witness is trustworthy, and handling cases where it suddenly leaves at the same time as the target ("backhoe failure").

My personal feeling is that in a robust change notification system, you have to deal with the reliability of proxies anyway, so presence is best implemented as one application of a general change notification mechanism. Taral feels, rather, that presence is a separate building block that's needed to implement other change notification services.

In any case, I think it's an excellent research project for somebody interested in peer-to-peer networking, in part because it's probably a lot easier than many p2p problems. After all, pinging does work, you just want better performance.

Obvious applications of change notification include Web caching, DNS propagation, and "ticker"-style news clients. I'm sure that you'd see a lot more applications if there were a good infrastructure available.

Max the Unsqueamish

Our compost bin went bad over the past couple of days, probably helped by the warmer than average weather. Today there were flies buzzing around, so we decided to dump the whole thing in the garbage. Of course, Max wanted to help.

When we opened it up, we were confronted with probably the grossest spectacle in our lives. The bin was crawling with maggots, and with a smell to match. Shit would be pleasantly aromatic by comparison. I offered to dump it out by myself, but Heather assured me she could do it herself. I took Max inside because I figured he did not need to see (and smell) this.

In any case, Heather asked me back outside for help shortly. Max absolutely insisted on coming outside too. He knew there was something happening and did not want to miss it. So we let him, and a few minutes later we had the worst of it dumped in the garbage bin.

Of course, the compost bins still needed hosing down. This is Max's domain, and he wasn't about to let anyone else encroach on it. So he happily sprayed them with the hose until they were reasonably clean. I went inside for a bit, at which point Heather asked whether I threw up. I didn't, but I can see why she asked. I had gagged a bit when we were dumping out the lower tray, entirely filled with maggots swimming in putrid water.

But the next part truly earns Max his title above. When we were done, we went back inside and Max asked for me to cook dinner. He said he was hungry, that he loves "dinner", and that it makes him grow big and big, so he can ride the bumper cars. This has been a goal of his ever since he went to Marine World and was denied riding in the bumper cars because he was too small, even though Alan got to ride. I got him a Pizza Lunchables from the fridge, which he ate heartily. Then Heather and I waited a bit more than an hour for our stomachs to settle down enough to eat.

There are times I doubt he's normal.

20 Sep 2002 (updated 20 Sep 2002 at 08:00 UTC) »
Proof freak

I came across some interesting stuff by Freek Wiedijk, especially his comparison of 15 notable theorem proving systems.

See his abandoned manifesto for more insight into why I'm so interested in this topic. I consider his analogy to the Web to be exact. Of course, he wants to do mathematics, but I want to do computer science that doesn't suck. It's a very good sign that our visions are so similar then.

Diary filtering

Sorry for not following up on the patch to add an ignore list. What I really want is filtering based on the ratings. That post outlines the grand scheme, but if we want to start by doing simple thresholds, I'll take the patch.

Kids

More evidence that Max is turning into a computer geek: the other night, he was talking in his sleep, and said "Apple-Q it, daddy". We're starting to seriously use the Web for homeschooling now. A couple of days ago, we found some clips of animal noises. Both kids thought the chimpanzee was funny.

Null desktop

Red Hat's new desktop attempts to create a coherent interface, abstracting the differences between KDE and Gnome in particular. This move seems to have gotten some criticism from the Slashdot crowd, but I think it's great.

Red Hat is trying to solve a real problem. There are a lot of geeks who are into fiddling with every detail of their systems, but for the rest of us, dealing with two different desktops is one big usability problem.

Ok, so it's a problem, but is Red Hat going about it in the right way? Absolutely. They're rolling up their sleeves, doing the work, and releasing code. The Gnome and KDE camps haven't made all that much progress by themselves, and if they tried to build some consortium or something, it would more likely than not take forever and go nowhere. The Red Hatters, I think, understand the true free software Way.

Owen's apology is a model for honesty and forthrightness. When's the last time you saw something like that from a proprietary software company?

Maybe Red Hat isn't doing this the best way. If so, feel free to do a better job. There are already way too many distributions. Maybe it's best to give people the option between Gnome and KDE, and only have things work smoothly if users choose applications written for that desktop. Maybe it's better for distros to just pick one, and abandon all the useful apps from the other camp. I like the Null approach, though, because it preserves real choice in developing apps.

Even more, I think that the popularity of the unified desktop will have a strong positive influence on both the Gnome and KDE camps, to put more thought and effort into making things run more smoothly in an integrated enviroment. A lot of people are going to be using Null.

Some people seem to be upset that Red Hat is creating their own brand for this desktop. As long as credit is given where due, I have no problem with it. In fact, if it helps them make money, so much the better.

I'll make sure there's a good Fitz widget for both KDE and Gnome. I've felt this way a long time. I'm happy that, for example, Abiword and ksvg are both using libart for their SVG stuff.

There's a lot of controversy around different versions of RSS. To make things even worse, aaronsw posted a so-called RSS 3.0. It seems to be a mix of serious simplicity engineering and spec parody. If I were trying to emphasize the former, I wouldn't have called it RSS 3. Maybe RSS -1?

In any case, RSS is a perfect example of a poorly implemented spec. Real-world RSS parsers must be "liberal", because it's rare for feeds to strictly conform to the spec, or, for that matter, strictly conform to the XML spec. I see non-ASCII and XML metacharacters improperly quoted as often as not. The idea of writing a rigorously formal spec is a joke. It might be an interesting exercise, but in the real world you're forced to do workarounds and hacks.

Logic over the Web

I'm beginning to grok HOL. There's a pretty impressive library of stuff proved in the system, including Harrison's reals construction, Rijndael encryption, and the ARM architecture. It's obviously a cool system, but again I think it could be refactored a bit for the Web.

The Web "workflow" I'm envisioning doesn't correspond exactly to any formal proof system I've seen. I'm thinking of an app which uses proof files downloaded from the Web, interactively proves new things using those proofs, then creates new proof files which can be uploaded to the Web. The key question: what should be the format of the proof file?

The key goals for such a system are to reduce the barriers to using proofs developed by others (perhaps using different systems such as Metamath or HOL), and publishing your own proofs so they can be used by others (again, perhaps using different systems). Further, actually making proofs shouldn't be tedious. In particular, you should definitely not have to do easily decidable stuff like numeric calculation and parsing by hand.

With existing proof technology, you have to make some pretty harsh tradeoffs. In particular, you have to choose a primitive logic, such as ZF set theory or typed lambda calculus. Then, interchange with similar systems is (hopefully) easy, but painful to impossible for systems with different primitives.

This shouldn't be. If you're proving results in, say, number theory, why should you care whether they're constructed from ZF sets or lambda terms? The numbers are the numbers. The same thing goes for most objects of interest in computing, which tend to be finite or very nearly so. These things should be portable among proof systems.

What's the way out? I see a few possible paths, if vaguely. The simplest is to optimize for one direction of interchange. For example, one reasonable goal might be to develop a portable library of basic mathematical objects. You'd do proofs over this library using simple inference rules (axioms), then have multiple exports for the various proof systems. Each such export would probably use handcrafted constructions, which might be quite different. For reals, why not use Harrison's construction in HOL, and standard Dedekind cuts in Metamath?

All this interchange stuff is deeply related to proof longevity. Presumably, the mathematical truth embodied in the proof is timeless, but any actual encoding into a file format is likely to be as short-lived as the program which processes that file format. If proofs can be fluidly translated into new containers, perhaps they might have more reasonable lifetimes.

Photos

The CCSD Web site is up, and my family's smiling faces adorn the top. Second from left is Alan, Heather, and Max (original), and rightmost is Heather's father Richard with Alan (original).

The story of the first picture is kind of funny. It was a very sunny Christmas day, and we were having a family walk down Benicia's first street, with me taking along the camera and trying to get pictures of their new toys. They sat down on one of the benches, and I tried to take a picture. In order to provoke a smile, I started acting completely silly, dancing around, making noises, and in general making a complete fool of myself, in between snaps. Judging from the photo, they thought it was funny. Anyway, after I was done I noticed that there was an elderly lady who had watched the whole thing. She didn't want to walk in front of the camera and interrupt me. I felt more than a bit silly, but I consider the result worth it.

Kids

Poor Alan got stung by a wasp, twice. It was obviously painful right after he got stung, but I think the freaking out was even worse. In any case, he's doing well. He seems to be settling into a routine of half-day homeschooling, and half-day in the public school. I've been doing Singapore Math, Crash and Burn Chemistry from Wild Goose, and

Max's language development continues to amaze us. An interchange from today: "I don't like dragons. Dragons scare me. I was scared by dragon, when I sleeped." Me: "Did you have a dream about dragons? A bad dream?" Max: "Yeah. It was a ghost. I kicked it." Another, between Heather and Max: Heather: "Should I put on your shirt?" Max: "Yes, put it on me." A week or so ago, he said, "Please be quiet, Alan", with perfect intonation, just like a librarian, and cracked us up.

Colorimeters

There were relatively few interesting new toys at Seybold. One of the more interesting things was the Pantone Color Cue. At \$350 retail, this is one of the most inexpensive colorimeters yet. It's hard to tell how good it is compared with a pro model such as the X-Rite DTP41 or Gretag Eye One, but there's no reason why a high quality device can't be made for that price. In fact, if there were a volume market for spectrophotometers, there's no reason why they couldn't profitably be sold for well under \$100. The Pantone device strangely doesn't include a USB port, or any connection to a computer, for that matter. However, it's not a big stretch to imagine a second generation device which does. At that point, some good software for making color profiles could have a great impact. Argyll already has most of the core functionality. The main thing needed is UI polish.

Bugs in programs proven correct

I believe that programs can be designed so that they are provably correct with respect to their specification. However, I do not believe that writing good specifications is easy. Let's look at ways bugs can creep in.

At the very lowest level, it's always possible that the definitions used in the formal proof don't match the meanings intended. For example, if you run Metamath's set.mm through tr 45 54, it will happily verify the theorem ( 2 + 2 ) = 5. I'd consider this a bug, even though when you expand out all the definitions, the resulting hodgepodge of predicate calculus and ZFC set theory is a true statement. While confusing 4 and 5 is probably unlikely in practice, it's pretty easy to see how more subtle bugs in mapping may occur.

Of course, when trying to prove things correct about existing things, the big problem is simply unmanageable complexity. Only toy langauges tend to have formally defined semantics at all. The specifications for all "real" programming languages are simply too ambiguous to prove anything at all, especially when you consider the runtime as well. I believe that it is possible to design programming languages and runtimes to be much, much simpler. If you want formal proofs, you'll either have to chuck existing things in favor of the new, simpler variety, or you'll have to spend a huge amount of effort to deal with the warts and overcomplexity of existing things. Both approaches have obvious disadvantages.

Compatibility with existing designs is also important. Internal consistency can be proved, but compatibility can only be tested. This principle extends to file formats, programming languages, instruction sets, compression algorithms, communication protocols, file systems, all manner of I/O techniques, API's, etc. In other words, most of the things you care about when building a real system.

Another technical issue is the difficulty of writing specifications that encompass all of the desired functionality. For example, if you want to prove that an implementation of a public key cryptosystem is protected against timing attacks, then your model of execution must take timing into account. Such a thing is fairly rare.

Some things lend themselves fairly well to formal specification, others less so. Compression algorithms (particularly lossless ones) are good examples of the former, while GUI's are good examples of the latter. If you haven't written a good spec, then you may end up with a flawless implementation of buttons that disappear when you press them, menus hidden behind windows, and so on. The only way out is careful design and thorough testing, which is of course the same as now.

Overall, I believe that provably correct programming is worth the effort. It will take a very long time, though.

Debuggability of reference counting

Peter Deutsch sent email criticizing reference counting on the basis that all clients have to correctly adjust the references. Often, a garbage collected runtime can be more forgiving.

I see the point, but still think that reference counting is easier. For one, improved technology makes it easier to debug problems in reference counted systems. In particular, I'm thinking about Valgrind. The most common reference counting problems (failing to decrement, decrementing before last use, decrementing twice) result in completely clear diagnostics from Valgrind, complete with stack traces. Even better, it's deterministic. By contrast, GC bugs depend critically on when garbage collections occur.

Another point is that the client is not always written in a very low level language, and thus may not have to do the runtime memory discipline by hand. That's one reason I'm interested in the Python bindings.

Thus, I think that reference counting is a perfectly reasonable approach when debugging is a goal.

Lots of interesting threads to respond to. I was going to write more tonight, but instead I spent over three hours talking with David McCusker on the phone. It was well worth it, as he was able to give me some new ideas about trees in Fitz, and I explained to him my trick for representing arbitrary trees in a btree-like structure.

Garbage collection

mwh wonders about terminology. I guess there are people who consider reference counting to be a form of garbage collection, but I tend to think of them as two different approaches to the same problem. To me, a garbage collector is something that can collect garbage even when it has cyclic references. I guess it all depends on who is master :)

By this criterion, Python 2's "cycle detector" is a real garbage collector, but there is also reference counting. When a reference count goes to zero, a structure can be freed immediately, which is not true of "pure" garbage collected runtimes. In fact, if you're careful not to create circular references, you might never need to invoke the cycle detector.

Essentially, the hack in Python is to use reference counts to determine the roots. In any garbage collected runtime, finding the roots is a challenging problem. The problem is even harder when code is in C, because a lot of the time, you'll want references stored on the stack to be considered roots. Python counts the number of references in to each object, and if this doesn't match the reference count, considers the object to be a root.

I haven't done benchmarking, but my intuition tells me that Python's approach is one of the poorest performing. For one, you have 4 bytes of overhead per object to store the reference count. For two, as mwh points out, just forgetting a reference pulls the object into the cache. For three, bumping reference counts causes memory traffic. In many cases, you'll get write traffic to main memory even when you're just doing a read-only traversal. For four, I'm sure that Python's cycle detector is slower than most mark-sweep collectors, because of all the extra accounting it has to do.

mwh guesses that reference counting is better performing than stop and copy. Actually, stop and copy tends to be one of the best performing approaches. Allocation is very cheap, in the common case just advancing a free pointer. On collection, only active objects have any cost. In most systems, the majority of objects have very short lifetimes. For these, the cost of freeing is essentially zero. Finally, stop and copy often compacts objects, increasing spatial locality, so you may need to pull fewer cache lines from main memory to access an structure. But note that Hans Boehm doesn't agree with me. See his slides on GC myths, for example, and this comparison of asymptotic complexity.

The GC FAQ looks like a pretty good reference.

mbp: Java's example is not particularly helpful for what I'm trying to do. My goal is to have a library that works well when bound to whatever runtime the client wants. The Java philosophy, by contrast, is to nail down a runtime and force all components of a system to conform to it. In many cases, that's actually a good thing. It minimizes the impedance mismatches between modules. Further, Java's runtime is actually a pretty good one. A huge amount of effort has obviously gone into tuning its performance, and I'm impressed that they've gotten garbage collection in highly multithreaded environments to work so well. Java's runtime is undoubtedly better than most hand-rolled stuff.

I'm not convinced by your claim that Java is agile with respect to garbage collection discipline. I consider the ability to count, say, only forward references in a doubly linked list to be a minimum requirement for any decent reference counted discipline. Where is that in Java? I know that embedded Java folks are having success with coding patterns that allocate very little memory, but at the cost of a more constrained use of the language. In particular, if you use a library written for general Java in such an environment, it won't work well. Conversely, general Java users probably won't much like libraries designed for embedded stuff, because they're used to a dynamic style.

Other stuff

I have things to say about proofs vs testing, debuggability of runtimes, and media coverage of Hatfill, but they'll all have to wait for another day.

Ghostscript

A 7.30 release is pre-announced. This one contains the notorious DeviceN merge, but isn't the most stable release ever. If you're interested in DeviceN, you'll want it.

I'll be at Seybold SF over the next three days. I don't feel entirely prepared for it, but it's always a lot of fun to meet customers and users, and to see what else is going on in the industry. It'll be a bit strange being there on the 11th. This anniversary touches a pretty deep chord for probably all Americans. I was a bit surprised to have a dream about it a few days ago. May everyone have a peaceful Wednesday.

Testing and proofs

In a recent phone conversation, Peter Deutsch quoted Dijkstra's famous saying, "testing can only show the presence of bugs, never their absence." Dijkstra was arguing for a more systematic approach to programming, in particular proofs of correctness. I agree, but here we are in 2002 with extremely poor tools for writing and deploying provably-correct programs. Thus, my response is: "without testing, you don't even have a way to detect the presence of bugs."

I'm now more convinced than ever that thorough testing is the most practical route to high quality software. We're doing a lot of regression testing on Ghostscript now, and increasing it still further, but these tests are still not what I'd call thorough.

I think that one of the best things from the Extreme Programming movement is the idea that testing is baked in to the development process. I plan to follow this deeply for Fitz development. Among other things, I want to test for memory leaks (relatively easy if you just count mallocs and frees), UMR's and other runtime flaws (which means running tests under Valgrind), memory usage, speed, and of course consistent rendering. The latter is, I think, especially important when doing hardware accelerated rendering, such as with XRender. Testing that the results are consistent with unaccelerated rendering is quite straightforward.

Ultimately, I would like to prove Fitz correct. In principle, this shouldn't be all that hard to do. John Harrison's thesis shows how to do serious work with reals in the HOL framework. It's not hard to imagine continuing this work into computational geometry concepts such as line segments, polygons, and Bezier curves.

Even in the context of proving programs correct with respect to a formal specification, I think intensive testing will usually be required, if for no other reason than to validate the specification. I find flate (zlib) compression a useful example. One aspect of the spec is trivial: decompression composed with compression should be the identity function over streams. Yet, it's also important to show that the compression is compatible with existing implementations. I'd have almost no confidence in this just from eyeballing the formal spec. The only real way to do it is to test it. And, as I say, I consider flate compression to be one of the problems most amenable to a formal approach. It's impossible to imaging doing something like a Word-to-PDF converter without intensive testing.

Conscious runtime design

Chris Ryland sent me an email pointing out that Python's runtime was consciously designed to meet certain goals, including runtime compatibility with C. Indeed, Python is one of my main inspirations.

I think it's possible to argue that Python got one thing wrong, which is its choice of memory freeing discipline. Python 1 is reference counted only, and Python 2 adds garbage collection. The result is more complex, and probably slower, than one discipline or the other alone. Of course, for something like Python, reference counting only has the serious disadvantage of leaking memory whenever programs create circular data structures.

Memory freeing discipline is one of the most difficult problems for runtime design. There is no one approach which is obviously best. In particular, reference counting is definitely imperfect. Aside from the problems with circular data structures, it's often not the fastest approach. Garbage collection is often faster, largely because you don't have the overhead of constantly bumping the reference count.

But reference counting has one killer advantage: it's easy to integrate into other runtimes. In particular, a refcounted runtime can be slave to a garbage collected runtime, knitted together through finalization calls. By contrast, knitting together two garbage collectors is quite hard. And, in doing a graphics rendering library rather than a full programming language, I can simply avoid creating circular structures by design. Thus, I've arrived at reference counting as the best choice.

I can think of one alternative, which is appealing but not practical given today's technology. Imagine a language that is designed explicitly to make runtime-compatible libraries. The compiler will then generate different code for the same source, depending on the target runtime. Thus, the generated code for Python 2 would have both reference counting and garbage collection tracing methods. The generated code for Ruby would have just garbage collection. Another target might assume the Boehm-Weiser collector, and yet another might just have reference counting. Maybe an Apache module target does allocations in pools. Obviously, writing source that can be translated into high quality code for all these runtimes would take some care, but I don't see it as impossible.

I think such an approach has lots of advantages, but for obvious reasons I don't want to put the design and implementation of such a language on the critical path for Fitz.

Psyco

Earlier, I wrote expressing skepticism that Psyco would achieve its goals. Happily, it looks like I'm being proved wrong.

In fact, if it does work (which it's beginning to look like), the Pysco approach has some definite advantages. Primarily, you don't need to complicate the language to add static annotations. Rather, if something is always the same type (or the same value, or whatever), it's detected dynamically.

The fact that people are seriously exploring different approaches to performance in very high level languages is marvelous.

Metamath and HOL

I got a nice email from Michael Norrish, suggesting that maybe HOL is not as difficult as I suspected. The core of HOL Light is only about 1000 lines of ML. That's an appealing level of simplicity. Of course, ML is particularly well suited for doing HOL-style logic, but that hardly counts as a strike against it!

Without meaning any disrespect for other projects such as Mizar, QED, Mathweb, NuPrl, etc., for the stuff I'm interested in, HOL and Metamath are definitely the two systems of interest. They're both simple, they're both founded on mature work in mathematical logic, and there's enough mathematics done in both to make comparisons useful.

Unfortunately, the two systems are different enough to pose major compatibility problems. At heart, functions are different beasts. In ZF(C) set theory, they're essentially untyped, and allowed to be partial. They can be hugely infinite, but can't take themselves as arguments. Thus, even such a simple and familiar expression such as lambda x.x can't be directly expressed as a ZF set.

HOL functions, on the other hand, are typed, but total within that type. Thus, the function lambda x. 1/x is problematic. Harrison's construction of the reals in HOL arbitrarily takes 1/0 = 0. The trick of polymorphic types lets you apply a function to itself, with just enough limitations to avoid the paradoxes. So lambda x.x is back in the club.

I get the feeling that there's been work on connecting the two flavors of formal system, but it seems technical and difficult to me. This is disappointing.

In any case, I feel this interest in logic burning itself out for now. I'm sure I'll come back to it later; it's been an interest of mine for a long time. I still feel that a Web-centered distributed proof repository could be a very interesting thing, and that the tools I'm comfortable with could be applied well. But what I do not need right now is another big project.

Ghostscript

The big DeviceN merge happened, and it's not too bad. There are still some regressions, but they're settling down. A big chunk of my recent work has been putting together a framework to test whether 6-color DeviceN rendering is consistent with plain CMYK. I'm becoming an even firmer believer in testing, and I think that graphics rendering systems are particularly suitable for automated testing.

Fitz

I'm craving a chunk of time to sit down and write Fitz design ideas, but with the Ghostscript 8.0 release process taking all my time, it's not been easy. Even so, some aspects of the design seem to be coming into relatively clear focus, even as details on things like font and text rendering seem elusive. I'm particularly excited about the design of the runtime discipline. In particular, it should be a lot easier to understand and integrate with other systems than the current Ghostscript runtime discipline. I don't see a lot of work on conscious design for runtimes; more commonly, designs inherit runtimes from somewhere else (language, library, predecessor). As such, I'm hoping that the runtime document will make interesting reading even for people not in the market for a graphics rendering library.

270 older entries...