Older blog entries for raph (starting at number 273)

Bugs in programs proven correct

I believe that programs can be designed so that they are provably correct with respect to their specification. However, I do not believe that writing good specifications is easy. Let's look at ways bugs can creep in.

At the very lowest level, it's always possible that the definitions used in the formal proof don't match the meanings intended. For example, if you run Metamath's set.mm through tr 45 54, it will happily verify the theorem ( 2 + 2 ) = 5. I'd consider this a bug, even though when you expand out all the definitions, the resulting hodgepodge of predicate calculus and ZFC set theory is a true statement. While confusing 4 and 5 is probably unlikely in practice, it's pretty easy to see how more subtle bugs in mapping may occur.
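For what it's worth, the character swap is trivial to reproduce (a Python sketch of the same tr substitution, not Metamath itself):

```python
# tr 45 54 swaps every "4" and "5" in its input. Applied to a proof
# database, every proof remains structurally valid -- the checker never
# looks at what the glyphs are supposed to mean.
swap = str.maketrans("45", "54")

theorem = "( 2 + 2 ) = 4"
print(theorem.translate(swap))  # ( 2 + 2 ) = 5
```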

Of course, when trying to prove things correct about existing things, the big problem is simply unmanageable complexity. Only toy languages tend to have formally defined semantics at all. The specifications for all "real" programming languages are simply too ambiguous to prove anything at all, especially when you consider the runtime as well. I believe that it is possible to design programming languages and runtimes to be much, much simpler. If you want formal proofs, you'll either have to chuck existing things in favor of the new, simpler variety, or you'll have to spend a huge amount of effort to deal with the warts and overcomplexity of existing things. Both approaches have obvious disadvantages.

Compatibility with existing designs is also important. Internal consistency can be proved, but compatibility can only be tested. This principle extends to file formats, programming languages, instruction sets, compression algorithms, communication protocols, file systems, all manner of I/O techniques, API's, etc. In other words, most of the things you care about when building a real system.

Another technical issue is the difficulty of writing specifications that encompass all of the desired functionality. For example, if you want to prove that an implementation of a public key cryptosystem is protected against timing attacks, then your model of execution must take timing into account. Such a thing is fairly rare.

Some things lend themselves fairly well to formal specification, others less so. Compression algorithms (particularly lossless ones) are good examples of the former, while GUI's are good examples of the latter. If you haven't written a good spec, then you may end up with a flawless implementation of buttons that disappear when you press them, menus hidden behind windows, and so on. The only way out is careful design and thorough testing, which is of course the same as now.

Overall, I believe that provably correct programming is worth the effort. It will take a very long time, though.

Debuggability of reference counting

Peter Deutsch sent email criticizing reference counting on the basis that all clients have to correctly adjust the references. Often, a garbage collected runtime can be more forgiving.

I see the point, but still think that reference counting is easier. For one, improved technology makes it easier to debug problems in reference counted systems. In particular, I'm thinking about Valgrind. The most common reference counting problems (failing to decrement, decrementing before last use, decrementing twice) result in completely clear diagnostics from Valgrind, complete with stack traces. Even better, it's deterministic. By contrast, GC bugs depend critically on when garbage collections occur.
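To make the failure modes concrete, here's a toy sketch (illustrative Python, not any real runtime's API) of why a double decrement is such a friendly bug: it fails deterministically, at the faulty call site, which is exactly the property Valgrind's diagnostics exploit.

```python
# Toy model of a reference-counted object. The assertions play the role
# Valgrind plays for real C code: the error surfaces at the exact call
# that misbehaves, with the same result on every run.
class RefCounted:
    def __init__(self):
        self.refcount = 1
        self.freed = False

    def incref(self):
        assert not self.freed, "incref after free"
        self.refcount += 1

    def decref(self):
        assert not self.freed, "decref after free (the double-decrement bug)"
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True  # in C, the object would be freed here

obj = RefCounted()
obj.incref()
obj.decref()
obj.decref()  # count reaches zero; the object is freed
try:
    obj.decref()  # the classic bug: one decrement too many
except AssertionError as e:
    print("caught:", e)
```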

Another point is that the client is not always written in a very low level language, and thus may not have to do the runtime memory discipline by hand. That's one reason I'm interested in the Python bindings.

Thus, I think that reference counting is a perfectly reasonable approach when debugging is a goal.

Lots of interesting threads to respond to. I was going to write more tonight, but instead I spent over three hours talking with David McCusker on the phone. It was well worth it, as he was able to give me some new ideas about trees in Fitz, and I explained to him my trick for representing arbitrary trees in a btree-like structure.

Garbage collection

mwh wonders about terminology. I guess there are people who consider reference counting to be a form of garbage collection, but I tend to think of them as two different approaches to the same problem. To me, a garbage collector is something that can collect garbage even when it has cyclic references. I guess it all depends on who is master :)

By this criterion, Python 2's "cycle detector" is a real garbage collector, but there is also reference counting. When a reference count goes to zero, a structure can be freed immediately, which is not true of "pure" garbage collected runtimes. In fact, if you're careful not to create circular references, you might never need to invoke the cycle detector.

Essentially, the hack in Python is to use reference counts to determine the roots. In any garbage collected runtime, finding the roots is a challenging problem. The problem is even harder when code is in C, because a lot of the time, you'll want references stored on the stack to be considered roots. Python counts the number of references into each object from other tracked objects, and if this doesn't match the reference count, considers the object to be a root.
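The division of labor is easy to see from Python itself (a sketch using the gc module; the cycle below is exactly the case pure reference counting can't handle):

```python
import gc

class Node:
    pass

a = Node()
b = Node()
a.partner = b
b.partner = a  # a cycle: each object's refcount stays nonzero

del a, b       # pure reference counting would leak both objects here
collected = gc.collect()  # the cycle detector finds and frees them
# At least 2 (the Nodes, plus their attribute dicts in CPython):
print("unreachable objects collected:", collected)
```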

I haven't done benchmarking, but my intuition tells me that Python's approach is one of the poorest performing. For one, you have 4 bytes of overhead per object to store the reference count. For two, as mwh points out, just forgetting a reference pulls the object into the cache. For three, bumping reference counts causes memory traffic. In many cases, you'll get write traffic to main memory even when you're just doing a read-only traversal. For four, I'm sure that Python's cycle detector is slower than most mark-sweep collectors, because of all the extra accounting it has to do.

mwh guesses that reference counting is better performing than stop and copy. Actually, stop and copy tends to be one of the best performing approaches. Allocation is very cheap, in the common case just advancing a free pointer. On collection, only active objects have any cost. In most systems, the majority of objects have very short lifetimes. For these, the cost of freeing is essentially zero. Finally, stop and copy often compacts objects, increasing spatial locality, so you may need to pull fewer cache lines from main memory to access a structure. But note that Hans Boehm doesn't agree with me. See his slides on GC myths, for example, and this comparison of asymptotic complexity.
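Why allocation is so cheap in a copying collector is easy to sketch (a model of the allocation path only; the copy/compact phase is omitted, and the names are illustrative):

```python
HEAP_SIZE = 1024
heap = [None] * HEAP_SIZE  # the to-space of a copying collector
free_ptr = 0

def alloc(n_words):
    """Bump-pointer allocation: O(1), no free list to search."""
    global free_ptr
    if free_ptr + n_words > HEAP_SIZE:
        # A real runtime would trigger a collection here (copying live
        # objects to the other semispace) and retry.
        raise MemoryError("heap exhausted; collection would run here")
    addr = free_ptr
    free_ptr += n_words
    return addr

a = alloc(4)
b = alloc(8)
print(a, b)  # 0 4: consecutive allocations are adjacent, helping locality
```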

The GC FAQ looks like a pretty good reference.

mbp: Java's example is not particularly helpful for what I'm trying to do. My goal is to have a library that works well when bound to whatever runtime the client wants. The Java philosophy, by contrast, is to nail down a runtime and force all components of a system to conform to it. In many cases, that's actually a good thing. It minimizes the impedance mismatches between modules. Further, Java's runtime is actually a pretty good one. A huge amount of effort has obviously gone into tuning its performance, and I'm impressed that they've gotten garbage collection in highly multithreaded environments to work so well. Java's runtime is undoubtedly better than most hand-rolled stuff.

I'm not convinced by your claim that Java is agile with respect to garbage collection discipline. I consider the ability to count, say, only forward references in a doubly linked list to be a minimum requirement for any decent reference counted discipline. Where is that in Java? I know that embedded Java folks are having success with coding patterns that allocate very little memory, but at the cost of a more constrained use of the language. In particular, if you use a library written for general Java in such an environment, it won't work well. Conversely, general Java users probably won't much like libraries designed for embedded stuff, because they're used to a dynamic style.

Other stuff

I have things to say about proofs vs testing, debuggability of runtimes, and media coverage of Hatfill, but they'll all have to wait for another day.

Ghostscript

A 7.30 release is pre-announced. This one contains the notorious DeviceN merge, but isn't the most stable release ever. If you're interested in DeviceN, you'll want it.

I'll be at Seybold SF over the next three days. I don't feel entirely prepared for it, but it's always a lot of fun to meet customers and users, and to see what else is going on in the industry. It'll be a bit strange being there on the 11th. This anniversary touches a pretty deep chord for probably all Americans. I was a bit surprised to have a dream about it a few days ago. May everyone have a peaceful Wednesday.

Testing and proofs

In a recent phone conversation, Peter Deutsch quoted Dijkstra's famous saying, "testing can only show the presence of bugs, never their absence." Dijkstra was arguing for a more systematic approach to programming, in particular proofs of correctness. I agree, but here we are in 2002 with extremely poor tools for writing and deploying provably-correct programs. Thus, my response is: "without testing, you don't even have a way to detect the presence of bugs."

I'm now more convinced than ever that thorough testing is the most practical route to high quality software. We're doing a lot of regression testing on Ghostscript now, and increasing it still further, but these tests are still not what I'd call thorough.

I think that one of the best things from the Extreme Programming movement is the idea that testing is baked in to the development process. I plan to follow this deeply for Fitz development. Among other things, I want to test for memory leaks (relatively easy if you just count mallocs and frees), UMR's and other runtime flaws (which means running tests under Valgrind), memory usage, speed, and of course consistent rendering. The latter is, I think, especially important when doing hardware accelerated rendering, such as with XRender. Testing that the results are consistent with unaccelerated rendering is quite straightforward.
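The leak test really is as simple as counting. A minimal sketch (the names are illustrative, not Fitz's actual allocator API):

```python
class CountingAllocator:
    """Wraps allocation so a test can assert mallocs == frees."""
    def __init__(self):
        self.mallocs = 0
        self.frees = 0

    def malloc(self, size):
        self.mallocs += 1
        return bytearray(size)

    def free(self, block):
        self.frees += 1

def leak_check(workload):
    alloc = CountingAllocator()
    workload(alloc)
    leaked = alloc.mallocs - alloc.frees
    assert leaked == 0, f"leak: {leaked} blocks never freed"

def well_behaved(alloc):
    block = alloc.malloc(64)
    alloc.free(block)

leak_check(well_behaved)  # passes: every malloc has a matching free
```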

Ultimately, I would like to prove Fitz correct. In principle, this shouldn't be all that hard to do. John Harrison's thesis shows how to do serious work with reals in the HOL framework. It's not hard to imagine continuing this work into computational geometry concepts such as line segments, polygons, and Bezier curves.

Even in the context of proving programs correct with respect to a formal specification, I think intensive testing will usually be required, if for no other reason than to validate the specification. I find flate (zlib) compression a useful example. One aspect of the spec is trivial: decompression composed with compression should be the identity function over streams. Yet, it's also important to show that the compression is compatible with existing implementations. I'd have almost no confidence in this just from eyeballing the formal spec. The only real way to do it is to test it. And, as I say, I consider flate compression to be one of the problems most amenable to a formal approach. It's impossible to imagine doing something like a Word-to-PDF converter without intensive testing.
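The trivial half of the spec is a one-line test with Python's zlib; the compatibility half is the part that needs corpora of streams from independent implementations:

```python
import zlib

data = b"the quick brown fox jumps over the lazy dog " * 100

compressed = zlib.compress(data)
assert zlib.decompress(compressed) == data  # decompress o compress = identity
assert len(compressed) < len(data)          # and it actually compresses
print(len(data), "->", len(compressed))
```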

Conscious runtime design

Chris Ryland sent me an email pointing out that Python's runtime was consciously designed to meet certain goals, including runtime compatibility with C. Indeed, Python is one of my main inspirations.

I think it's possible to argue that Python got one thing wrong, which is its choice of memory freeing discipline. Python 1 is reference counted only, and Python 2 adds garbage collection. The result is more complex, and probably slower, than one discipline or the other alone. Of course, for something like Python, reference counting alone has the serious disadvantage of leaking memory whenever programs create circular data structures.

Memory freeing discipline is one of the most difficult problems for runtime design. There is no one approach which is obviously best. In particular, reference counting is definitely imperfect. Aside from the problems with circular data structures, it's often not the fastest approach. Garbage collection is often faster, largely because you don't have the overhead of constantly bumping the reference count.

But reference counting has one killer advantage: it's easy to integrate into other runtimes. In particular, a refcounted runtime can be slave to a garbage collected runtime, knitted together through finalization calls. By contrast, knitting together two garbage collectors is quite hard. And, in doing a graphics rendering library rather than a full programming language, I can simply avoid creating circular structures by design. Thus, I've arrived at reference counting as the best choice.
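Here's a sketch of the slave arrangement in Python (illustrative names, not Fitz's actual bindings): the garbage-collected host owns a handle, and its finalization call releases the library's reference.

```python
import gc
import weakref

class LibObject:
    """A library-side object managed purely by reference counting."""
    live = 0  # how many library objects are currently allocated

    def __init__(self):
        self.refcount = 1
        LibObject.live += 1

    def decref(self):
        self.refcount -= 1
        if self.refcount == 0:
            LibObject.live -= 1  # in C, the object would be freed here

class HostHandle:
    """A handle owned by the garbage-collected host runtime."""
    def __init__(self, lib_obj):
        self.lib_obj = lib_obj
        # When the host's collector reclaims this handle, the
        # finalization call drops the library's reference.
        weakref.finalize(self, lib_obj.decref)

obj = LibObject()
handle = HostHandle(obj)
del handle   # host runtime reclaims the handle...
gc.collect()
print("live library objects:", LibObject.live)  # 0: the finalizer ran
```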

I can think of one alternative, which is appealing but not practical given today's technology. Imagine a language that is designed explicitly to make runtime-compatible libraries. The compiler will then generate different code for the same source, depending on the target runtime. Thus, the generated code for Python 2 would have both reference counting and garbage collection tracing methods. The generated code for Ruby would have just garbage collection. Another target might assume the Boehm-Weiser collector, and yet another might just have reference counting. Maybe an Apache module target does allocations in pools. Obviously, writing source that can be translated into high quality code for all these runtimes would take some care, but I don't see it as impossible.

I think such an approach has lots of advantages, but for obvious reasons I don't want to put the design and implementation of such a language on the critical path for Fitz.

Psyco

Earlier, I wrote expressing skepticism that Psyco would achieve its goals. Happily, it looks like I'm being proved wrong.

In fact, if it does work (which it's beginning to look like), the Psyco approach has some definite advantages. Primarily, you don't need to complicate the language to add static annotations. Rather, if something is always the same type (or the same value, or whatever), it's detected dynamically.

The fact that people are seriously exploring different approaches to performance in very high level languages is marvelous.

Metamath and HOL

I got a nice email from Michael Norrish, suggesting that maybe HOL is not as difficult as I suspected. The core of HOL Light is only about 1000 lines of ML. That's an appealing level of simplicity. Of course, ML is particularly well suited for doing HOL-style logic, but that hardly counts as a strike against it!

Without meaning any disrespect for other projects such as Mizar, QED, Mathweb, NuPrl, etc., for the stuff I'm interested in, HOL and Metamath are definitely the two systems of interest. They're both simple, they're both founded on mature work in mathematical logic, and there's enough mathematics done in both to make comparisons useful.

Unfortunately, the two systems are different enough to pose major compatibility problems. At heart, functions are different beasts. In ZF(C) set theory, they're essentially untyped, and allowed to be partial. They can be hugely infinite, but can't take themselves as arguments. Thus, even an expression as simple and familiar as lambda x.x can't be directly expressed as a ZF set.

HOL functions, on the other hand, are typed, but total within that type. Thus, the function lambda x. 1/x is problematic. Harrison's construction of the reals in HOL arbitrarily takes 1/0 = 0. The trick of polymorphic types lets you apply a function to itself, with just enough limitations to avoid the paradoxes. So lambda x.x is back in the club.

I get the feeling that there's been work on connecting the two flavors of formal system, but it seems technical and difficult to me. This is disappointing.

In any case, I feel this interest in logic burning itself out for now. I'm sure I'll come back to it later; it's been an interest of mine for a long time. I still feel that a Web-centered distributed proof repository could be a very interesting thing, and that the tools I'm comfortable with could be applied well. But what I do not need right now is another big project.

Ghostscript

The big DeviceN merge happened, and it's not too bad. There are still some regressions, but they're settling down. A big chunk of my recent work has been putting together a framework to test whether 6-color DeviceN rendering is consistent with plain CMYK. I'm becoming an even firmer believer in testing, and I think that graphics rendering systems are particularly suitable for automated testing.

Fitz

I'm craving a chunk of time to sit down and write Fitz design ideas, but with the Ghostscript 8.0 release process taking all my time, it's not been easy. Even so, some aspects of the design seem to be coming into relatively clear focus, even as details on things like font and text rendering seem elusive. I'm particularly excited about the design of the runtime discipline. In particular, it should be a lot easier to understand and integrate with other systems than the current Ghostscript runtime discipline. I don't see a lot of work on conscious design for runtimes; more commonly, designs inherit runtimes from somewhere else (language, library, predecessor). As such, I'm hoping that the runtime document will make interesting reading even for people not in the market for a graphics rendering library.

Quote-plus

jeremyw is correct, of course. Simply saying "permission to quote is granted" or something similar is almost as concise, and far clearer to most people.

Metamath

The second big problem with Metamath is the fact that it nails down specific constructions. It's important to note, however, that this is a characteristic of the set.mm database, not the Metamath framework itself.

The solution to this problem, I think, is to use "theory interpretation." Instead of saying, for example, "0 is the empty set, and x + 1 is x U {x}", you say, "for all sets N and functions +, -, *, /, whatever satisfying the usual axioms for arithmetic." Then, if you ever need to show the existence of such a thing, for example to use arithmetic as a temporary while proving something else, you can use the standard construction, or any construction for that matter.

The question is how much more awkward this will make proofs. Metamath's choice of a definite construction is good for overall simplicity. I guess there's really only one way to find out.

Quote-plus

It is considered good etiquette not to quote email without permission. However, these days, emails are often part of a broader discussion spanning blogs, web fora, and so on. It's increasingly easy to run afoul of this etiquette rule. Conversely, it seems awkward to say something like "permission granted to quote" when I respond to, say, a blog entry in an email.

Thus, I propose the two-character string "+ as a shorthand indicating that permission to quote, with attribution, is granted. Permission is also granted to integrate the email into any copylefted documentation (most copylefts do not require attribution).

Analogously, "- is a request that the email not be quoted, perhaps because it contains confidential or sensitive information. Lastly, when someone is known to be familiar with the convention, "? asks whether quoting is permitted.

Spread the meme! "+

Metamath

Bram pointed me to this thesis on implementing real numbers within HOL. I heartily recommend this thesis to people following this thread (if any). It's very interesting to compare to Metamath's construction of the reals. Unfortunately, these constructions are not compatible. One significant difference is that Metamath seems to support partial functions, so 1/0 is not in the range of the divide function, while HOL wants to have total functions within the type, so 1/0 must have some value (Harrison chooses 0 to simplify the details). As such, proofs from one probably can't be easily ported to the other without serious handwork.

I feel I understand Metamath reasonably well now. It has some issues, but its overwhelming strength is that it's simple. For example, I believe that a fully functional proof verifier could be done in about 300 lines of Python. I wonder how many lines of Python a corresponding verifier for HOL would be; I'd guess around an order of magnitude larger. That kind of difference has profound implications. Norm Megill is certainly to be commended for the "simplicity engineering" he's put into Metamath.

For the purpose of doing Web-distributed proofs, Metamath has a few shortcomings. I think they can be fixed, especially given the underlying simplicity. I'll talk about these problems and possible fixes over the next few days.

Definitions in Metamath have two closely related problems. Definitions are introduced exactly the same way as axioms. As such, it's far from obvious when a definition is "safe". For example, you could add definitions for the untyped lambda calculus, which would introduce the Russell set paradox. The second problem is that there is a single namespace for newly defined constants. You wouldn't be able to combine proofs from two different sources if they defined the same constant two different ways.

Here's my proposal to fix these problems. Choose a highly restricted subclass of definitions that is clearly safe. For example, you could say that any definition of the form "newconst x y z = A" or "newconst x y z <-> phi", with newconst not appearing in A or phi, is acceptable. I propose to introduce new syntax that clearly identifies such definitions. You could use existing syntax, so that such definitions become axioms, but can be checked easily, or you could have other syntax that sets the new constant apart from its "macro expansion". That's a style preference.

Now let's talk about namespaces. I have a strong preference for using hashes as global names, because (assuming the hash function is strong), you don't get collisions. As such, it should be possible to mix together arbitrary proofs without danger. Here's an outline proposal.

Take the definition axiom, and replace the newly defined constant with some token, say $_. Hash the result. That is the "global name". When you're developing proofs, you'll probably want a (more or less) human-readable "pet name", but this is actually irrelevant for verification. Here's an example in Metamath notation.

Here's Metamath's definition of the empty set

$( Declare the symbol for the empty or null set. $)
$c (/) $. $( null set $)

$( Extend class notation to include the empty set. $)
c0 $a class (/) $.

$( Designate x as a set variable for use within the null set definition. $)
$v x $.
$f set x $.

$( Define the empty set. $)
dfnul2 $a |- (/) = { x | -. x = x } $.

So here's what gets hashed:

$a class $_ $. $f set x $. $a |- $_ = { x | -. x = x } $.

Take the SHA-1 hash of this string. Then I propose that #274b1294a7d734a6e3badbf094190f46166159e4 can be used (as both a label and a constant, as these namespaces are independent) whenever the empty set is needed. A proof file would of course bind this string to a shorter name, such as (/). When importing a proof file from another, the binding would be local to the file. (Currently, Metamath has only a file include facility similar to C's preprocessor #include, but an import facility with better namespace management would be quite a straightforward addition, especially considering that Metamath already has ${ $} scoping syntax).
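The mechanism is a few lines of Python (a sketch only; exactly what gets hashed, and how it's normalized, would still need to be nailed down):

```python
import hashlib

# The definition with the new constant replaced by the $_ token:
definition = "$a class $_ $. $f set x $. $a |- $_ = { x | -. x = x } $."

# The SHA-1 digest serves as a collision-free global name; a proof file
# binds it locally to a pet name such as (/).
global_name = "#" + hashlib.sha1(definition.encode("ascii")).hexdigest()
print(global_name)
```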

Obviously, there are some details to be worked out, particularly nailing down exactly what gets hashed, but I think the idea is sound.

Schooling

Alan's Mindstorms arrived a couple of days ago. These promise to be quite fun (and of course educational :). So far, he's settling into first grade very easily. We begin half-homeschooling on Monday.

Even so, I get the sense that Max is going to be the one most into computers. He's learning the OS X interface impressively well. Last time we resumed the computer, a folder was highlighted, and he said, "it's clicked." Then, when I ran Stuffit to unpack a game demo, he noted the icon and said, "it's squishing it." He's also the one that said, "I love Alan's 'puter".

Spam

Paul Graham has a new piece on how spam is different. He also includes my comment in response.

I have great hopes for word-based (or "Bayesian") scoring. Paul's arguments for why it will work seem convincing to me. In particular, keeping per-user word lists should help enormously with two big problems: the ability for spammers to "optimize" around a common scoring scheme; and differing opinions about what constitutes spam.
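A sketch of the per-user scoring scheme, roughly following Graham's "A Plan for Spam" (the corpora, thresholds, and constants below are illustrative, and the details differ from his exact formula):

```python
# Estimate P(spam | word) from a user's own spam and ham corpora, then
# combine the most "interesting" tokens of a message.
def word_probs(spam_counts, ham_counts, n_spam, n_ham):
    probs = {}
    for w in set(spam_counts) | set(ham_counts):
        g = 2 * ham_counts.get(w, 0)  # Graham double-weights ham counts
        b = spam_counts.get(w, 0)
        if g + b >= 1:  # Graham uses a higher threshold on real corpora
            p = (b / n_spam) / (b / n_spam + g / n_ham)
            probs[w] = max(0.01, min(0.99, p))
    return probs

def score(message_words, probs):
    # Take the 15 tokens whose probabilities are furthest from neutral;
    # unseen words get a mildly innocent 0.4.
    ps = sorted((probs.get(w, 0.4) for w in message_words),
                key=lambda p: abs(p - 0.5), reverse=True)[:15]
    prod = 1.0
    inv = 1.0
    for p in ps:
        prod *= p
        inv *= 1.0 - p
    return prod / (prod + inv)

probs = word_probs({"madam": 3, "offer": 5}, {"meeting": 4, "patch": 6},
                   n_spam=5, n_ham=10)
print(round(score(["offer", "madam"], probs), 3))    # near 1: spammy
print(round(score(["patch", "meeting"], probs), 3))  # near 0: hammy
```

Keeping the word lists per-user is what makes this hard to optimize around: each spammer faces a different scoring function.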

I still think there may be a role for trust, but it's also possible that scoring by itself will work so well that adding trust isn't really necessary. In any case, I'll do my best to write up my ideas for using trust to beat spam.

trust

Bram and I had a great discussion about anti-certs. He is exploring the space of adding anti-certs to the trust computation, but is finding it complex. Many of the proposals would seem to require vastly greater computation than the simple eigenvector and network flow models I've proposed.

The leading alternative to that seems to be a way to use anti-certs as input to a process which removes positive certs from the graph. If you disagree with the ratings of user X, it might be interesting to analyze the influence of X on ratings transmitted to you through the graph, then remove your local edges which carry the most influence from X. In general, this boils down to optimizing edge weights. Lastly, anti-certs don't have to be about individual users (nodes in the graph). They can be about ratings you disagree with. You don't really have to know where the bogus ratings come from, as long as you know how to tune your local edges to minimize them.
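For reference, the simple eigenvector model that anti-certs would modify can be sketched in a few lines (PageRank-style iteration; the graph, seed, and damping value are illustrative, not Advogato's actual algorithm, which uses network flow):

```python
# Iterate trust propagation over the cert graph until it converges.
# Trust originates at the seed and flows along certification edges.
def eigenvector_trust(certs, seed, damping=0.85, iters=50):
    nodes = set(certs) | {n for outs in certs.values() for n in outs}
    trust = {n: (1.0 if n == seed else 0.0) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) * (1.0 if n == seed else 0.0)
               for n in nodes}
        for src, outs in certs.items():
            if outs:
                share = damping * trust[src] / len(outs)
                for dst in outs:
                    new[dst] += share
        trust = new
    return trust

# alice certifies bob and carol; bob certifies carol; mallory certifies
# only himself and is unreachable from the seed, so he earns no trust.
certs = {"alice": ["bob", "carol"], "bob": ["carol"],
         "mallory": ["mallory"]}
t = eigenvector_trust(certs, seed="alice")
print(sorted(t, key=t.get, reverse=True))
```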

As always, a big part of the challenge is presenting a sensible UI. I've made the Advogato UI for certs as simple as I know how, and the user community is supposed to be sophisticated, yet it seems that many people can't manage to do certs and ratings the way they're supposed to. Bram's anti-cert UI is straightforward to implement. In addition to "Master, Journeyer, and Apprentice", you'd just add one or more negative categories.

Metamath

I have more insight into the lambda definition from yesterday. I believe that the definition of (lambda x x) is correct - it should be equivalent to the identity relation defined in Metamath. However, it is quite definitely a class and not a set. As a result, trying to use it as a function argument just yields the empty set.

This is probably a reasonable way out of the logic thicket. The problem is that the untyped lambda calculus is subject to the Russell set paradox.

I was talking about this with Bram, and he called ZF set theory a "hack." I more or less agree, but playing with it in the context of Metamath has led me to appreciate how powerful a hack it is. With a small number of relatively simple axioms, it gets you a rich set of infinities, but avoids the particular ones that bring the whole formal system crumbling down. You get integers, reals, tuples, sequences (finite and infinite), transfinites, and functions from set to set. You don't get untyped lambda calculus. Overall, it's probably a good tradeoff.

Metamath

I believe that the formulation of lambda I was wondering about yesterday is:

$d y A $.
df-lambda $a |- ( lambda x A ) = { <. x , y >. | y = A } $.

This lets me prove such things as ( ( lambda x x ) ` y ) = y. I'm probably not at the full power of the lambda calculus, though, because here "y" is required to be a set, while a lambda term is a class. So it's not clear I can apply this lambda term to itself.

School

Tomorrow is Alan's first day of 1st grade. It promises to be exciting.
