Older blog entries for raph (starting at number 284)

BitTorrrent

As Aaron Swartz suggests, I downloaded Red Hat 8.0 using BitTorrent. From casper, the performance was amazingly good: around 800-1000 kB/s. I've long felt that free software ISO's were a killer app for this code. Unfortunately, it seems not to be in anybody's business interest to push this angle, because the relative convenience factor of getting a boxed CD is pretty much gone if you can get full-bandwidth downloads.

Incidentally, the GUI version crashed randomly on me, but the command line version behaved fine.

Red Hat 8.0

I like it. I like it a lot.

I installed it on a spare partition on spectre, just to give it a whirl. This time, I kept a log of usability issues and bugs. I ran into quite a few, but I think overall the system hangs togther pretty well. Obviously, people put a lot of work into making the desktop a good experience as a whole. This is rare and highly welcome in the oft-fragmented world of free software.

I was pleased that it detected (almost) all my hardware correctly, including the Audigy soundcard and the dual CPU's. I'll have to add the (serial) Wacom pad by hand, but that's not bad at all.

There's a good chance I'll put this on my laptop too.

GnomeMeeting

One of the perks of RH 8.0 is that installing GnomeMeeting 0.93.1 was near-trivial. As I understand, this is pretty difficult software to get running from scratch. I wasn't looking forward to herding a Gnome 2 development environment onto my machine. To have it Just Work is a nice treat.

I'm not sure the rest of our team will be able to use GnomeMeeting effectively just yet. Among other things, the H.323 nonsense prevents it from working behind a vanilla firewall. That's a problem TeamSpeak seems to have solved by using a nonstandard protocol.

I went to #gnomemeeting on irc.gnome.org and got personal help from Damian Sandras, principal author of the program. He was very nice, and of course this level of community involvement is exactly the reason why free software is so good. I want GnomeMeeting (or at least some free VoIP solution) to win.

ZFC vs HOL

I got a very nice email from Robert Solovay in response to my last entry. He patiently explained to me that you can prove the consistency of HOL using ZFC theory, which means that ZFC is strictly more powerful. Gödel's second completeness theorem means that no system can prove its own consistency. Thanks.

A lot of this stuff still seems fairly murky to me, but I feel the waters clearing. I had some minor epiphanies over the past few days. For one, I now see a clean separation between two parts of making a Web-friendly mathematical system. One part is the translation between various proof systems, so as not to limit people too much. The other part is to arrange the inputs, outputs, and immediate files to make it maximally easy to share over the Web. These are quite separate: it's not hard to imagine proof format translators still stuck in the teletype interaction mode characteristic to HOL, or a Web-friendly system based on any one of the major logical systems. The latter seems easier to me, actually. I'm still a novice at the heavy mathematics behind logic and proofs, but I feel I know a lot about the Web and about how people can collaborate over the Internet.

The other insight has to do with the issue of making "conservative" proofs designed for portability. I think I can now answer the question: what are the natural numbers?

Well, what are the natural numbers?

This may be a bit sketchy, because the ideas still feel a bit raw.

At heart, the natural numbers are the Peano axioms. But just stating the axioms isn't quite enough to make them useful in a proof system. Among other things, it's dangerous to just add axioms lest they introduce inconsistency.

So, you build a construction of the naturals using native primitives of your proof system. For each axiom, you prove a theorem. In the Metamath approach, you then do everything in terms of this construction.

But what HOL does for type definitions is better. Essentially, by making a construction, you're proving that the axioms are consistent with the primitive logic. It doesn't really matter which construction you use, as long as you end up with the same axioms.

In fact, what I just said is a key meta-result. The usual interpretation of proofs in these formal systems is that they prove a single statement. But here, an abstract proof using only the Peano axioms can easily be transformed into a concrete proof, just by substituting in the actual construction.

Meta-logic in general makes me uncomfortable, but in this case I think it's what's needed to address deep problems of portability. In particular, I think it solves the problem of multiple different axiomatizations of the same basic concept. Your "construction" can go between two equivalent axiom sets just as easily as to a construction in your primitives. You keep your proofs abstract, but at the end you check them in some specific formal framework: ZFC if all you care about is the truth of the proposition, and "smaller" systems such as HOL if you want to check portability.

I don't think I've communicated the idea well. It's also a bit half-baked. I've only talked about naturals, but there are other things, especially functions, that are harder to treat this way. But I do think I'm on to something here.

VoIP

The quest for a good VoIP solution continues. One step which may be considered either encouraging or discouraging is TeamSpeak. Most importantly, it's not free software, but it does seem to work well enough to use. It comes from the gamer community rather than any existing VoIP constituency, which leads to some funky terminology. When someone joins a channel, a sci-fi female voice announces, "new player." On the plus side, there's an emphasis on performance and usability (especially with NAT's, which basically Just Work).

I do want to try GnomeMeeting again, but so far it hasn't worked for me. One of the most encouraging things about GM is that the next version will support the new Speex codec, which basically does for voice what Vorbis does for recorded audio. I took the standalone coder for a spin and was very pleased.

Proof-carrying code and code-carrying proofs

Michael Norrish sent me a nice email on my recent entry on proofs and code. The question of whether the proof checker should include a virtual machine, so that proofs can include programs is in some ways the dual of proof carrying code (and bib from 2000), in which programs come in "fat binaries" with proofs attached. A runtime can then check the proof, and execute the code with some assurances, primarily safety. PCC obviously needs a good, clean, efficient proof format, and equally obviously motivates a sharp separation between proof generation and proof checking. It's worth following.

Code-carrying proofs are by no means a new idea. In fact, Coq already implements them. It's not clear to me how deeply they're actually used. Among other things, you'd think they'd be useful for avoiding size blowup when importing theorems from other systems, but at least Freek didn't seem to use this in his HOL to Coq translator (Encoding the HOL Light Logic in Coq).

Translations between proof frameworks

When you look for it, there's been quite a bit of work done on translating from one proof framework to another. HOL has been translated into both Coq (and see another prototype) and NuPrl. These are not unique, although as it happens such translations can be quite tricky. For one, logical systems really do differ in expressive power. This is a bit unintuitive for those of us who grew up on the Church-Turing thesis, which effectively states that all programming languages have the same expressive power, because you can always write a translator from one to another. But logical systems differ in, for example, the number of ordinals. A proof that so-and-so-many ordinals exist can be ported to a "larger" system, but not to a "smaller" one. I can easily imagine a network of such formal systems, but I do not know how to draw the arrows. For example, I'm pretty sure that you can represent HOL in ZFC set theory, but my hunch is that the reverse is not possible.

However, I think a lower-tech solution may well be practical. Not all proofs make use of the full power of the logical framework. If a specific proof only uses the subset common to both frameworks, it should probably be portable.

I think this issue is related to the way Metamath nails down constructions. Since other systems naturally choose different constructions, you clearly can't just port proofs over. In Metamath, you can add sets and integers and get meaningful results. Obviously, porting any such proof to a different system will fail. However, if a specific proof uses only the integer axioms, and doesn't refer to the construction directly, it should be portable.

The way forward seems to be a significant portable library of the basic constructions: integers, reals, pairs, sequences, trees, etc. These basic constructions should be done in the most natural way in the framework. Then, when importing proofs, you hook up these constructions. There are some subtle details, mostly around partiality. What exactly is 1/0?

Proof efficiency

Another concern Michael raised is the relative inefficiency of a translated proof. Especially in the HOL framework, small proofs in the "source language" (which is really ML with a nice library) expand out to a large number of primitive inference steps. In the usual HOL scenario, the inference steps go very fast. If you're writing them out to a file, then sending it over the net to a different system, you might start caring about how many steps there are.

I'm not very concerned about this, for a number of reasons. For one, I believe it should be possible to greatly optimize the resulting proofs. For two, storage and bandwidth is relatively cheap and is getting cheaper. For three, a special-case data compressor is likely to be very effective, if one is needed.

But it's not clear one is needed. Both Automath and Metamath have significant libraries of theorems, and both can be checked on modern hardware in about a second. To me, this suggests a huge amount of safety margin between the optimal and the acceptible.

The Ghostscript codefest is over now. It went well, I think. We fixed a bunch of bugs, came to understand issues in such areas as embedded controllers for color laser printers a lot better, had a lively discussion about Fitz, and spent a little time hanging out and getting to know each other. I'm convinced that this is quite important, especially when the developers are geographically dispersed.

Billy

I got a number of responses on the Billy bookcases, generally expressing disbelief that they really were sold with such a basic defect as the holes drilled in the wrong place. No, they really were. rillian and jack both looked at them, and found pretty direct evidence - for the bolts that hold the thing together, there are two holes; one thinner and drilled all the way through, and one shallow but with greater diameter for insetting the head of the bolt. On correct pieces, the centers align exactly. On the two defective ones, they're misaligned by about a half inch.

I seem to have a knack for getting defective products. Perhaps it's just because it bothers me more than most other people.

Microkernels

I didn't mean to imply that microkernels are bad. In fact, there are definitely some good kernels based on the microkernel idea out there (QNX comes to mind), and a number of even more exciting research projects either based directly on microkernels or inspired by them (User-mode Linux, exokernel and eros come to mind).

What I was criticizing is the belief that, other things being more or less equal, a microkernel will turn out better than a comparable monolithic kernel. Experience now indicates otherwise. In particular, it seems you have to work quite a bit harder to get top-flight performance from a microkernel design. And, the really good ideas from microkernels (modules) have been ported to the monolithic style.

Beliefs of the form "X is a good way to build a system of type Y" are not easy to prove or disprove, because there are so many other variables that affect the quality of the result. If the question is important, the way to settle it is to build many systems using both X and non-X, and see if you can pick out a quality trend. If possible, though, it's better to focus on more crisply testable beliefs.

I'll give an example of one of those: a dynamic specializer is an effective way to get good performance out of highly dynamic interpreted languages. In particular, I was skeptical about Psyco, but at the rate it appears to be going, it looks like it will prove this belief. I hope this demonstrates that "show me the code" is useful for overcoming unwarranted skepticism as well as filtering out junk ideas.

28 Sep 2002 (updated 28 Sep 2002 at 08:30 UTC) »

A little light on technical content today, largely because I've been preparing to host a weekend codefest for Ghostscript developers. I want to study Bram's recent thoughts on trust metrics, and respond to an email from Michael Norrish on "code-carrying proofs" vs "proof-carrying code". Ah well, later.

Show me the code

rlevin wonders whether "show me the code" is perhaps too arrogant a stance. I don't find arrogance particularly appealing, but I think there is a good reason why this tenet is central to the free software ethic.

The problem is that the world of computing is filled with many, many opinions about things. Hard sciences are driven more by data and falsifiable experiment, but in the sciences with the word "science" in the name, it's still mostly opinion. However, there are good and bad opinions. Many years from now, when we look back, some will look sensible, but most won't. But now, I believe, critical thinkers among us can start to separate the wheat from the chaff.

The hard sciences, of course, evolved from proto-sciences dominated by opinion. Over many years, they developed a culture organized around propagating the good ideas more efficiently than the bad ones. Central to this culture is the idea of experiment. In particular, if a particular opinion can be tested by doing an experiment, it's a lot more interesting. Philosophers used to spend a fair amount of time debating how many angels can fit on the head of a pin. It's clear to us now that this is a matter of opinion. You cannot do the experiment.

As an example of an opinion of an entirely different sort, many chemists in the early 18th century were of the opinion that the relationship between a calx (such as rust) and its corresponding metal (iron) was the addition of a substance, some kind of "inflammable earth". Others believed that it was rather the subtraction of a substance that formed a metal from its calx. What decided the matter was a series of quantitative experiments carefully designed to settle the matter. Today, of course, we know the answer.

In computing, many opinions are of the form "method X is a better way to build systems of type Y than its competitors". I can choose from hundreds of examples, but I'll choose one fairly well known, not too controversial, but with some bite left in it. Microkernels, so believe some people, are a vastly superior way to build operating systems than monolithic kernels. Ten years ago in academic circles, such an opinion would have been considered entirely mainstream. Indeed, the arguments were quite persuasive: elegance, modularity, robustness. Who but a reactionary would disagree that they are the wave of the future?

Well, you can see where I'm going with this. The way to settle the question is to "show me the code". Today, the best kernels for general-purpose use are BSD (still) and Linux. Comparing, say, Darwin and Linux head-to-head is virtually no contest. Not only does Linux eat Darwin for lunch performance-wise, but it's also quite a bit more stable in actual use. Ordinary users see OSX kernel panics commonly, whereas in Linux-land, unless you're a kernel developer they're a pretty reliable sign your hardware has died.

There are still a few crazy holdouts who hold the opinion that microkernels are better. Perhaps they're right, and the superior track record of monolithic kernels is due to accidental factors, such as Linus's superhuman code chops. If so, I'll be convinced when they show me the code. After all, the early experiments to try to settle the phlogiston vs oxygen debate had many flaws, and trying to draw a conclusion from them may have been misleading.

But opinions are damn cheap. Any idiot can have them, and many do. Even smart people are tempted by pretty ideas. We're surrounded by a flood of wrongheaded opinions. The best way to convince me that your idea is one of the few good ones is to show me the code.

Yes, this attitude makes life harder for those with good opinions, but lack time or resources to write code themselves. Lord knows, I've felt the sting of that lash myself with my ideas on trust metrics. If I can't spare the time to code up at least a prototype myself, how can I expect other people to invest their time into my ideas? Since all my ideas are brilliant, it would of course be a lot more efficient if others would just take them up without me having to waste my time on coding, but in the off chance that one of my ideas isn't quite as good as I believed, for other people to take the attitude of "show me the code" might, I admit, save quite a bit of trouble.

Aeron

I bought a couple of Aeron chairs for the codefest, and so far I love mine. It's still not easy to come to grips with spending the money, but dammit, I earned the money from my own work, and if I should choose to spend it on something I'll enjoy, I shouldn't have to justify it in terms of higher productivity because of better ergonomics or whatnot. But, of course, it's easier if I do.

Ikea

I got a bunch of Billy bookcases from Ikea a few months ago, and tried to put the last four together yesterday, in preparation for the codefest. But two of them had the holes drilled in the wrong place, and I can't put them together properly.

I suppose I should try to fix the problem, either by getting a replacement from Ikea (not particularly easy), or by hacking (drilling new holes, filling with epoxy). I don't really want to invest my time and energy either way, though, particularly because the material is cheap. Particle board is in many ways the worst construction material very heavy, and no strength. What I'll probably end up doing is slapping some nails into them and ending up with improperly built bookcases, really more suitable for storage of old papers than showcasing my collection of books, almost all of which are meaningful to me and some of which are valuable. I won't be happy with them in my studio, but they'll do.

FAQ's about thresholding

What happened? I implemented a very simple form of thresholding for the recentlog. The default link off the front page is now recentlog.html?thresh=3. This means that any diary entry rated under 3 will be suppressed.

I don't want filtering. Two choices: don't be logged in when you read, or just use recentlog.html without the threshold parameter.

Can I use a different threshold? Sure, but you'll have to construct the URL yourself for now.

Why are the ratings in gray? The color represents confidence. The paler, the lower the confidence. If you put in a rating yourself, it will have the highest confidence and be rendered as black.

User X doesn't deserve to be filtered out. I agree. Rate her diary at 3 or above.

User X deserves to be filtered out, but isn't. I agree. Rate his diary below 3.

I want X. We're taking patches.

Viewsonic QUXGA

Viewsonic has their VP2290b monitor out for $7k street. It looks to be nearly identical to the IBM T221, but at a somewhat better price. It's very exciting that there are now two suppliers for 200 dpi monitors, as it suggests that the price will soon be reasonable. I want one.

LWN

Subscribe. I did.

On chemistry, alchemy, and computer science

I've been exploring the analogy between the development of chemstry from alchemy and the current state of computer science. Here's an interesting quote I found from Chemistry: A History by James R. Fromm:

It is easy to see, then, that more and more facts of chemical changes were being discovered in these ancient times, but the explanations for these facts could not be given. Chemistry and the chemical arts were developing; chemical theory was not. But the dishonesty that was being practiced under the name of chemistry was giving the science a bad name. Today we distinguish the good from the fraudulent, as developed in these centuries, by calling the first chemistry, the other alchemy.

I was struck by this. We call our discipline "computer science", but in practice much of the field is characterized by a dishonesty that makes forgery of precious gems from glass look quite tame by comparison. Of the subfields of computer science, security is probably the worst (even as modern crypto, based on computational complexity theory, shows promise of being one of the most rigorous areas). In context of talking about the way "digital rights management" platforms are currently being marketed as a more trustworthy system for users, Zooko said:

They are selling real poison, real enslavement, as a tonic.

Amen.

On the relation between proofs and code

Much of the time when I'm looking at proof systems, I feel like a novice. There's so much highly technical stuff, and it's not easy for me to figure out which of it is even worth learning.

Even so, occasionally I see clear glimpses. Sometimes, these glimpses don't even seem to be written down anywhere else. That's what I'll try to communicate tonight.

Formal logical systems

Here's a very simplified summary of formal logical systems. I strongly encourage you to get familiar with a real system, but if not, this will have to do.

Each such system has a language in which you can make statements. In a proof checker, you want to accept a lot of these statements (corresponding to true mathematical statements), but reject all those that correspond to false mathematical statements. Of course, there will always be plenty of statements which you can neither prove nor disprove, but we won't worry about those here.

In any case, most formal systems work like this: there is a small number of axioms considered true without needing proof, and then a small number of inference rules, which take one or more existing statements considered true, and spit out a new one. A particularly important inference rule is the "modus ponens". If you've proved A and proved that A implies B, then the modus ponens gives you a proof of B. In many systems, the modus ponens is what gives proofs their branching tree structure, and is also the best way to take "large leaps", since the inference rules themselves tend to be quite primitive.

There's a lot of detail we'll skip over here. I'll mention the fact that many axioms are actually axiom schema, and the fact that you have to watch out for free vs. bound variables (or, in Metamath, distinct variable conditions).

The de Bruijn criterion

At least, the outline above is true for systems satisfying the "de Bruijn criterion", which Freek Wiedijk defines as having the correctness of the system as a whole depends on the correctness of a very small "kernel". There are also a bunch of other systems that do more stuff in the proof steps, such as computations and so on.

Failure to satisfy the de Bruijn criterion has a number of disadvantages. Many people talk about the philosophical issue of knowing whether your results are really correct. Instead, I'll stress the issue of translating proofs from one system to another. If your steps are kind of big and ad hoc, it's much harder to know how to do this.

You can see why people want to relax a bit from purely formal rigor, but in fact systems satisfying the de Bruijn principle tend to admit proofs just as concise as those that don't. See Freek's "Comparing Mathematical Provers" for evidence of this claim.

So I'm going to assume that the de Bruijn principle is a good idea. In a larger ecology of proof systems, those that don't satisfy it will need translation into one that does, but not the converse.

The Poincare principle

On the other hand, doing things like numeric calculations by hand is quite tedious. Freek considers systems that can prove the correctness of calculations automatically as satisfying the "Poincare principle".

This principle raises the question of exactly what kind of software you've got for doing these proofs. You can do an extremely simple program that just checks proofs. Such a program won't satisfy the Poincare principle. So-called "interactive theorem provers" such as HOL have lots of subroutines that crank through a potentially large numbers of primitive inference rules. These systems are inevitably more complex, but do satisfy the Poincare principle.

Even so, automatic calculation, while very good for saving time when trying to make proofs, is not good for clarity and transparency. In particular, it's very difficult to know when these subroutines ("tactics" in HOL-speak) will succeed and when they'll fail. In particular, a lot of the things you want to automate are undecidable in general.

Automation also increases the risk of "version skew". Assuming that the technology is steadily improving, then you can't use an older system to check a proof script made with a newer system, even if it's perfectly capable of following the primitive inference steps generated by the latter.

This version skew is just one aspect of proof scripts being bound too tightly to short-lived software, rather than based in eternal mathematical truth. In particular, the way scripts are done in HOL seem especially hackish to me, with a combination of commands which modify the global state, output of half-chewed proofs as ML programs in ASCII format, and interactions with a make system that, among other things, leaves a lot of x86 machine code lying around. It seems to work reasonably well in practice, but it can't possibly be the Right Way.

The way out

The way out seems obvious to me: separate the proof generation and proof checking tools. The checker can be very small and stable, while proof generator can ride the feature treadmill just like most other software.

Further, you can have a diversity of tools for proof generation. Some might look just like HOL. Others might have a GUI, or perhaps Web, front-end. You might well do small, specialized tools for generating proofs in particular domains.

The interface between the tools is a proof file format, strongly analogous to object code. Most people think of object code as an unpleasant but necessary step in the process to get programs executed efficiently, but I would rather like to sing its praises. In particular, object code is key in binding together components written in different programming langauges. I can easily see the same thing happening in proofs.

In my opinion, one of the strongest advantages is that you'd be able to explore new "source" languages for writing proofs, and still be able to bind with existing results. There's a lot of stuff that's been proved in mechanically-checkable form. Having to redo it all every time you want to play with a new proof language would be a crippling limitation.

Notes on computational complexity

The relationship between code and proof can now be seen as one of straightforward engineering tradeoffs, rather than a deeply philosophical inquiry into the nature of proof.

A big engineering question is: how long do these calculations take when done in a proof system. Harrison's thesis discusses this question fairly deeply. He put some engineering work into efficient representations (using bit sequences rather than iterated succession for the naturals, for example), so that computations would be reasonably efficient. Even so, they're enough slower so as not to be competitive with, say, a good bignum library.

In HOL, the cost of a primitive inference step is a certain amount of CPU time. In the system I'm talking about, it's also a certain amount of space on disk for the "proof object" file, and, probably more interestingly, bandwidth when you're moving these things around the Web.

Given that a good bignum library is a few orders of magnitude faster than doing calculation in an HOL system, there's an overwhelming temptation to just call the bignum library instead when you need calculations. That's how the de Bruijn criterion gets violated :)

But we can look at the question: just how much are we losing by forcing everything through primitive inference rules. Let's say you have a program which does calculations, and also a proof that the program gives the correct answers. How long does it take to "execute" each step in the program? In the case of HOL, there's an upper-bound answer of about eight orders of magnitude (based on the ARM model included in HOL). Of course, in practice you'll usually be able to simplify the proof a lot.

Similarly, the question of file size is deeply related to compressibility. Kolmogorov complexity is the obvious relevant theory here.

So shall we run arbitrary code to check proofs?

Given the obvious usefulness of code, it's tempting to put a virtual machine into the proof checker and proof object file format. A proof can include code when it also includes a proof that the code is correct. Such a proof will also imply that the code is safe, by the way. The checker will verify the proof of the code first, then assuming it passes, just run it. The output can be entered into the database of statements proved correct, just as if it had been spit out from an inference rule.

The consequence, I think, is a potentially significant performance boost, but not an increase in the logical power of the system. In all cases where you'd execute code, you could just interpret the virtual machine using inference rules instead.

It may be that this performance increase becomes worthwhile. Right now, it doesn't seem to be. In fact, proof checkers on modern machines seem to be almost ridiculously fast. Freek's Automath implementation checks the entire Grundlagen der Analysis in under a second (roughly double the time taken to interpret a single ARM instruction in HOL). Similarly, Metamath takes about a second to verify all of set.mm, including Hilbert spaces and other nontrivial results.

So I think code is not worth it until we start building a library of slow-running proofs, and that doing an efficient virtual machine once becomes a better investment than optimizing each of the proofs.

The Advogato virus

Whew, that was fun. We just had an ingenious virus implemented and deployed against this site. The hole is closed now, and apparently the only damage done was corrupting people's names.

The hole itself was very simple: strings entered in the "given name" and "surname" fields was passed through to the final rendered HTML with no quoting. The virus exploited this hole to embed "iframe" tags, which then caused a form submission changing the name fields and propagating the virus.

My guess is that a huge fraction of sites that take postings (web boards, guestbooks, etc.) have similar problems. I tried my best to code mod_virgule carefully, the code has been quite stable, and a number of people have been over it. There are just so many security gotchas that it's nearly inevitable you'll run across one or another.

Here's one that's gotten relatively little publicity: Netscape 4.X on Unix treats 0x8B and 0x9B the same as < and >. Thus, any site which posts user-submitted text and does not filter these characters is vulnerable to exactly the same kind of attack. Try it and see.

How would you make it secure?

There are two problems here. First, string quoting wasn't done properly. There is a systematic approach to the problem, but I haven't seen it used much (including the mod_virgule code): types. Specifically, you have one type for arbitrary strings, and other types for variously quoted strings (the web has a few such contexts, some of which are quite similar). The only way to cast between them is to do a quoting operation (or to verify that no quoting is needed).

But the deeper problem is that Web applications interact with Web browsers, which are capable of extremely rich and diverse behaviors (not entirely unlike execution of macros in documents, when it comes down to it). Reasoning about exactly what a browser might do in all possible contexts is difficult to the point of intractibility.

Do I think that Advogato is "secure" now? The odds are against it, even though I'd argue that the code quality and process are superior to most web applications out there. Yet, a lot of people seem to think that a Web browser is a good interface to do things like move around money.

(thanks to Bram for discussion on this topic)

Proofs

I got a nice note from Freek Wiedijk about proof systems (somebody had tipped him off that I was writing about them in my diary). Among other things, he pointed to mowgli, a math system explicitly designed to work on the Web. He and I agree that XML is not all that tasteful, although I do think that the Web emphasis is a good thing.

Freek is also a fan of Mizar. I agree, it's worth looking at. One of the things that turned me off initially is the lack of a free software implementation (the official implementation is almost free). Given the impressive library of proofs, and the other advantages of the system, including readability, I'd imagine that doing a new implementation would be a nice smallish project.

Presence and change notification

Taral brought up "presence" as an interesting problem in peer-to-peer networking, which doesn't seem to have gotten much attention. It's a concept from instant messaging, but I define it to mean notification of "join" and "leave" events.

Presence is a special case of the general problem of change notification. A robust, efficient technique for change notification would be incredibly useful. On the Web, most people do it by pinging, but with obviously poor latency and waste of bandwidth (and an unpleasant tradeoff between the two). For the most part, it works almost well enough, but people feel the limitations enough to propose crude new mechanisms such as RSS clouds.

One special challenge with presence is that the target node can't itself send a notification when it goes down. You have to either ping it yourself (with problems noted above) or rely on a witness. Then, you get into the dual problems of whether the witness is trustworthy, and handling cases where it suddenly leaves at the same time as the target ("backhoe failure").

My personal feeling is that in a robust change notification system, you have to deal with the reliability of proxies anyway, so presence is best implemented as one application of a general change notification mechanism. Taral feels, rather, that presence is a separate building block that's needed to implement other change notification services.

In any case, I think it's an excellent research project for somebody interested in peer-to-peer networking, in part because it's probably a lot easier than many p2p problems. After all, pinging does work, you just want better performance.

Obvious applications of change notification include Web caching, DNS propagation, and "ticker"-style news clients. I'm sure that you'd see a lot more applications if there were a good infrastructure available.

Max the Unsqueamish

Our compost bin went bad over the past couple of days, probably helped by the warmer than average weather. Today there were flies buzzing around, so we decided to dump the whole thing in the garbage. Of course, Max wanted to help.

When we opened it up, we were confronted with probably the grossest spectacle in our lives. The bin was crawling with maggots, and with a smell to match. Shit would be pleasantly aromatic by comparison. I offered to dump it out by myself, but Heather assured me she could do it herself. I took Max inside because I figured he did not need to see (and smell) this.

In any case, Heather asked me back outside for help shortly. Max absolutely insisted on coming outside too. He knew there was something happening and did not want to miss it. So we let him, and a few minutes later we had the worst of it dumped in the garbage bin.

Of course, the compost bins still needed hosing down. This is Max's domain, and he wasn't about to let anyone else encroach on it. So he happily sprayed them with the hose until they were reasonably clean. I went inside for a bit, at which point Heather asked whether I threw up. I didn't, but I can see why she asked. I had gagged a bit when we were dumping out the lower tray, entirely filled with maggots swimming in putrid water.

But the next part truly earns Max his title above. When we were done, we went back inside and Max asked for me to cook dinner. He said he was hungry, that he loves "dinner", and that it makes him grow big and big, so he can ride the bumper cars. This has been a goal of his ever since he went to Marine World and was denied riding in the bumper cars because he was too small, even though Alan got to ride. I got him a Pizza Lunchables from the fridge, which he ate heartily. Then Heather and I waited a bit more than an hour for our stomachs to settle down enough to eat.

There are times I doubt he's normal.

20 Sep 2002 (updated 20 Sep 2002 at 08:00 UTC) »
Proof freak

I came across some interesting stuff by Freek Wiedijk, especially his comparison of 15 notable theorem proving systems.

See his abandoned manifesto for more insight into why I'm so interested in this topic. I consider his analogy to the Web to be exact. Of course, he wants to do mathematics, but I want to do computer science that doesn't suck. It's a very good sign that our visions are so similar then.

Diary filtering

Sorry for not following up on the patch to add an ignore list. What I really want is filtering based on the ratings. That post outlines the grand scheme, but if we want to start by doing simple thresholds, I'll take the patch.

Kids

More evidence that Max is turning into a computer geek: the other night, he was talking in his sleep, and said "Apple-Q it, daddy". We're starting to seriously use the Web for homeschooling now. A couple of days ago, we found some clips of animal noises. Both kids thought the chimpanzee was funny.

Null desktop

Red Hat's new desktop attempts to create a coherent interface, abstracting the differences between KDE and Gnome in particular. This move seems to have gotten some criticism from the Slashdot crowd, but I think it's great.

Red Hat is trying to solve a real problem. There are a lot of geeks who are into fiddling with every detail of their systems, but for the rest of us, dealing with two different desktops is one big usability problem.

Ok, so it's a problem, but is Red Hat going about it in the right way? Absolutely. They're rolling up their sleeves, doing the work, and releasing code. The Gnome and KDE camps haven't made all that much progress by themselves, and if they tried to build some consortium or something, it would more likely than not take forever and go nowhere. The Red Hatters, I think, understand the true free software Way.

Owen's apology is a model for honesty and forthrightness. When's the last time you saw something like that from a proprietary software company?

Maybe Red Hat isn't doing this the best way. If so, feel free to do a better job. There are already way too many distributions. Maybe it's best to give people the option between Gnome and KDE, and only have things work smoothly if users choose applications written for that desktop. Maybe it's better for distros to just pick one, and abandon all the useful apps from the other camp. I like the Null approach, though, because it preserves real choice in developing apps.

Even more, I think that the popularity of the unified desktop will have a strong positive influence on both the Gnome and KDE camps, to put more thought and effort into making things run more smoothly in an integrated enviroment. A lot of people are going to be using Null.

Some people seem to be upset that Red Hat is creating their own brand for this desktop. As long as credit is given where due, I have no problem with it. In fact, if it helps them make money, so much the better.

I'll make sure there's a good Fitz widget for both KDE and Gnome. I've felt this way a long time. I'm happy that, for example, Abiword and ksvg are both using libart for their SVG stuff.

RSS Wars

There's a lot of controversy around different versions of RSS. To make things even worse, aaronsw posted a so-called RSS 3.0. It seems to be a mix of serious simplicity engineering and spec parody. If I were trying to emphasize the former, I wouldn't have called it RSS 3. Maybe RSS -1?

In any case, RSS is a perfect example of a poorly implemented spec. Real-world RSS parsers must be "liberal", because it's rare for feeds to strictly conform to the spec, or, for that matter, strictly conform to the XML spec. I see non-ASCII and XML metacharacters improperly quoted as often as not. The idea of writing a rigorously formal spec is a joke. It might be an interesting exercise, but in the real world you're forced to do workarounds and hacks.

Logic over the Web

I'm beginning to grok HOL. There's a pretty impressive library of stuff proved in the system, including Harrison's reals construction, Rijndael encryption, and the ARM architecture. It's obviously a cool system, but again I think it could be refactored a bit for the Web.

The Web "workflow" I'm envisioning doesn't correspond exactly to any formal proof system I've seen. I'm thinking of an app which uses proof files downloaded from the Web, interactively proves new things using those proofs, then creates new proof files which can be uploaded to the Web. The key question: what should be the format of the proof file?

The key goals for such a system are to reduce the barriers to using proofs developed by others (perhaps using different systems such as Metamath or HOL), and publishing your own proofs so they can be used by others (again, perhaps using different systems). Further, actually making proofs shouldn't be tedious. In particular, you should definitely not have to do easily decidable stuff like numeric calculation and parsing by hand.

With existing proof technology, you have to make some pretty harsh tradeoffs. In particular, you have to choose a primitive logic, such as ZF set theory or typed lambda calculus. Then, interchange with similar systems is (hopefully) easy, but painful to impossible for systems with different primitives.

This shouldn't be. If you're proving results in, say, number theory, why should you care whether they're constructed from ZF sets or lambda terms? The numbers are the numbers. The same thing goes for most objects of interest in computing, which tend to be finite or very nearly so. These things should be portable among proof systems.

What's the way out? I see a few possible paths, if vaguely. The simplest is to optimize for one direction of interchange. For example, one reasonable goal might be to develop a portable library of basic mathematical objects. You'd do proofs over this library using simple inference rules (axioms), then have multiple exports for the various proof systems. Each such export would probably use handcrafted constructions, which might be quite different. For reals, why not use Harrison's construction in HOL, and standard Dedekind cuts in Metamath?

All this interchange stuff is deeply related to proof longevity. Presumably, the mathematical truth embodied in the proof is timeless, but any actual encoding into a file format is likely to be as short-lived as the program which processes that file format. If proofs can be fluidly translated into new containers, perhaps they might have more reasonable lifetimes.

Photos

The CCSD Web site is up, and my family's smiling faces adorn the top. Second from left is Alan, Heather, and Max (original), and rightmost is Heather's father Richard with Alan (original).

The story of the first picture is kind of funny. It was a very sunny Christmas day, and we were having a family walk down Benicia's first street, with me taking along the camera and trying to get pictures of their new toys. They sat down on one of the benches, and I tried to take a picture. In order to provoke a smile, I started acting completely silly, dancing around, making noises, and in general making a complete fool of myself, in between snaps. Judging from the photo, they thought it was funny. Anyway, after I was done I noticed that there was an elderly lady who had watched the whole thing. She didn't want to walk in front of the camera and interrupt me. I felt more than a bit silly, but I consider the result worth it.

275 older entries...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!