Older blog entries for raph (starting at number 250)


I got some very nice email from Konrad Slind and Ken Friis Larsen about how to make portable proofs, and about HOL. HOL stands for Higher Order Logic, which basically means you can quantify over functions and predicates. You can't do this straightforwardly in first-order logics.

Given this power, you can make nice abstract proofs that don't depend on any particular model. Basically, your theorem is of the form "for all systems of numbers that satisfy these axioms: ...". If you want to apply the theorem to a specific model, you just plug it in, and prove each of the axioms as a theorem. The general technique is called "theory interpretation". In its most general form, it's a transformation of abstract proofs into concrete proofs expressed in terms of a particular model, but in a nice system you can do the deduction directly, by taking one instance of the (polymorphic) universal quantification. Theory interpretation is one of the ideas in QED.
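To make the idea concrete, here's a toy sketch in Python (entirely my own illustration, not HOL code): an "abstract theorem" is checked once against a set of axioms, then interpreted in two different concrete models by plugging them in.

```python
# Toy illustration of theory interpretation: an abstract theorem is
# stated against an axiom set (here, monoid axioms), then applied to
# any concrete model satisfying those axioms. Names are mine, not HOL's.
from functools import reduce

def check_monoid_axioms(op, e, samples):
    """Check that (op, e) behaves as a monoid on some sample elements."""
    for a in samples:
        assert op(e, a) == a and op(a, e) == a          # identity
        for b in samples:
            for c in samples:
                assert op(op(a, b), c) == op(a, op(b, c))  # associativity

def fold_split_theorem(op, e, xs, ys):
    """'Abstract theorem': folding xs, then ys, then combining, equals
    folding xs + ys. True in any monoid; here we just test instances."""
    lhs = op(reduce(op, xs, e), reduce(op, ys, e))
    rhs = reduce(op, xs + ys, e)
    return lhs == rhs

# Interpret the theory in two concrete models:
check_monoid_axioms(lambda a, b: a + b, 0, [1, 2, 3])    # integers under +
check_monoid_axioms(lambda a, b: a + b, "", ["x", "y"])  # strings under concat
assert fold_split_theorem(lambda a, b: a + b, 0, [1, 2], [3, 4])
assert fold_split_theorem(lambda a, b: a + b, "", ["a"], ["b", "c"])
```

The point is that the "theorem" is proved (here, merely tested) once, then reused for every model that discharges the axioms.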

HOL is based on ML, which is really a very nice language for this sort of thing. I'm not yet convinced that the whole business of "tactics" and "tacticals" is essential in a proof interchange format, but they do seem to be a good way to partially automate the process of producing proofs.

Document formats

jfleck took my bait. Ok, I will explain why I think structural markup is not always the best approach.

In the simplest model, an author creates a document in a document format, then transmits the file (now a sequence of bytes) to a recipient. The recipient is in possession of a "viewer", which is a transform from the document format into a human-readable presentation. The usual viewer displays the document on the screen or prints it out on the printer, but it doesn't have to be that way. In particular, blind people will generally want the text read aloud.

So now we have the issue of variability of the viewer transform. Some document formats (PostScript, say) nail down the presentation quite precisely, while others (HTML, say) might appear very different in different viewers. Which is better?

Many people take a religious position on this issue. But I prefer to look at it in the framework of presentation quality. You have a viewing context: paper vs screen vs audio, window size (or screen size), color vs monochrome, etc. These are all legitimate sources of variability. You also have gratuitous variability, very commonly different sets of available fonts. You can solve this variability a number of ways: restricting the allowable fonts in a document, using freely licensed fonts, or allowing "embedding" of the font's glyphs (but not the complete font) in the document. The latter option is interesting because it restricts editability.

Now you can ask the question: over the space of viewing contexts, what is the quality of the presentation? As always, there are many aspects to quality: aesthetic beauty, ease of reading text, ease of finding something in the text, consistency, etc. The art of graphic design is all about making a high quality presentation. A typical rule of thumb is to place a graphic either on the same page or the facing page as the text which describes it. But the rules of thumb often conflict, and it's a judgement call to decide which ones are more important.

And now we can address the question of structural markup vs a document format that emphasizes presentation. Especially for documents that fit the "structured" model, the former can produce reasonably good presentations across a wide variety of viewing contexts. PostScript, by contrast, can represent a stunningly beautiful presentation, but only in a very narrow context.

There are, of course, many other factors that affect quality. Of popular document formats, HTML is particularly bad when printed. It's good enough on the screen, though. In fact, it could be better for screen viewing than PostScript or PDF, because it could use screen-optimized fonts, while the latter often force printer-optimized fonts onto the screen. In practice, though, Web browsers haven't paid as much attention to rendering quality as, say, Adobe Acrobat, so the advantage is only potential.

Thus, I hope I've conveyed why I don't think that pure structural markup is always better than pure presentation, or vice versa. The main trend seems to be toward hybrid approaches. For example, PDF 1.4 allows you to specify one layout with great precision, but also includes parallel more-or-less structural markup so that it can reflow the text for small windows, etc. This adds considerable complexity, but one can make the argument that it serves readers of the document well. Similarly, the purity of structural markup is often bent, incorporating directives and hints with the intent of improving the presentation (style sheets are one such technique). However, these hints almost always fall short of pinning down the exact layout of text on the page, so you really can't do as well in print as a presentation-centric approach.


The discussion of document formats reminds me of a wishlist item I haven't mentioned here before: a batch converter from Microsoft Word format(s) directly to PostScript or PDF, faithfully reproducing the original layout. Obviously, wvware goes part of the way, but the print output goes through TeX, so the formatting gets pretty badly mangled.

Doing it as a batch converter is a lot easier than a full GUI word processor. That makes it realistic to focus on faithfulness. Over time, the existence of such a batch renderer would be helpful for building high quality GUI editors, but in the near term, a lot of the time people just want to view or print the documents anyway.

I'm pretty sure that we (Artifex) could sell such a tool, as it fits in with the Ghostscript business, and we've already seen some customer interest. So, if anyone out there is seriously interested in working on something like this, get in touch.

(also see wvWare2 design document)

Document formats

Peter Deutsch and I had a very interesting and wide-ranging discussion yesterday. One point I feel compelled to write about is the current state of document formats.

Peter did his presentation at Adobe in vanilla HTML. It worked reasonably well, but of course didn't have a very polished feel. The A/V techs found the choice unusual; presumably PDF and PowerPoint dominate.

What format should one prepare documents in? It's 2002, and there's still no document format that is at once standard, editable, and capable of reasonable quality printed output. Word, PDF, and HTML are the entrenched two-out-of-three solutions, each missing one leg: Word isn't standard, PDF isn't editable, and HTML doesn't print well. Are we ever going to get such a beast? It's possible that Word will become standard, PDF will become editable, or that HTML will become good, but the trends are not really in place.

It's also important to mention TeX. This format is actually pretty close on all three counts, but suffers from serious usability problems, and there are other issues; fonts other than the Metafont family are not standardized. The sad thing is that it was pretty close twenty years ago, but the work needed to make it broadly usable never happened.

The theoretical space of document formats is interesting. I think I'll write about it in coming days, including why the blind faith in structural markup is wrong.

rillian: I'm not really trying to argue that the AFPL is a superior license to the GPL. The question is whether AFPL licensed software is, literally, an unspeakable evil. I think there's a pretty strong case that it's not.

In any case, the GPL won and the AFPL lost. As a result, software is freer, but it's harder to make a living doing it. On balance, this is probably a good thing. Anyway, it's certainly not the case that the AFPL by itself would guarantee a good living wage to developers. In fact, the model of GPL'ed libraries + proprietary licensing seems to work as well as the AFPL, if not better.

In any event, the discussions with RMS continue.


Now that the vast majority of "business models" for making money doing free software have been discredited, good old-fashioned philanthropy is starting to look more like a viable option. You could even say that a good deal of free software has been financed this way - of the billion dollars or so the US spends subsidizing advanced computer science research, at least a percent gets embezzled towards free software development. In our world, that's good money.

I'll note that Advogato itself is supported in this way: the server is hosted at Berkeley's XCF. I'm not sure exactly who pays for the bandwidth, but it's probably some grant or other.

Recently, I've noticed quite a few software projects and services are begging for money: OPN, kuro5hin, LWN on the services side, and Dillo as one example of software.

I think this is actually a reasonable model. The money brought in from voluntary contributions is very small by proprietary standards, but maybe enough to keep a lean, mean free software project going.

The biggest problem with the model is the lack of connection between merit and funding. The main factors in allocating funds are fundraising ability and visibility. Again, I see this as just a fact, rather than something that needs fixing (or can be fixed).

But here's a thought I had today. How much money do you think Microsoft spends preventing free software? The roughly half-million "donated" to Peru almost certainly falls into this category. If there were an equal amount allocated among the most worthy projects, it would make a big difference.


I met with my advisor, Alex Aiken, today, and had dinner with Peter Deutsch (I'm staying in his guest cottage as I type). Tomorrow, I'll attend the seminar Peter is giving at Adobe, on the history of Ghostscript, and will probably meet lots of Adobe folks. Monday is our Artifex staff meeting. I don't consider myself much of a people person, but I'm enjoying the contact and the sense of connection. Even so, I'll be glad when things settle down and I can concentrate on coding again.

San Diego

Quaker meeting was good. The kids, especially, had a good time in San Diego, and (mostly) didn't get fried or too melted down.

In Pacific Yearly Meeting, I'm on Secretariat Committee, which basically means I put out the daily newsletter during meeting. This is a great way to get to know lots more people, especially for someone who feels as new and shy as I did when I started. Also, I like doing publishing.

I used LaTeX, mostly because I'm so familiar with it. Almost all the stuff coming in electronically was in Microsoft Word. I just converted everything to text (on the Windows box) and redid the formatting by hand.

It's always a humbling reminder to see how difficult computers are for people. There was a lot of tech support provided in the Secretariat office last week. A lot of the problems were things that should have just worked. One of the more frustrating was a brand new iBook running Mac OSX, which crashed every time you tried to add a new printer in the "Print Center". Eventually, someone looked through the console logs and found that it was an unresolved symbol in a Canon printer driver bundle. Since we didn't care about Canon printers at all (the printer was an Epson C60), deleting the bundle solved the problem. How the hell are people supposed to figure that out? It took three or four clueful people, and quite a bit of time.

Richard Stallman

Since OSCON was happening concurrently, I managed to meet a few free software people. Tuesday evening, Richard Stallman came by USD to talk to me. I'm still wrestling with the issues discussed. I deeply respect and admire Richard's vision and his principled approach, but there are also some things I simply do not agree with, and that is causing tension.

Essentially, the problem is that the GNU Ghostscript web pages and release mention the AFPL version, which goes against the GNU coding standards. According to those, it's ok to mention "well known" programs like Windows and OS X, but not to recommend them, or to mention less well known programs, such as AFPL Ghostscript.

I have two problems here. First, I don't believe in withholding information. Since Ghostscript is continually improving, the AFPL releases really are better than the GNU ones. In particular, if people are having problems, moving to the AFPL release might be a good way to solve them quickly. Also, in this age of Google, holding back on information is rather futile.

Second, I don't consider the AFPL version to be unethical. The primary freedom given by the GPL but not the AFPL is the right to sell the program for money without compensating the author. The other important rights, such as use, modification, and improvement without discrimination, are intact, as is gratis distribution.

It is intriguing to imagine how things might have turned out differently if the AFPL took hold, rather than the GPL. For one, it would address a big problem in free software, the lack of compensation for developers. Red Hat-style distributions would be at a disadvantage, because they'd have to find the authors and pay them, and Debian-style distros would probably have a wider role. I think that the community spirit of free software could work under the AFPL as well.

As it is, the question is basically moot, largely because the GPL community is huge, while there is not much of an AFPL community aside from Ghostscript. Both licenses are thorny and prickly when it comes to intermixing with other licenses, so it's really just as well that only one is popular. Otherwise, there's a pretty big risk of dividing the community. That looked like a real risk when the NPL came out, but again, the GPL won.

In any case, I promised Richard that I would fix up the current GNU release. However, I realize I don't really have my heart in being a GNU maintainer. There are a few possibilities. One is for some other volunteer to step up to the plate. Another is to amicably part ways with the FSF and do GPL, but not GNU, releases of Ghostscript in the future. This would be a shame, because I feel that GNU is a worthy project and needs support. Fortunately, there's no urgency; there's plenty of time to figure out the Right Way.

I tried to talk to Richard about the GNUstep project and its relation to Mac OSX, but he's not following the project. I find this quite disheartening. I feel that Richard was far more effective as a leader when he was actively designing and writing the software to make the GNU project real.

Ankh, simonstl, and Paul Prescod

I had considerably more fun chatting with Ankh and simonstl on Saturday evening, and meeting Paul and his wife Lilia at the San Diego Zoo with the family on Sunday. Our conversations were deep and wide-ranging. I am more and more convinced of the importance of personal connections. In particular, it's really important to have friends inside the W3C, especially because I've been so critical of the organization. In the past, DV has filled that role, but he's since gone to Red Hat. Now, I feel that if I really have something to say, Ankh will make sure it gets heard by the right people.

Quaker Meeting

I'm at Pacific Yearly Meeting. It's a great family time together, and I'm really enjoying building and renewing my ties to other Quakers in the area.

Advogato DNS

Sorry about the outage over the past couple of days. UC Berkeley changed the IP addresses, and, being on vacation, I managed to goof up the DNS updates. It should be fine now.

I think I want to figure out a better process for getting the updates.


Richard Stallman, in town for OSCON, stopped by a couple of evenings ago, largely to ask me to remove references to AFPL Ghostscript from the GNU webpage and in-release documentation. He has an amazing presence, and I want the GNU release to be consistent with his vision. I still have some concerns, though. For example, sharing the bug tracking between the GNU and AFPL versions has obvious advantages. Would that be an unacceptable compromise? I'm not a big fan of withholding information, but that seems to be what the new GNU coding guidelines require, in order to avoid promoting non-free software.

I also enjoyed meeting Bradley Kuhn and Henri Poole (of the Affero project). I'll probably write more about the meeting; there was some interesting discussion.


My Fitz prototype is now released.

It renders the tiger, but otherwise is more of a code sketch. Perhaps some people will find the Python bindings interesting.

Well Tempered Screening

I sent a snapshot to our internal mailing list. It's getting to the point where it actually works and people might want to use it. A bit more cleanup, though.

I'm trying to wrap things up before my trip to San Diego, so I'm not wrapping my mind around more speculative things like proof systems right now.

San Diego

I won't be at OSCON, but as it turns out I'll be in San Diego from Monday through Saturday next week anyway, for Pacific Yearly Meeting of the Religious Society of Friends (Quakers).

I might be able to get away for a few hours. It would be really fun to meet with friends. I probably won't be in ready phone or email contact there, but get in touch and we'll see if we can set something up.


I think that computer-checked proofs are inevitable, but not particularly soon. None of the existing formal proof projects are Web-aware in the sense of fostering distributed collaboration. The big step, I think, will have to be taken by someone who understands the Web deeply, and enough formal logic to get by. I have the feeling that huge technical prowess is not necessary. Making a usable system doesn't have much to do with "corner cases" in logic.

As Bram points out, I think the big technical issue is making proofs robust with respect to choice of details. In existing systems, they seem to be fairly brittle. For example, Metamath uses the standard set-theoretic encoding of numbers. But theorems of number theory hold no matter what encoding you choose, or even if you use an axiom system rather than an encoding. It shouldn't matter which axiom system you use either. In a sense, number theory is in terms of a "Platonic" ideal of numbers.

But you don't see that from looking at the proofs in the Metamath database. There, proofs are in terms of the specific encoding. Trying to adapt these proofs to a different encoding would probably be difficult, and would more likely than not involve digging into the proof itself.

For example, this is a theorem in the standard set-theoretic encoding of naturals: for all x and y, x >= y <=> x = x \union y. However, in some other encodings it will be false, and in still others it will have no meaning. Somehow, you have to show that the proof is respecting the abstraction, not piercing it and accessing the encoding. Metamath, while brilliant in many other respects, doesn't seem to have an easy way of doing this.
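Here's a quick Python check of this encoding-specific fact, using the usual von Neumann encoding (0 = {}, n+1 = n \union {n}). This is my own sketch; note that the equivalence is with the non-strict inequality, since x = x \union y just says y is a subset of x.

```python
# Von Neumann encoding of naturals as sets: 0 = {}, n+1 = n | {n}.
# Under this encoding, x >= y coincides with x == x | y (y subset of x).
# In another encoding (say, Church numerals) the right side is meaningless.

def von_neumann(n):
    s = frozenset()
    for _ in range(n):
        s = s | frozenset([s])
    return s

for x in range(5):
    for y in range(5):
        vx, vy = von_neumann(x), von_neumann(y)
        assert (x >= y) == (vx == vx | vy)
```

A proof that leans on this equivalence has pierced the abstraction: it is talking about the sets, not about the numbers.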

In logic, "any theorem over naturals in one encoding is also a theorem in any other encoding" is known as a meta-theorem. Ordinarily, you can apply a theorem using plain old substitution (and, usually, some constraints that variables are disjoint). However, the rules for meta-theorems are hairier.

I'm convinced, though, that the understanding of the Web must run deep, or the project has little hope. In fact, there seem to be two fairly large-scale projects to formalize math in a machine-checkable form, both initiated by mathematicians, and neither with a high profile today. One, Mizar, has nonfree software, and uses the structure of a refereed journal. Another, QED, doesn't look like it's been updated in a while. I haven't seen anyone even try to adapt a proof done in one system into another.

I should also mention NuPrl, which is GPL. The focus is a bit more on automated theorem proving, although they do seem to have a nice library of proofs. Also, NuPrl is based on constructive logic.

A good blog

Phil Windley is the CIO for the State of Utah. His blog is well worth reading. We need more people like him in our government.


Antony Courtney points me to Twelf, a formal proof system from CMU. It looks interesting, although I haven't dug into it at all. For one, it's pretty difficult to figure out what license the software is released under.

Trustworthy metadata

I had a nice chat last night with Bram about the eigenvector metadata engine. He suggests that using medians rather than means as the way of combining ratings would be more attack resistant. I think he's right about that. With medians, if more than half of the confidence value comes from "good" nodes, then the result is guaranteed to be within the range of ratings published by "good" nodes. In particular, if "bad" nodes publish extreme high or extreme low ratings, then with a median they won't affect the outcome much, while with a mean they can.
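A few lines of Python make the effect visible (my own toy numbers, not Bram's):

```python
# Sketch of Bram's point: attackers publishing extreme ratings drag a
# mean far outside the honest range, but barely move a median, as long
# as more than half the weight comes from "good" nodes.
from statistics import mean, median

honest = [6.8, 7.0, 7.2, 7.1, 6.9]   # good nodes cluster around 7
attack = [0.0, 0.0, 0.0, 0.0]        # bad nodes publish the extreme low

ratings = honest + attack
assert round(mean(ratings), 2) == 3.89   # mean dragged far below honest range
assert median(ratings) == 6.8            # median stays within honest range
```

With the median, the combined result is guaranteed to lie between the lowest and highest honest rating; with the mean, a single extreme rating shifts the result in proportion to its weight.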

Bram also reminded me of his post from over a year ago. His design is actually fairly similar to the metadata engine as currently implemented in Advogato, but with some twists. For one, it may help with the scaling, because the confidence values go linearly down to 0 (at which point they need not be further propagated), while in the eigenvector formulation they tail off exponentially.

The analysis of his design is going to depend greatly on how the "distrust levels" are chosen. The subgraph of zero-distrust edges has no attack resistance among strongly connected components, so you probably want to avoid those. With distrust p, assertions travel over at most 1/p hops. If your distance to a "bad" node is greater than that, you only get recommendations from good nodes. That's not at all a bad property, although it's still a very good question how well it would work in practice.
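Here's the linear falloff as I read it, in a Python sketch (my own model of the scheme, not Bram's actual design):

```python
# If each edge subtracts a distrust p from the confidence, an assertion
# starting at confidence 1.0 dies out after at most 1/p hops; beyond
# that radius, bad nodes simply can't reach you.

def max_hops(p):
    """Number of hops an assertion survives with per-edge distrust p."""
    confidence, hops = 1.0, 0
    while confidence > 0:
        confidence -= p
        hops += 1
    return hops

assert max_hops(0.25) == 4   # distrust 1/4 -> assertions travel 4 hops
assert max_hops(0.5) == 2
```

So picking p trades reach against safety: small p lets good recommendations propagate widely, but also shrinks the buffer you need between yourself and the nearest bad node.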

It's really nice to have somebody to talk to who understands this stuff as deeply as Bram.

Expert witness

If you, or someone you know, is interested in being an expert witness in the area of database schema design, let me know. I was an expert witness for a halftone screening patent lawsuit a few years back, and I'm really glad I had the experience. I learned a lot about the way IP law works, and the pay can be pretty good too.


Alan's anxiety is much, much better now, and he seems overall happier.

Watching Max develop language ability continues to be a wonder. Typical sentences include "I take a picture at you, mommy" (holding a camera), "Put it on top your nose", and "I can't see his tail" (a cat in his favorite I Spy board book, with the tail indeed obscured). He's also starting to get into video games, including Galactic Patrol.


The HEAD branch has now been merged over the DeviceN branch, but the latter is not public yet. Well Tempered Screening works just fine after the merge, and I cleaned up some bugs. Among other things, the shading code was poking into the device halftone internals. Yucch.

What's the best primitive for formal math on the Web?

There are several systems of primitives that make sense as the basis for formal math. Metamath is based on ZFC set theory, but that's far from the only primitive.

In fact, under the hood, Metamath is actually based on generic axiom systems. You don't have to start from ZFC if you don't want to; you can put in your own axioms. But then you can't use the proofs already done. This is the old problem of "runtime incompatibility" again.

Another basis seems to be predicate calculus. A major justification is that boolean-valued functions are basically equivalent to sets. The "comprehension" axiom in ZFC set theory relates the two. Dijkstra's formulation of weakest precondition semantics favors predicates over sets as the most primitive operation.

Then, of course, you have the lambda calculus. This is, in many ways, more appealing because it's closer to real programs, and also has a more concrete flavor. The Church numerals are a particularly pretty way of encoding the natural numbers.
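They're also easy to play with in Python, since lambdas are right there. This is the standard construction, sketched from memory: the numeral n is the function that applies f to x n times, and arithmetic is just composition.

```python
# Church numerals: n is "apply f n times". Addition composes the
# iterations; multiplication iterates an iteration.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
add  = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))
mul  = lambda m: lambda n: lambda f: m(n(f))

def to_int(n):
    """Read a Church numeral back out by counting applications."""
    return n(lambda k: k + 1)(0)

two = succ(succ(zero))
three = succ(two)
assert to_int(add(two)(three)) == 5
assert to_int(mul(two)(three)) == 6
```
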

In fact, natural numbers alone are universal. You can encode anything you like as Diophantine equations. However, these encodings are really painful. They're useful as logic or philosophy, but not really useful to make real proofs.

A good proof language for the Web may have the TIMTOWTDI philosophy, and include multiple primitives. Having more than one is redundant, but may make proofs considerably easier. For example, in any of the systems it's easy to imagine having numbers built in so you don't have to prove 2+2=4 every time you need to calculate.

So the criterion for including a primitive should be whether it makes proofs easier, or whether it's just as easy to do the proofs using some other primitive. I'm not sure what all the relations are here.

Another primitive I like is tree constructors, just like ML datatypes. I think that these are easier to work with than many other primitives, but on the other hand each datatype basically creates its own set of axioms, which keeps you from having the same kind of purity as in Metamath. It might be a reasonable tradeoff, though.

One reason I like tree constructors is that they keep types cleanly separated. In the ZFC formalization of mathematics, you can say {0} + {0, 1}, and it makes sense (the answer is 3, by the way). I'd like the concept of integers and the concept of sets to be obviously disjoint.
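A sketch in Python of what I mean, using classes as a stand-in for ML datatypes (my own illustration):

```python
# Tree constructors keep types disjoint: naturals and sets get distinct
# constructors, so a natural can never be mistaken for a set - unlike
# the raw ZFC encoding, where {0} + {0, 1} accidentally "means" 1 + 2.
from dataclasses import dataclass

@dataclass(frozen=True)
class Zero: pass            # the natural 0

@dataclass(frozen=True)
class Succ:
    pred: object            # successor of another natural

@dataclass(frozen=True)
class EmptySet: pass        # the set {}

@dataclass(frozen=True)
class Insert:
    elem: object            # {elem} union rest
    rest: object

def add(m, n):
    # defined only on the Nat constructors; feeding it a set is an error
    return n if isinstance(m, Zero) else Succ(add(m.pred, n))

one = Succ(Zero())
two = Succ(one)
assert add(one, two) == Succ(Succ(Succ(Zero())))
assert Insert(Zero(), EmptySet()) != one   # {0} and 1 are distinct objects
```

Each pair of datatype declarations effectively carries its own little axiom system (constructors are injective, distinct constructors are disjoint), which is exactly the loss of purity mentioned above.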

Spam and trust

John Patrick writes: The Spam Has Got To Go. Amen to that.

But what's the solution? There are a lot of things you can try: legislation, informal sanctions, blacklists, spam filters, hashcash schemes, p2p spam reporting networks, etc. Many of these are being tried. Jon Udell and others suggest that digitally signing emails will cut down on spam, but it seems to me just one more barrier for spammers to overcome. Tap into an unlimited supply of digital id's, and you just send signed spam.

I think I know how to stop spam, while preserving the wonderful openness of email. Longtime readers won't be surprised to hear that my idea is based on attack-resistant trust metrics. In fact, longtime readers are probably sick and tired of me harping on this damn trust stuff all the time. But harp I will, for the message needs to get out there. There is now research on how to build spam-resistant messaging networks. Quite a bit more work is needed to make it practical, but I think the usual technique of paying smart people to think about it has a good chance of working.

Anyone who's interested should read Ch. 7 of my thesis-in-progress.


I had a nice interchange today with Jacob Moorman of SourceForge, about my miffed feelings regarding the language on their peer certification page. Actually, the peer certifications are going away, but more important, I get a renewed sense of an open channel with the SourceForge guys. In fact, complaining in my diary rather than contacting them directly was unfair, and I apologize.

I really hope SourceForge manages to succeed in spite of the current harsh business climate. They're doing an immense public service to the cause of free software.

A New Kind of Science

I power-skimmed the book again tonight. There's no question about Wolfram's ego. There's precious little actual science in the book, but I'm still convinced that there's something there. Exactly what, I'm not quite sure. My latest stab at expressing his central point is: "a lot of shit is universal".

Even so, I'm not really swayed by his arguments. One of his points seems to be that people thought about universal computation ability as the exclusive province of very complex, specialized equipment, while in fact, many simple systems can express universal computation. Sure, sure. But his Turing machine construction on rule 110 requires enormously complex and specialized software (the initial state), even if the hardware is ridiculously simple compared with previous universal constructions. If you look at the sum complexity of hardware and software, then there's a good argument to be made that Wolfram has taken a big step backwards.

Distributed proof repositories

AaronSw writes: "While hashes are useful, one thing you often hear in W3C circles is that local file storage should keep track of the URIs from which it got the files and cache them so. So you'd just look for the cached file with that URL instead of the one with that hash."

Glad to see the idea is resonating for other people. I'm not sure what UR? scheme is best for these proof files, but my hunch is that hashes will be important. For one, definitions, theorems, and proofs are probably a lot more immutable than most of the other objects W3C people deal with. I'm also envisioning a Napster-style propagation of referenced objects. In typical usage, I can imagine a person working on a proof in conjunction with a client which snarfs proofs and lemmas from the rest of the Web, then, when it's done, uploads everything to a static web space. Thus, URLs move around fluidly, but the hash is constant.
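The hashing side is almost trivial to sketch in Python (my invention for illustration; the actual scheme, and even the choice of hash function, are open questions):

```python
# Content-hash naming for proof objects: the name is derived from the
# bytes, so any mirror can serve the file and any client can verify it,
# no matter which URL it came from.
import hashlib

def proof_id(proof_text):
    """Stable identifier for an immutable proof object."""
    return hashlib.sha1(proof_text.encode("utf-8")).hexdigest()

proof = "a hypothetical proof object, serialized somehow"
pid = proof_id(proof)

# A client fetching from any mirror re-hashes and checks:
fetched = proof            # imagine this arrived over the network
assert proof_id(fetched) == pid
assert len(pid) == 40      # sha1 hex digest
```

The verification step is what makes Napster-style propagation safe: you can accept the bytes from anyone, because the name itself is the integrity check.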

I think I'll write about choice of primitives over the coming days, unless people beg me to stop.
