Older blog entries for raph (starting at number 302)

Lawyer-free zone

Lawrence Lessig has an essay in favor of limiting copyright terms. I agree of course, as I'm sure most Advogatans do. What struck me is his use of the phrase "lawyer-free zone" to describe the public domain, with the clear implication that this is a good thing.

I think that he's identified a clear cultural distinction between lawyers and non-lawyers. Lawyers, having been steeped in the legal approach to resolving disputes, assume that it's the best approach, and can barely consider any others. The rest of us recognize the legal approach as being hideously expensive, unreliable, often quite unfair, and with significant potential to limit innovation and flexibility.

These problems with the "lawyer zone", as opposed to the "lawyer-free zone", are the reason why the current trend to increase the reach of copyrights, patents, and trademarks is a bad thing. The problem is, of course, that most lawmakers come from the lawyer culture.

I'm impressed with Prof. Lessig for both being a damn good lawyer (his command of the legal issues as shown in his oral argument in Eldred vs. Ashcroft is downright impressive) and for being able to see and express so clearly the advantages of sometimes not doing things the lawyer way.

Free software, of course, is a shining example of an almost entirely lawyer-free zone. I'm impressed with the mechanisms we have for resolving disputes. One key, I'm now convinced, is fork rights. At heart, the right to fork a project gives people an option in case the current maintainer isn't doing a good job, for whatever reason.

A particularly clear example is the handling of the SourceForge codebase. As the OSDir interview with Tim Perdue makes clear, VA Software did a number of things in direct opposition to the interests of the free software community, not least of which was making promises to release the code without any intention of doing so. A more subtle, but no less important, issue is the deliberate rigging of the installation process to make it difficult.

In the lawyer zone, these types of things would generally get handled as some kind of contract litigation. There'd be a lot of "discovery" to try to find a smoking gun to prove that this bad behavior was intentional, and in the end neither side would really win anything (in stark contrast to the lawyers themselves).

Fortunately, in the free software world, we have a better way. We simply fork the code, fix the problems, and move on.

I have high hopes that the Creative Commons can bring the benefits of the lawyer-free zone to writing, music, art, and other creative domains that have been increasingly dominated by the culture and processes of lawyers. We'll see!

Google's new stuff

Google absolutely rocks, but I'm not anywhere nearly as impressed with their more recent offerings.

I took froogle for a spin, but wasn't all that impressed. For example, the search "pc2100 ddr 512mb" gives you prices from $109.99 to $289.95 on the first page, while pricewatch is $96-$99 for the first page of the category. Other queries gave similarly unimpressive results.

I've written about Google Answers fairly recently. Here's how I think they (or someone else) could make a much more compelling service.

Very simply, start with the existing service but provide means of crediting user accounts for answering questions, in addition to debiting them for asking. Each user can choose to be in the red (as now), in the black (in which case Google would PayPal the money), or at zero, which is probably how most people would want to use it.
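The credit/debit scheme above can be sketched in a few lines. This is purely hypothetical; the class and the 20% commission are my own inventions, not anything Google actually does:

```python
# Toy sketch of the proposed credit/debit model. All names and the
# commission rate are illustrative assumptions, not a real service.

class AnswerLedger:
    """Tracks each user's balance: asking debits, answering credits."""

    def __init__(self):
        self.balances = {}  # user -> balance in dollars

    def ask(self, user, price):
        # Asking a question debits the account.
        self.balances[user] = self.balances.get(user, 0.0) - price

    def answer(self, user, price, commission=0.20):
        # Answering credits the account, minus the service's cut.
        self.balances[user] = self.balances.get(user, 0.0) + price * (1 - commission)

ledger = AnswerLedger()
ledger.ask("alice", 10.0)      # alice is now $10 in the red
ledger.answer("bob", 10.0)     # bob is credited $8 after the cut
```

With a 20% commission, answering 1.25 questions at a given price exactly covers asking one at the same price, which is where the 1.25 ratio above comes from.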

You have some nontrivial trust issues, especially as you're paying money out, but I'm pretty confident they're tractable. Google already has ratings, a dispute process, and so on.

Paying money to get an answer is interesting. It's a way for people to demonstrate that they really care about the answer, but at the same time it feels a little odd. On the other hand, if I can get an answer in return for answering 1.25 other people's questions, that would be appealing.

In any case, I find I have relatively little use for such a service, because I'm already part of a fabulous network of knowledge seekers. This diary is a significant component of that network. For example, one test query I was considering is the Latin translation of our family motto, "happy, sticky, and green" (in the sense of a child who has just eaten a pistachio ice cream). Now that I've posted the query here, the chances are excellent somebody will send an answer. Thanks in advance!

Why machine verifiable proofs?

moshez asked why I'm so interested in making machine verifiable proofs. Freek Wiedijk explains it better than I can, from a mathematician's point of view.

But my main interest is not so much existing proofs of mathematics. What I really want to do is develop correct programs. The theorems needed for programming tend to be much too tedious for mathematicians to deal with. But if the programmer has the patience to generate such a proof, at least a mechanical checker can give pretty good confidence that the theorem is true, and thus that the program is correct.

I'm doing okay, although still feeling buried in work, and not fantastically productive. Thanks to everyone who wrote with concern, and sorry for not responding to your email.

I spoke with David McCusker quite extensively yesterday. It's possible that Fitz will prove a good fit with the goals of the OSAF, which would be fun. I think I'll be visiting them soon.

In any case, it's clear that a working prototype of Fitz is on the critical path to everything else, so I'm going to try to get that done next.


For fun, I proved the ordered pair theorem for (nonnegative) natural numbers in Metamath. It's a very low-level framework, so a lot of aspects are tedious, but overall I was pleased with how it turned out. This is actually my first experience proving something completely rigorously. I recommend it to others.

I've been having a great email discussion with Norm Megill and Robert Solovay, and also with Josh Purinton. Very cool.

I haven't felt very productive recently.

My mom is in town for a week and a half, and we're doing lots of fun family things. This evening, we went to the tree lighting ceremony in Benicia.

In memoriam, Judith Stronach

A member of our Quaker meeting, Judith Stronach, committed suicide last week. It's very sad. She was intensely committed to peace. She taught regularly in the meeting's First Day (Sunday) School, and frequently got the meeting's children involved in projects to help children in faraway war-torn places such as Cambodia, Iraq, and Afghanistan. They won't have her to make these places, and the people living in them, real for them.

Others who knew her better talked of an inner darkness. It's likely that she simply felt too weighed down by the sad and frightening direction our country is taking.

Life is such a gift. I'm sure many people around me are feeling down, and some are probably suffering. I hope I'm able to reach out and show caring and tenderness when it's needed most.

Proofs, or: Set theory for dummies

I think I've worked out a simple, yet powerful, definitional framework for Metamath. I had the basic idea a few weeks ago, but ran into a technical problem with capture of unbound variables (not surprising that this is where the pitfalls lay!). I think I've solved it now.

I'm trying to do these definitions by textual expansion. In Metamath proper, the usual way to create a definition is to introduce a new axiom. For example, here's the definition of ordered pairs, following Kuratowski:

df-op $a |- <. A , B >. = { { A } , { A , B } } $.

There's no syntactic difference between a new definition and a fundamental axiom such as ax-inf (the Axiom of Infinity). However, definitions are generally of a form which makes it clear they're not adding any axiomatic power.

Even so, I think it's important to formally distinguish axioms from definitions, especially for Web-based proof checking. If you download a random proof off the Internet, you really want to know that the definitions are "safe" in some formal sense.

I'm going to gloss over syntactic issues a bit. In Metamath, definitions are intertwingled with the grammar for parsing terms. In general, a new definition also extends the grammar. The main problem with this is that two grammar extensions, while themselves unambiguous, may lead to ambiguity when combined. Also, it's very important that the textual expansion matches the parse. In C, it's possible to violate this with amusing results:

#include <stdio.h>

#define square(x) (x * x)   /* deliberately naive: no parentheses around x */

int main() {
    /* Expands to (1 + 1 * 1 + 1), so this prints 3, not 4. */
    printf("The square of 1 + 1 is %d.\n", square(1 + 1));
    return 0;
}

So I'm basically going to assume that we're working with terms in a pre-parsed format (like Lisp S-expressions).

Ok, so here's how it works. A definition looks like this:

double ( A ) == ( A + A )

The first token is the thing being defined. There's only one definition for each token. If you try to define the same token twice, it's an error.

Next comes a list of variables. Following the Metamath convention, there's a previous declaration of the "type" of each variable. In set.mm, A is of type "class", which is the right one for numbers.

The == symbol means it's a definition. I chose it to avoid conflict with =, which is plain ol' equality.

Finally, there's the expansion of the definition, which is pretty much as expected. One caveat: it can only refer to definitions that have come previously; circular definitions are not allowed.

Ok, now the rule for expanding. For this particular definition, you can safely expand it anywhere it occurs, but that won't be true in general. So we restrict expansion to a single, especially safe place: between an assertion and its proof. If the final step of the proof exactly matches the expanded version of the assertion, we accept (the unexpanded version of) the assertion as proved.

For example, let's say we wanted to prove the assertion: double ( 2 ) = 4. Its expansion is ( 2 + 2 ) = 4. As it happens, this is already proved in set.mm, so the body of the proof can be just 2p2e4. Since the result matches the expansion, we accept the original assertion as proved.
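The expansion rule can be illustrated with a toy checker. The S-expression encoding as nested tuples and all function names here are my own; this is a sketch of the idea, not Metamath's actual machinery:

```python
# Illustrative sketch of the expansion rule: an assertion containing a
# defined token is accepted if the proof's final step matches the fully
# expanded assertion. Terms are pre-parsed, here as nested tuples.

definitions = {
    # double ( A ) == ( A + A )
    "double": (("A",), ("+", "A", "A")),
}

def substitute(term, env):
    """Replace parameter variables by their arguments."""
    if not isinstance(term, tuple):
        return env.get(term, term)
    return tuple(substitute(t, env) for t in term)

def expand(term):
    """Recursively replace defined tokens by their right-hand sides."""
    if not isinstance(term, tuple):
        return term
    head = term[0]
    if head in definitions:
        params, body = definitions[head]
        env = dict(zip(params, term[1:]))
        return expand(substitute(body, env))
    return tuple(expand(t) for t in term)

def accept(assertion, final_proof_step):
    """Accept the unexpanded assertion if the proof proves its expansion."""
    return expand(assertion) == expand(final_proof_step)

# double ( 2 ) = 4, proved via ( 2 + 2 ) = 4 (2p2e4 in set.mm):
assertion = ("=", ("double", "2"), "4")
proved = ("=", ("+", "2", "2"), "4")
```

Here `accept(assertion, proved)` holds because the assertion's expansion is exactly the already-proved statement.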

Now for the tricky bit. The right-hand-side of a definition may contain dummy variables. An example is: V == { x | x = x }. (This is the "universal class" containing all sets). In this particular case, introducing the dummy variable in an expansion is safe, but that's only because the variable x is bound. Metamath doesn't make a formal distinction between free and bound variables, largely because it's pretty tricky to get right. Instead, it uses "distinct variables".

We add two rules for dummy variables: the new dummy variables must not appear in the original, unexpanded assertion; and, if a definition is expanded more than once within a single assertion, all the dummy variables must be the same.
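The two rules can be stated as a small check. This is a greatly simplified sketch (I'm pretending any lowercase atom is a variable, which real Metamath doesn't do):

```python
# Sketch of the two dummy-variable rules: new dummies must be fresh
# with respect to the original assertion, and every expansion of a
# definition within one assertion must use the same dummies.

def variables(term):
    """Collect variable tokens; simplification: lowercase atoms only."""
    if isinstance(term, tuple):
        return set().union(*(variables(t) for t in term))
    return {term} if term.islower() else set()

def check_dummies(assertion, expansions):
    """expansions: one set of dummy variables per expansion of a given
    definition inside this assertion."""
    used = variables(assertion)
    for dummies in expansions:
        # Rule 1: dummies must not appear in the unexpanded assertion.
        if dummies & used:
            return False
        # Rule 2: all expansions must share the same dummies.
        if dummies != expansions[0]:
            return False
    return True
```

For instance, expanding V twice with dummy x in an assertion that never mentions x passes; reusing x when it already occurs in the assertion, or using x for one expansion and y for the other, fails.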

Let's say you had the degenerate definition: bad == x. If you let the dummy variable x appear in the original assertion, you'd be able to prove junk such as both "A. x x = bad" and its negation. The flip side is that it's no longer trivial to prove things such as: V = { x | x = x}. The body of the proof has to establish { x | x = x } = { y | y = y}. Fortunately, cbvabv takes the main burden of such a proof.

The second rule is more subtle. If you let them sometimes be the same, sometimes different, you could arrive at a contradiction, so it clearly has to be one or the other. I think it's safe to make them all distinct (and it seems more intuitive), but making them all the same preserves truth better. Take "bad = bad" for example. This is trivially provable (using cleqid) without doing the expansion, but if all dummies needed to be distinct, the expansion would be "x = y", which is not provable. I prefer the provability of terms containing definitions to be exactly the same as their fully expanded counterparts.

Apologies for the somewhat detailed technical nature of this post. I do believe, though, that this kind of definitional framework is much better suited to proof interchange on the Web than Metamath proper.


We had a nice Thanksgiving at Heather's dad's, together with her sister.

The day before, Heather pointed out the spectacular colors of the sunset to Alan, and he remarked, "that looks just like a mandrill's butt," completely deadpan.


The fine folks at Azoogle have figured out how to "publish" email past my Spamassassin setup. In addition, to quote from their web page, "By securing contracts with each of these major email companies [America Online, Yahoo, Hotmail, MSN] it guarantees your emails will get to your customers without being blocked, rerouted or suppressed."

They claim that their service is opt-in only. The major mail providers probably take them at their word. A test account at Yahoo doesn't currently show any spam from Azoogle, but plenty from, for example, email-publisher.com (which is actually Topica). This is the stuff that gets past their laughable Spamguard(TM), mind you.

This all suggests quite strongly that digital identity alone probably won't make that big a dent into spam, contrary to Jon Udell's assertion. Some spammers might have trouble getting a digital id from whoever's issuing them, but my guess is that the likes of Azoogle will have no trouble at all.

I also got an interesting 419 mail from one "Ade Blessing". This one was full of Jesus-speak, and mentions early on that the author was a Moslem before he converted to Christianity. I hate spam, but have mixed feelings about the 419 stuff. The victims, after all, have to be pretty dishonest to fall for it, and it's a net transfer of money from a rich country to a poorer one, much of which actually subsidizes Internet access for ordinary citizens. In this particular case, the victims are most likely anti-Islam religious bigots as well.

Spam only exists because it works. Sigh.


8.00 is out!

Other things

Obviously, I haven't felt like blogging much the past few days. I've had a bit of a sore throat, which ordinarily wouldn't be so bad, except that I'm pretty sure it makes my sleep breathing worse.

I have been hacking a little on the new Fitz code. That's fun.


Late but heartfelt congratulations to David McCusker for his new job at OSAF. Sounds like he'll have a lot of fun there and put his skills to good use.

Thanks to everybody who expressed concern over my last entry. Things aren't actually all that bad (and, in fact, show signs of getting better), but I wanted to write something rather than just keeping quiet.


The 7.33 beta is out, and will be at the usual download sites soon. If there are no real problems, 8.0 should be out within a couple of days.

Another thing that's soaking up my time is writing a paper on my Even Toned Screening for the Electronic Imaging '03 conference. I'm pleased with the way it's turning out, and will post a copy tomorrow.

Googlejuice and images

Heather and I have both seen an uptick in image traffic, mostly from our family website. We're also getting a trickle of requests to use the images. Looking at the server logs, it's obvious that all this new traffic is coming from Google Image search.

I think this is because of a combination of high Googlejuice and the fact that I tend to name pictures descriptively. We're on the first page for these search terms: pinata blue-whale max strawberry happy easter-egg lily, and pretty high up for nursing birth grampa penguin yawn. In fact, the picture of Max yawning can be found on numerous bulletin boards as the symbol of a boring topic :)

IBM battery

IBM has had over three weeks to respond to the feedback I sent them on the battery problem, and the "powerhouse of a reputation" language is still there on their e-commerce site. Meanwhile, I get a steady stream of emails from other people who have experienced battery problems. I take some satisfaction that my page is #1 on Google for "thinkpad 600 battery".

Based on their lack of response, I'm no longer recommending IBM laptops. It's a shame, because they're otherwise pretty good. I'm also not sure there are any laptop manufacturers that are better.

Some good experiences

But not all companies hang their customers out to dry. I've had absolutely excellent experiences with Crucial memory and Sears/Kenmore's replacement parts service. In both cases, the website led me directly to the part I wanted to buy, the price was reasonable (if not rock-bottom), and the item was shipped quickly. In fact, I ordered the RAM upgrade from Crucial on Friday night, and it arrived Monday morning.

Berkeley and archival URL's

Berkeley's official policy is to revoke Web accounts by default when EECS students leave. The short lifetime of URL's bothers me. I feel it contributes to the "digital dark age" (a nice turn of phrase from the Internet Archive). I expect better from our universities, which I feel should act as guardians of our cultural artifacts, rather than merely as providers of services.

I'm inclined to raise a stink about this (including bringing it to the department). In any case, please use long-term stable Web space if it's available, rather than the transient hosting provided by Berkeley and other institutions with similarly short-sighted policies.


Life is difficult just at the moment. The details don't belong in a public diary, but I'm willing to talk about it in person.

A consequence is that I've been even less responsive than usual to people. Apologies, as there's lots of interesting things going on. I'd love to look into Quixotic, respond to sej's interview questions, get out a proposal for a new IJS spec, and so on, but all these things will have to wait until the time is ripe.

So, in the meantime, I'm mostly working on things that are urgent for work (we're slamming down bugs for Ghostscript 8.0 and hope to release soon) and things which are soothing for my soul. Norm Megill quotes Rudy Rucker as saying that "Set theory can be viewed as a form of exact theology." I think that's one reason why it's so appealing just now; life seems very inexact. I've been having a very nice discussion with Norm Megill, and have been exploring some ideas regarding definitional frameworks. I thought I had gotten it all figured out, but when I started looking closely at free variables in the definitional expansions, my first idea broke down. But at least I now know what the problem is!


One of the other things I've been spending a little time on recently is the Fitz prototype. The most recent bit of code is a compact representation for Bezier paths, because the Libart one is pretty damn bulky.

I think what I'll do is get the new prototype to the point where it can render basic paths, text (using FreeType), and images, then release it. It's also important to document the design, but I haven't had strong motivation for that recently.

Single signon

There was a fairly contentious discussion on #p2p-hackers about the single signon idea I blogged recently. Aaron Swartz said that he couldn't decide if it was evil or misguided.

In any case, my initial enthusiasm has faded somewhat. I'm feeling overloaded enough as it is, and this wouldn't help. The coding would be fun, but I imagine that dealing with clueless blog engine authors would not.

Does the blog world really want a spam-resistant authentication mechanism for blog comments? I certainly want to get a good reading on this before I commit any kind of time into implementing it.


I've been tinkering with a Python verifier for Metamath for a few weeks, and finally got around to implementing the distinct variable constraints. It's under 300 lines of code, which I find very satisfying.
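The distinct variable check is the subtlest part of such a verifier, and it's compact enough to sketch. This is a simplified version of the idea, not the actual code I wrote; the representation (pairs as tuples, substitutions as lists of occurring variables) is my own shorthand:

```python
# Sketch of Metamath's distinct-variable ($d) check after substitution:
# for every pair declared distinct, no variable may occur in both
# substituted expressions, and each cross pair of occurring variables
# must itself be declared distinct in the theorem being proved.

def check_distinct(dv_pairs, subst, target_dv):
    """dv_pairs: pairs (x, y) declared distinct in the statement used.
    subst: maps each variable to the variables occurring in the
    expression substituted for it (a variable maps to itself if absent).
    target_dv: set of frozenset pairs declared distinct in the theorem."""
    for x, y in dv_pairs:
        for a in subst.get(x, [x]):
            for b in subst.get(y, [y]):
                if a == b:
                    return False  # shared variable violates distinctness
                if frozenset((a, b)) not in target_dv:
                    return False  # cross pair not declared distinct
    return True
```

So substituting expressions over u and v for a distinct pair (x, y) is fine when u and v are themselves declared distinct, but substituting anything sharing a variable fails immediately.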

I find the whole thing a bit magical. Those 300 lines of code, plus a couple dozen axioms, effectively give you the building blocks for all of mathematics. Quoting from the Metamath book:

There you have it, the axioms for all of mathematics! Wonder at them and stare at them in awe. Put a copy in your wallet, and you will carry in your pocket the encoding for all theorems ever proved and that ever will be proved, from the most mundane to the most profound.

I personally feel that it's not quite complete, largely because you still have to know what you're doing to use the definitional framework. I have some ideas on this (surprise, surprise!)

Be warned, the code is a bit rough. I feel like I should add some comments and polish it some more. I've sent it to Norm Megill, and am eager to hear what he thinks of it.


The 7.32 beta release is now official, with Windows and Macintosh builds as well. Hopefully, this means that 8.0 is imminent.

Thanks to everyone who's been helping with this, especially my colleagues at Artifex Software. It feels really good to be part of a team.


If you're in the US, please vote today. Especially if you were inclined not to vote, please consider voting for your Green candidates. The California governor's race is particularly ugly. From the major parties, we have an incumbent who's proved excellent at finding the line between merely aggressive fundraising and outright bribery; and a challenger whose inept campaign is hopefully not a predictor of his performance in office should he win, not to mention his closeness to fraudulent practices in his family firm.

By contrast, Pete Camejo is somebody you can vote for without having to hold your nose. A strong showing for him will send an important message that many people are not satisfied with the status quo and want something better. In addition, he has a lot of good experience with finance and ethical issues in particular, which suggests he would make a good governor in the extraordinarily unlikely event he wins.

Davis is sufficiently ahead in the polls that you should feel comfortable voting for Camejo even if the thought of a Republican in the governor's office disturbs you. Even so, I'm more inclined to vote for the best candidate than the lesser of two evils in this election.

Please consider your Green candidates in local races as well. Green candidates can and do win offices at the local level, and these are often positions where they can do some good.

But no matter what your political beliefs, vote. It's how our democracy, however flawed, works.

Proofs and Types

chalst: Hrm. Unfortunately, Girard's book is just a little over my current threshold for accessibility. At this point, I really prefer Web-available resources, as well, largely because I am doing this as a weblog and want to encourage other people to follow along.

I know I've just labeled myself a lazy student, but I'm wondering if you could give me some hints nonetheless. You mention Z_2-style and Coq-style consequence relations. What is the big difference between them? I'm not used to thinking about logical systems in terms of the consequence relation, but do know what the term means. Of course, I know one difference is that Coq is intuitionistic, but I think that's not the main thing you're talking about.

I'm also having a little trouble grasping the idea of a lambda calculus with the same strength as Z_2. Is the trick to write functions as finite strings, analogous to a computer program to compute the function? This way, you quantify over computable functions rather than the imaginable universe.

PS: chalst: Michael Norrish has some notes he'd like to send you, but doesn't have a working email address.

I don't feel like I have much extra time to write, but I've been thinking of doing a longer essay on some of the controversies in mathematics, especially as they're relevant to proof systems. The most important ones are logical strength (from primitive recursion to ZFC, with at least Z_2 and HOL as way-points, and specialized need for systems past both ends), types vs. no types, and partial functions vs. total functions. In the "moderately important" class I'd put classical vs. intuitionistic logic. The philosophical issues don't do much for me, and it seems like it's almost always possible to work around the intuitionism when you need to. Finally, in the least important category, I'd put choice of primitive operators and axioms for the propositional calculus. They're all quite equivalent, but that doesn't stop people from arguing their favorite.

Hmm, I wonder whether this is a reasonably complete list, or if there's another big one I'm overlooking.

Trust metrics for spam-free blog comments

Dave Winer talks about a large-scale single sign-on system for posting comments and replies in blogs. In this entry, he mostly talks about the convenience factor of not having to type in passwords and other information all the time.

At the same time, there's a lot more awareness of the vulnerability of comment and reply mechanisms to spam, trolls, and other forms of abuse. So far, the blog world has been quite civilized, but as spammers become tuned into the prestige and popularity of blog placement (not to mention their role in boosting Googlejuice), this will undoubtedly begin to change. Heather likens it to the Internet pre- and post-AOL.

I just realized today that both problems can be solved quite easily using the trust metric technology that powers the diary rating systems in Advogato. Blog servers, acting as clients to this new service, would get a cookie from the user's browser (probably using Passport-style techniques), and get an auth token back, also with a one-to-ten ranking, if so desired. This way, anyone who's hooked into the blog world can post comments and replies, but abuse would be limited.
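One cheap way to make such auth tokens verifiable without a database round-trip is a signed token. Everything here is speculative; the token format, the shared secret, and the idea of embedding the rating are my own illustration, not any protocol we've agreed on:

```python
# Speculative sketch: the trust-metric service signs a token binding a
# user id to a 1-10 rating; any blog server holding the secret can
# verify it offline. All protocol details are invented for illustration.

import hashlib
import hmac

SECRET = b"shared-secret-for-illustration-only"

def issue_token(user, rating):
    """Issued by the trust-metric service after evaluating the user."""
    payload = f"{user}:{rating}".encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return f"{user}:{rating}:{sig}"

def verify_token(token):
    """Run by the blog server; returns (user, rating) or None."""
    user, rating, sig = token.rsplit(":", 2)
    payload = f"{user}:{rating}".encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    if hmac.compare_digest(sig, expected):
        return user, int(rating)
    return None
```

A real deployment would want per-server secrets or public-key signatures (so blog servers can't forge each other's tokens), plus an expiry time, but the shape of the exchange is the same.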

In a dream world, we would have a decentralized peer-to-peer identity service based on a cryptographically sound protocol, but in the meantime there are a lot of people implementing blogs with very crude authentication. I think it will be possible to hack something together quickly.

Once the infrastructure is in place, I can see it being useful for a bunch of other things, including authenticating backlinks, which are also spam-vulnerable. And, once the social net of bloggers starts becoming reified into a net-accessible graph, the stage is set for evolution to a more decentralized system. It seems like a good way to get rapid adoption of the trust metric ideas.

I discussed the idea with Bram and others on #p2p-hackers, and he had some good ideas about how to make a simple challenge-response protocol that would be both easy to implement over the existing infrastructure and reasonably secure against the most common attacks such as password sniffing. He'll probably want to post about that himself.


A couple of days ago, Alan was trying to gross us out at the dinner table, which is of course a normal six-year-old thing. But, being Alan, he tried to convince us that he had "bee larvae" on his plate.

Max really loves his "puter games" and is quite good at running the computer. In particular, he can usually quit out of a game by himself, which is even more impressive considering the atrocious consistency of keybindings in the downloadable Mac OS X games. He seems especially good at reading icons and other visual elements. For example, just today, he got some stickers trying to convey "no sweat" by the international "no" circle overlaid on some water drops. He didn't get it exactly, but we were pretty impressed by his interpretation as "stop raining".

We were talking about blogging yesterday, and together singing fragments of the Kipper theme song, with the words "Kipper the blog".


chalst: Thanks for your highly relevant comments. Let me see if I can answer.

"You don't need to encode first-order terms as singleton second-order variables..." Right. Actually, thanks for reminding me about the need to quantify over things like total functions. I was pretty much only thinking about building them up as terms, for which the proof that the function is total follows directly from the structure. For quantifiers, probably the easiest technique is to put this predicate into an assumption.

"...you may not be respecting the distinction between the set of total functions and the set of function that SOA can prove total." This is entirely possible. Presumably, the important difference is that you you quantify over the total functions, but when you build up from terms, you're restricted to the provably total. I'm not sure yet how this might bite me.

"Finally, you need to check your translation actually captures the properties you are after." Actually, I may take the heretical position that this is not necessary. You may want to allow the translation to have essentially arbitrary properties. Of course, in many cases there will be a meaningful translation back from target language to source language, and it may be worth capturing that formally.

Your point about a rich set of datatypes is well taken. Basically, what I'm trying to figure out is whether such a set of datatypes can be mapped to a simple Z_2 axiomatization using a straightforward definitional scheme. I'm not even sure that growth in the size of propositions matters much: as I said, in most cases, you won't be doing the translation explicitly.

What's the best source to read about these things, especially the polymorphic lambda calculus systems with the same provability strength as Z_2?


I poked around at Google Answers a bit. It seems like a useful service, but I don't see any reason to get excited about it. There are two features I might have hoped for.

First, I don't see that the Answers provided help, or even particularly influence, the results of ordinary searches. This is disappointing, because it seems to me that Google's trust metric has the best chance of working when there's high quality manual input, especially regarding the trusted "seed". It's entirely possible that Google has two completely disjoint sets of people manually reviewing Internet sites (one set for Answers, another for deciding the seed), but that they don't talk to each other. I'm not sure.

The second missing feature is a bit harder to define, but it has to do with whether Answers is a true community. There are glimmerings of this. For example, having all the questions and Answers posted, along with comment posting open to all users, is cool. Many of the questions are interesting, and I can see people hanging out there. Another pro-community sign is that ordinary users and Researchers share the same namespace in the account/login system.

Most importantly, a lot of the Researchers seem to know each other, so there is something of a community there. But users don't really get to take part in it, even when (or perhaps because) they buy in monetarily. Part of the problem may be the way identities are concealed.

If Google (or somebody else) figured out a way to build a real knowledge-seeking community, with money lubricating the system, I think they'd have something big. As it is, they have a reasonable service with a solid-gold brand behind it.
