Older blog entries for raph (starting at number 373)

ZF inconsistent?

fxn: I haven't gotten the chance to look at Brian Ford's paper, but it sure is intriguing.

If indeed ZF comes crashing down, that would really validate my decision to make proofs in Ghilbert truly portable with respect to axioms, rather than trying to pick a single axiom system that is both powerful enough for the work at hand, but not so powerful as to raise questions about soundness, or raise problems porting the proofs over to other systems.

That said, the timing could have been a little better for me; as a demonstration of Ghilbert's portability, I picked a construction of HOL logic in ZF[C] set theory, and it's only a few proofs away from being done (of the core HOL axioms, I have only the law of the excluded middle (BOOL_CASES) and a theorem to introduce the iota operator). Oh well.

Busy, busy

It has been a busy month since the last time I blogged. Among other things, we had an Artifex staff meeting, and also tor stayed for a couple of weeks. We had a great time talking serious 2D graphics, color, and so on, plus hanging out with the family.


I'm proud of a couple photos I took on Alan's 8th birthday; the portrait of him, and another of the brothers posing with a penguin prop. Both were taken with my archaic Rapid Omega 100 on slow B/W film (6x7cm), then scanned on my brand new Epson 4870 scanner (highly recommended, btw).

Alan's photo page has unbelievably high Google-juice - looking at the referer logs, the most popular images are the ones that currently show up as #1 and #2 for a Google image search on happy. Even more exciting, a photo from last year of him playing his guitar is now on the cover of a book recently published by Wesleyan University Press.

Max, who just turned four last week, is developing quite a subtle sense of humor. A few weeks ago, when I was reading him Eating the alphabet, he asked if I would get him a Xigua melon. I answered that I didn't remember ever seeing one in a store, but that if I did see one, I'd get it for him, then we'd be able to tell if it's as good as a watermelon, or better (watermelon is one of Max's absolute favorites right now). Max replied that he knew that Xiguas are way better than watermelon. I asked him how he knew that, and he said: I can see inside people's minds. Even fruit. And that Xigua melon is thinking, "I'm way better than a watermelon".

More later

"More later" is one of Josh Marshall's favorite catchphrases, and people rib him for it. I've been doing quite a bit of deep thinking lately about the place of free software in the world, among other things, and I do want to write about it, but it will have to wait until I have a bit more time.

Birthday boy

I turned '"' today. I can't think of any particular meaning to that number. Perhaps it is interesting precisely because it is the first uninteresting number.

Letter quality displays

Robb Beal asked me a couple of weeks ago how to get support for letter quality displays in Linux. It's a good question, and it deserves an in-depth answer.

I want to go over the highlights here, though. There are really three parts to the question. First, what can users do to push things along? Second, what are the "configuration science" problems that need to be solved? Third, what are the imaging problems that need to be solved to get high resolution bitmaps to the display hardware?

As a user, probably the most important thing is to let your friendly local developers know that you're interested in having letter quality displays supported. There are a bunch of ways to do so. If I expand this into a full post, I'll have two sample letters. One says, briefly, "you suck, free software developers are commies, by the way why doesn't your software support LQ?" The other says, "I love your software and would love it even more if it had LQ support. By the way, do you have a paypal address or maybe an Amazon wishlist so I can give you a small token of my appreciation?"

Indeed, funding is a large part of the problem, not least because the prototype displays are still rather expensive. Of course, after Microsoft rolls out their LQ stuff, then the hardware will be available at commodity prices, and then as the gear filters into the hands of free software developers, there will naturally be more motivation to support it. It's a shame that we, as a community, suck so hard at organized fund-raising, or else we'd have already put a few of these displays in the hands of the most important developers, as of course Microsoft did years ago.

Yes, that's a whine. On to the configuration science part of the question. The primary configuration parameter that must be discovered by applications and negotiated between application and display subsystem is the "pixels per px" ratio (which I'll abbreviate here as pppx). The px is the fundamental unit of constructing user interfaces, and has no preset size. On current display hardware, the pppx ratio is locked in at 1. If you like, "LQ support" could well be defined as the support of pppx ratios other than 1.

Many people believe (misguidedly, in my humble opinion), that the "correct" configuration parameter is the display dpi. Indeed, it's fairly easy to come by values for this - the EDID standard provides a way for the display to tell it to the video card, and most OS's now provide at least some support to access the info (in X, DisplayWidth and DisplayWidthMM; in Windows, similar information can be obtained from GetDeviceCaps; in Mac OS X, CGDisplayScreenSize).

UI's are built in units of px units, which, because of currently available display technologies, are identified with pixels. They could have been built out of some physical unit, for example points, so that getting a larger DPI value out of the EDID would cause applications to automatically draw their UI's with more pixels. (indeed, for quite some time a display resolution of 72dpi was so common that many people identified pixel and point, as well). If the technology had evolved that way, then LQ support wouldn't have been espcially difficult. However, technology didn't evolve that way. I don't see this as entirely a bad thing - there are good things about a px-based coordinate system, which I think I'll detail if this becomes a full-length article.

So, the negotiation has to happen something like this. The OS knows that a pppx value of, say, 2, is available. If the app is not LQ-aware, then as it asks for a display surface, the OS knows to give it one with 100 dpi "pixels", so that each pixel that the app draws shows up on the hardware as a 2x2 square of pixels (one can imagine other choices here, for example, drawing text at the full device resolution, but it will be a compromise in any case, and the Real Solution is always to make the app LQ-aware. The chunky pixels might not look that good, but at least they won't break anything or introduce bizarre new artifacts).

But if the app asks for the range of available pppx values, the OS will tell it that 2 is available, and then the app can ask for a drawing surface that matches the physical display resolution.

Once we get to that point, then the app has to actually come up with the higher resolution pixels. But this is a relatively straightforward technical problem, and different applications can solve it each in their own best way. Certainly, having a good 2D graphics library under the hood would be nice, but there are a few around (even if one of the more well-known was written by a "monkey with a PhD in low level coding" :), and if none are satisfactory, you can write your own.

By contrast, configuration science problems have a way of not being readily solved. My favorite running example is what happens when you press the backspace key. After many years, Linux systems have pretty much settled down, but if I login from my MacOS X laptop to my Linux box and open a forwarded X11 terminal, I still find that pressing backspace sometimes prints '^H' rather than deleting the character to the left of the cursor. Let us pray that we can do a better job with LQ configuration.

Update at other blog

Not free software related.

More on letter quality displays

I finally got around to putting up my Electronic Imaging '04 paper on the relationship between display resolution and perceived contrast of text.

Also, Kevin Burton sent me a link to this Interview with Microsoft's typography master. If you're interested in these topics, you'll definitely want to read it.

I think Microsoft is on to something here: find smart people who know what they're talking about, give them money and toys, and listen to them. It seems to work well for them, but there's no reason for them, or even evil proprietary software companies in general, to have a monopoly on this business method. Who knows, maybe we ingenious free software types can someday find a way to adapt it to our universe.

Not that I'm personally complaining, mind you. I really enjoy my job working on Ghostscript, and get to play with lots of cool graphics toys - I don't have a 200 dpi monitor in my studio yet, but Miles has offered to pay half for one, so it's tempting. I have a gut feeling that a letter quality desktop panel will be available within a few months at consumer pricing, at which point I'll jump on it.

If some kind benefactor wanted to set the cause of the free desktop forward a few months or more, one of the most effective things they could do is spread a few of those panels around to the people in the free software community who can make the best use of them: X, Gnome, KDE, Mozilla, etc. (I'd happily accept such a donation but would be equally happy to see it go to others for whom it's a more pressing need).

Indeed, the imminent arrival of letter quality displays will present a very crisp test of the health of the various organizations responsible for generating the relevant software. It's a pretty safe prediction that Microsoft will not only get the software mostly right, but also nearly singlehandedly create the mass market for these displays. Mac OS faces a choice - Apple can either lead as they did with FireWire, 802.11 and DVD burners, or they can drag behind and let the Wintel world kick their butt for a while, as they did with raw CPU power up until their shipment of the IBM 970. Sun will probably manage to screw up Java support royally - during the transition, I'm sure you'll see teeny fonts, chunky pixels, and related artifacts for quite some time. Mozilla won't even start to get its act together until there's reasonably good support for letter quality on platforms other than Internet Explorer (although the W3's sensible definition of the px will help them get to the goal once they really get started). And of course, it's reasonable to predict that the free software community will eventually get it right, but that it will take quite some time.

I think there are two other potential winners from the disruption to be caused by this technology: PDF and Flash. The win for PDF is pretty obvious; today's dot-matrix screens are just not quite good enough to display 8.5 x 11 inch pages with reasonable quality, in much the same way as early-'80s dot matrix printers were not quite good enough to render such pages on paper. The win for Flash is not quite as obvious, but I think just about as compelling. Once you get past Flash's reputation as the blink tag of the dot-com boom times, the underlying technology is actually pretty impressive. Existing Flash applications will immediately start looking good on letter quality displays once the client supports the devices, and Flash will continue to be one of the most painless ways to deliver such applications. Among other things, it's pretty darned cross-platform already, and that will probably just get better.

Interesting times ahead, that's for sure.

Advogato's DNS

I had DNS for advogato.org rather poorly configured. A power outage here this evening that took out the primary, and none of the secondaries thought they were authoritative, so oops. I also had the timeouts set quite short because of the recent server move, so double oops. Everything should be a lot more robust now.

Choice thread

I've posted the choice thread on ghilbert.org. For those of you who are slavishly following the development of Ghilbert, or fans of the Axiom of Choice, it should offer a glimmer of enlightenment.

The New York Times reaches about 1.5 million people. This posting is possibly of interest to two dozen. But the difference between my blog and the NYT is that my post will reach those two dozen :)

Version control

As NTK says, No self-respecting Thinker Of Hard Thoughts these days is without their own Deep Theory Of How To Do Version Control. It's not surprising to see so much activity in this sphere now. CVS has been broken for a long time, and it's now clear that Subversion only solves some of the problems of CVS.

I haven't actually played with Codeville yet, but I look forward to it. When Bram puts his mind to something, it often turns out well. I was also very interested to see Ken Shalk's CodeCon presentation on Vesta, a project I've actually been following since its inception about a decade ago.

The bottom line is that I think Vesta gets a few things very right, but some of the design decisions are going to hold it back from hitting the big time. Vesta is a source repository, a configuration manager, and a build tool. If you buy in to the Vesta way of doing things, all these pieces interact in a very nice way. For example, because you keep not only your source files but also the tools needed for building in the repository, you can always go back to a specific build, bit for bit. It uses some neat tricks to work - the files in the repository are exported through NFS, and, not so coincidentally, that's how the build knows what the dependencies are. If the file is accessed during the build, it's a depenedency, otherwise not.

The biggest downside, I think, is that it's quite Unix-specific. It's not impossible to run NFS on Windows or Mac, but it's not exactly convenient either.

I think it is possible to take the best ideas from Vesta and put them in a portable framework. Rather than a build being a script which runs random commands and litters directories with temporary and result files, it should be a functional program from input to output. All intermediate results should be considered a cache. Indeed, I see no reason why you shouldn't be able to take a source package, run a simple command, and have it spit out a .deb for Debian, .rpm's of the various flavors for the Red Hat-based distros (including some intelligent analysis of how many variants are actually needed), a .pkg or .dmg or whatever the Mac people decided is the preferred way to distribute OS X apps, an InstallShield-like installer for Windows, and a .pdb for Palms. Throw in a couple flags, and the Unix build is instrumented to support debugging and profiling, or maybe gcc bounds checking. Better yet, have it run in an interpreter such as eic, so that you can debug runtime violations at a source level.

What exactly is standing in the way of such a thing? My guess is that the main thing is inertia.

6 Mar 2004 (updated 8 Mar 2004 at 01:48 UTC) »
I voted touchscreen! I think

At least I think I voted. There's no way of knowing for sure, because it was on one of those fancy new Diebold machines.

The problem is, of course, familiar to those experienced in computer security. Because it's impossible to see security flaws, and very difficult, even for experts, to understand them, people get very complacent. Judges, for example, are wont to dismiss knowledge of such vulnerabilities as "speculative", rather than an "actual threat".

What changes the status quo is invariably an actual exploit. If people can see a voting machine being hacked, then they'll believe it. Fortunately, in this case it ought to be relatively easy.

I don't think it's time yet for large-scale civil disobedience about hackable voting. The most important thing, I think, is to spread the word about the dangers. As Avi Rubin points out, many election judges are elderly folk without a deep understanding of security flaws. I spoke with my election judges briefly. I told them that I was a computer scientist, and that maybe I know the secret to cracking the smart card they gave me. When I gave it back to them and informed them that I didn't, in fact, mess with it, they were visibly relieved.

I also liked Avi's idea of volunteering as a judge. That's probably the single best way to get the word out in a positive way.

In any case, after I voted I got a little sticker that says "I voted - touchscreen" with an American flag on it. Maybe for the election this November, we should hand out stickers with the slogan "I think I voted - touchscreen", and a flag with the 50 stars replaced by a BSOD.

Letter quality LCD's

I think this is going to be the year they go mainstream. They're showing up in more and more devices. It's also interesting that Tiger Direct has the IBM T221 for four grand, "while supplies last". It's possible, I think, that a newer model is in the pipeline.

Logic update

I got a very gratifying response from my last post, resulting in a very enlightening email correspondence with Michael Norrish, Bob Solovay, and Norm Megill. I'll post it soon, for those of you out there just dying to find out whether there really is a closed form expression for HOL's epsilon in ZFC.


CodeCon was great fun, and I'm really glad I went. The highlight was the Google party. How good was that? Let's put it this way: I was so absorbed with intense conversations that I completely missed Knuth crashing it.


elanthis: yes, IMHO of course. I've got one more in the pipeline, and very possibly some more coming.


Ghilbert was cited in a recent preprint by Carlos T. Simpson. One of the central themes of this paper is a preference for untyped, set-theory flavored math as opposed to the more type-flavored variety you see in systems such as HOL, NuPRL, and Coq. He's also not a big fan of the constructive flavor of the latter two systems (meaning that x || !x is not an axiom).

The main thing I'm doing in Ghilbert now is constructing HOL in set theory. It's slow and steady progress, and very satisfying. One issue I'm running into, though, is HOL's epsilon operator. Essentially, if f is a function from some type T to bool, then epsilon(f) chooses a value (of type T) for which f is true. This construction is very useful, but does pose some challenges.

In the special case where there is only one value for which f is true, there's no problem (in this case, the similar "iota" operator suffices). But it's not hard to come up with situations in which f might hold for infinitely many values, and choosing one out of the many is a problem. If you accept the Axiom of Choice, it tells you that it can be done, but it doesn't tell you which value to choose. So it doesn't provide much help for actually constructing HOL in set theory.

The HOL documentation adds further twists to the story. The HOL description document contains a paragraph (at the end of section 1.1) suggesting that the universe of HOL terms is actually a pretty small set (by the standards of set theory, anyway - it's still quite infinite), and that it should be possible to choose an element from any subset quite straightforwardly, even without assuming the axiom of choice. I am beginning to get a vague sense what they're talking about, but don't really understand it. If anyone out there reading could explain it to me, I'd really appreciate it. If not, I think I know who to ask.


I'll be at CodeCon this Friday and Saturday. I'm really looking forward to meeting up with my tribe. As an extra bonus, Heather and the kids are coming to pick me up Saturday afternoon, so it'll be a good opportunity for people in my tribe to meet my family.

Advogato maintenance

lkcl is absolutely right that I need to hand over the regular maintenance of the site. I'll start up a recruitment thread on virgule-dev after I get back from CodeCon.

The spammers are winning

When Paul Graham first published his Plan for Spam, I thought it fairly likely that his basic idea was sound, and that Bayesian-style classification would, if not eliminate spam, then at least make it manageable. Now I'm sure it won't work.

The problem is that there are two attacks that spammers can do, both of which are damned hard to do anything about. First, they can include bits of legitimate, high quality text in their messages [for example, I now see wikipedia text in Google-spam sites]. Second, by running as a virus-zombie, they can take over whatever authentication tokens are available for real mail. Note that this includes hashcash.

I still believe that a trust metric can be a part of a healthy balanced breakfast to end spam. But Google, the highest-profile deployment of a trust metric, seems to be experiencing a marked decline in quality, to the point where some people are questioning whether PageRank is such a good idea after all.

I don't know how to fix PageRank, but my current thinking, reinforced by my experiences here, is that negative recommendations are needed too. I feel quite empowered by the trust that Google places in me to rank a page up, but at the same time helpless to tell Google, "this site is pure spam".

Negative recs are not easy. For one, they don't really fit well in the PageRank model (the biggest difference between PageRank and the eigenvector-based tmetric used to power the diary ratings). Second, any simplistic approach (such as having an unauthenticated (or underauthenticated) "report this as spam" link) would be very vulnerable to DoS-type attacks, likely rendering the whole negative rec process unusable. There's a good technical reason to prefer monotonic trust metrics, and when I started my thesis project I concentrated on those, but I now think that their inability to use negative recs is crippling.

And, Zaitcev, I did notice that Orkut's trust model seemed quite primitive. I didn't kick its tires carefully, but I didn't see any obvious differences from Friendster's, which is simply based on the existence of a path of length no more than four. If Orkut is at CodeCon, I'm going to pin him down and see what he has to say for himself.

See you all at CodeCon!


Wow, it's been a long time since my last post. I've been kinda in hermit-mode, spending more time with the family, and not feeling as productive as I should be with work. I think I'm emerging now. Things are going fairly well, and I have quite a few things I want to write about here.


I also have a few things I want to write about which are more political and less related to free software. I finally did a bit of a brain-dump on my other blog. If you've been following the updates on my boys, then by all means don't miss the bit of improv radio theater we recorded a few days ago.


It's been a busy few weeks - my mom, then my brother and his girlfriend came to visit, and somewhere in the middle of all that we had an Artifex staff meeting.

Things are quieting down now. We had a nice family evening, playing video games and doing a little papercraft. I tried the tiger, just on cheap paper and b/w laser printing, and it came out ok. Of course, Max then wanted to do one of the motorcycles, but I convinced him that we would some other day.

BitTorrent and RSS

There's a thread going around the net on the benefits of combining RSS with BitTorrent. I agree there's something there, but want to make a distinction between the "easy" combination which is quite feasible right now, and one that requires a bit more rocket science (actually, Internet protocol design, but from what I know of both, the latter is more difficult to do well). In the "easy" combination, you have your whole RSS infrastructure exactly as it is now, but use BitTorrent to distribute the "attachments". People have been experimenting with RSS enclosures (for speech, music, video, and whatnot) for a while, but they're not hugely popular yet. One of the reasons is the difficulty and expense of providing download bandwidth for the large files that people will typically want to enclose. BitTorrent can solve that.

In fact, BitTorrent's strengths seem to mesh well with RSS. BT shines when lots of people want to download the same largish file at the same time - it's weaker at providing access to diverse archives with more random patterns of temporal access. Also, BT scales nicely with the number of concurrent downloaders - you get about equally good performance with a dozen or ten thousand. So if someone shoots a really cool digital video, posts it to their blog, then gets Slashdotted, it all still flows.

Integrating BT with a daemon that retrieves RSS feeds in the background has other advantages, as well. If the person opens the file a while after the download begins (which might be as soon as the RSS is updated), most or all of the latency of downloading that file can be hidden. Further, since the BT implementation is released under a near-public domain license, it should be relatively easy for people to integrate it into their blog-browsing applications.

An example of a blog that would work superbly with BT is Chris Lydon's series of interviews.

But Steve Gillmor's article isn't primarily about enclosures - it suggests that we can use BT to manage the RSS feed itself. I think there's something to the idea, but the existing protocol and implementation isn't exactly what's needed. BT is best at downloading large static files. You start with a "torrent" file, which is essentially a Merkle hash tree of the file packaged up with a URL where the "tracker" can be reached. All peers uploading and downloading the file register with the tracker, and get a list of other peers to connect with. Then, peers exchange blocks of the file with each other, using very clever techniques to optimize the overall throughput. After each block is transferred, its hash is checked against what's in the torrent file, and discarded if it doesn't match.

But RSS files themselves are relatively small, so it's unlikely that all that much bandwidth would be saved sending torrent files and running a tracker, as opposed to simply sending the RSS file itself. Further, the big performance problem with RSS is the tradeoff between polling the RSS feed infrequently, resulting in large latencies between the time the feed is updated and viewers get to see it, or polling it frequently and chewing up tons of bandwidth from the server. BT doesn't do much to help with this - you'd be polling the torrent file exactly as frequently as you're polling the RSS file now.

I believe, however, that the BitTorrent protocol could be adapted into one that solves the problem of change notification. The protocol is very smart, and already has much of the infrastructure that's needed. In particular, peers already do notify each other when they receive new blocks. That's not change notification because the contents of the blocks are immutable (and that's enforced by checking the hash), but it's not too hard to see how it could be adapted. At heart, you'd replace the static hash tree of the existing torrent file format with a digital signature. The "publisher" node would then send new digitally signed blocks into the network, where they'd be propagated by the peers. There'd be essentially no network activity in between updates, and, as in the existing BitTorrent protocol, the load on the publisher node would be about the same whether it was feeding a dozen or ten thousand listeners. I'd also expect latency to scale very nicely as well (probably as the log of the number of peers, and with fast propagation along the low latency "backbone" of the peer network).

I'd hate to see such a beautiful work of engineering restricted to just providing RSS feeds - ideally, it would be general enough to handle all sorts of different applications which require change notification. One such is the propagation of RPM or Debian package updates, which obviously has strong requirements for both scaling and robustness. The main thing that's keeping it from happening, I think, is the dearth of people who really understand the BitTorrent protocol.

Proof systems

I've been hacking a bit on my toy proof language. Aside from slowly bringing the verifier up to the point where it checks everything that should be checked, I'm also hacking up an implementation of the HOL inference rules constructed in ZF set theory.

It's immensely satisfying to construct proofs that are correct with high assurance, which is such a contrast from hacking code - any time you write nontrivial code, you know it's got lots of bugs in it, many of which no doubt can be exploited to create security vulnerabilities.

Fat chance

Oooh. Turns out the FAT file system is patented, and Microsoft has a commercially reasonable licensing plan for it. Of course, given the way the patent system is structured, this kind of thing is absolutely inevitable.

As I've written before, I think Paul Krugman got it wrong when he argued that Microsoft's monopoly position is not much of a threat. On the other hand, kudos to him for publicizing the rotten state of the voting machine industry. Those in the know have been aware of serious problems for some time, but the media so far has been doing a good job of keeping the unwashed masses blissfully unaware. Love Krugman or hate him, it'll be harder to keep this issue quiet now.

Color profiling

A followup to my recent post calling into question Argyll's multidimensional interpolation algorithm. I had some email back and forth with Graeme, and it's now clear that the multidimensional interpolation itself is quite good. The "wiggliness" in my profiles seems to have been caused by an overzealous attempt to optimize the per-channel curves. With those disabled, I'm now getting some really, really good profiles.

I still think that my idea of a diffusion equation based interpolation algorithm might have merit, but considering the good results I'm seeing now from Argyll, it pretty much goes on the back burner along with two or three dozen others. It might become interesting if reasonably priced handheld spectrocolorimeters with a USB port ever make it to market - typical professional profiles are made from 3000 or more patches, but it might be possible to get good results from a much smaller sample, if done right.

More proof links

Claus Dahl sent in a link to yet another proof system/language. In the meantime, I've been corresponding with Norm Megill of Metamath fame, Freek Wiedijk, and Michael Meyling of Hilbert II. There's something addictive about formal proofs. How else to explain why there are so many projects to work on them?

364 older entries...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!