Older blog entries for raph (starting at number 374)

More Letter Quality

You don't see too much sign of it on his weblog, but Robb Beal is going absolutely wild researching issues around letter quality displays, and what it'll take to get free software to adequately support them. If you care about these things (which, from the response I typically get from my LQ blog entries, is rather improbable), he'd be a great person to get in touch with.

Near-letter-quality and letter-quality devices continue to roll out. In the former camp is the Sony EBR-1000 book reader, which has an exotic monochrome 170 dpi e-ink display. The software is a mixed bag - apparently it has really crippling DRM, but on the other hand it's Linux based, so presumably people will be able to run good software on it. Hey, send me one and I'll see that Ghostscript gets ported.

One of the newest letter quality devices is the Archos AV500, sporting an approximately 200 dpi 706x480 screen. The main app for this thing is playing movies stored on its internal hard drive. As far as I know, it's the first such device to play at near-DVD resolution (and, in the 4" form factor, DVD resolution and letter quality are pretty much synonymous). The main competition in this field would appear to be the Portable Media Center platform from Microsoft, which as far as I can tell specs a 320x240 LCD.

LQ: What is to be done?

The issue of how to best support LQ displays is among the many things tor and I discussed. He disagrees with my proposal to have a global "pixels per px" (pppx) ratio for the entire display, and do pixel-doubling at a low level for applications which are not LQ-aware.

I see his point. Presumably, new LQ-aware apps are going to do the right thing, so having extra machinery for automatic pixel-doubling is just more cruft. Pixel-doubling is only relevant for older, legacy apps. I tend to use xv as something of a canonical example, given its very old-school handcoded X interface (clearly, toolkits are for weenies), a website more than three years stale, a questionable license, and, last but not least, the fact that it's still my image viewer of choice.

Obviously, if you run xv on a letter quality screen, the 5 point fonts won't be very appealing. There are several ways to fix, or at least partially fix, the problem. Probably the way most consistent with the free software philosophy is to build an old-school yet 2-pppx UI for xv, and patch it in. Of all the approaches, this one probably gives the best overall results.

Alternatively, you could do automatic pixel doubling, somehow recognizing that xv is not LQ-aware. Or you could do pixel doubling manually (this would, of course, be the most Unix-like way of doing things - provide an incredibly powerful array of tools for solving problems, but don't go 1/200th of an inch towards making them do the right thing by default). In the manual case, you'd have any number of different choices for implementing the pixel-doubling, including an Xnest-like application, running everything through VNC, and I'm sure others will think of more.
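Whatever the plumbing, the core operation is the same. Here's a minimal sketch of it (purely illustrative, not code from any of those tools), replicating each legacy pixel into a pppx-by-pppx block:

```python
def pixel_double(src, width, height, pppx=2):
    """Replicate each source pixel into a pppx x pppx block.

    src is a flat list of pixel values in row-major order (for RGB you'd
    do the same per channel or per packed 32-bit word).  Returns the
    scaled buffer of size (width * pppx) * (height * pppx).
    """
    dst = []
    for y in range(height):
        row = src[y * width:(y + 1) * width]
        # Duplicate each pixel horizontally...
        fat_row = [p for p in row for _ in range(pppx)]
        # ...then duplicate the whole row vertically.
        dst.extend(fat_row * pppx)
    return dst

# Tiny smoke test: a 2x1 image doubled to 4x2.
assert pixel_double([1, 2], 2, 1) == [1, 1, 2, 2,
                                      1, 1, 2, 2]
```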

Other possibilities include ditching xv and upgrading to an image viewer which is LQ aware, or treating the entire dot matrix resolution X desktop as a retrocomputing target (it's worth noting that many emulators already do pixel-doubling, to adequately display really low res UI's on modern dot-matrix displays).

I still think that automatic pixel-doubling is the most user-friendly thing to do, but perhaps not implementing it is exactly the sort of mosquito bite necessary to motivate the army of Linux developers to fix their favorite apps to be properly LQ aware.

The pppx ratio should be a global configuration parameter

So, while I'm not going to make a strong argument about a pixel-doubling policy, I would like to make a strong argument in favor of explicitly promoting pppx to the status of global configuration parameter. After all, pixel-doubling is merely a stopgap along the road of true LQ support, while fucking up the pppx ratio will continue to make life difficult until my grandkids get sick of the situation and decide to do something about it. (and hopefully, while they're at it, they'll fix the last vestige of the backspace problem, which is generally ok in a Linux-only world but still bites people when mixing with Mac OS X)

Unless people actually start listening to me, we most likely won't see a global pppx parameter, and so here are my predictions of the bad things that will happen.

Without an explicit pppx ratio, application writers will invariably attempt to guess it on their own, based on information such as the monitor resolution reported through the EDID channel. Inevitably, different apps will come to different conclusions, depending on the exact parameters, so the relative visual size of interface units will be even more inconsistent than today (however, inconsistent UI appearance is very much the X/Unix way, so maybe that doesn't bother people - perhaps I've simply been spoiled by the time I've spent using Mac OS X).

What are the different ways to derive a pppx ratio? I'm sure the most common will be to try to dictate the physical size of a px. If you read the literature on scalable user interfaces, you'll see many people advocating the "point" as the unit of user interfaces, which would effectively nail the pppx ratio to the monitor dpi divided by seventy-two. Other reasonable approaches would be to set pppx to 1 for dot-matrix displays, 2 for letter-quality, and flip a coin for intermediate values of resolution.
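To see how quickly those guesses diverge, here's a toy illustration (my own, not code from any toolkit) of the two heuristics applied to the same hypothetical panel:

```python
def pppx_from_points(dpi):
    # "A px is a point": pppx = dpi / 72, rounded to the nearest integer.
    return max(1, round(dpi / 72.0))

def pppx_from_threshold(dpi):
    # "Dot-matrix vs. letter quality": pick 1 or 2 based on a cutoff.
    return 2 if dpi >= 150 else 1

# A hypothetical 120 dpi panel: two apps, two answers, two visual sizes.
dpi = 120
print(pppx_from_points(dpi))     # 2
print(pppx_from_threshold(dpi))  # 1
```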

Worse, when people are unsatisfied with the pppx ratio guessed on their behalf, they will resort to even uglier hacks to work around it. If the pppx is derived from the monitor dpi, then people will figure out ways to lie about that, which will of course break other apps that actually depend on a correct value.

What's wrong with nailing a px to a definite physical size unit (for example, equating px with point)? In short, because not everybody has the same visual acuity. Software developers, who tend to be uniformly young and virile, easily lose sight of that fact. Currently, on almost all systems, the pppx ratio is fixed at 1, so users have a very handy way to configure the physical size of a px - simply choose a monitor resolution. On CRT's, this is easily done with modelines, but for LCD's, you pretty much have to choose that when you buy the panel. Even so, for most people the choice of an LCD is a fairly intimate, personal decision, and I imagine people consider the size of the px carefully, whether they're consciously aware of it or not. For me, low-res panels display blocky-looking text, and images are not crisp enough. For many others, on higher-res panels, text is too small. I wrote about this over three years ago; Apple was actually touting the low resolution of their expensive displays as a feature, which of course it is if your preference is a fairly large px value, and the only way to achieve that preference is through hardware choice.

Of course, the whole point of high resolution displays is to not have to make the tradeoff between desired px value and display sharpness (the third prong of the tradeoff, of course, is the added cost of generating all those extra pixels in the CPU and turning them into light, but Moore's Law should take care of that, in time).

If, as a software developer, you are tempted to fix the px to a definite physical size, then I suggest you go to your local bookstore, library, or church, and suggest banning all large print books and small print bibles. After all, readability studies have proven that 10-12 point text has optimum readability (for the average reader, anyway).

A bit of bitterness

Based on my experience in the world of free software, I feel reasonably confident predicting that we won't get the pppx configuration parameter right. If I were dictator of the universe, there wouldn't be any such problem, but of course that's not the way free software works, and for the most part that's a good thing. (the primary exception, of course, being the chain of keycode mappings applied to the backspace key - if it were up to me, I'd say that everybody should use the mappings that happened to correspond to the keyboard config I'm typing on now, and then of course the problem would be solved for everybody)

The world of proprietary software has been very good to me lately. I don't get to make the big architectural decisions (those are up to space aliens holed up near Redmond), but I do tend to get a lot of appreciation for my ability to solve problems. Money too, which is nice, but here I'm talking about appreciation.

Lately, I've been making gorgeous prints on inkjet devices (most of my work is on an Epson 2200). I love experiencing people's reactions when they see those prints, and the obvious impact that they have on people's decision to license our technology. It's a rush, a high, and I haven't felt anything like that from the free software world in some time.

I do, however, get emails like this one:

To whom it may concern, I have installed the ghost..garbage, and the gsvu garbage software, and it's a bunch of crap. It doesn't work nor does it convert ps to text. Go-on publishing your version after version-it's useless. Moreover, nearly everything you guys have published about it is a bunch of gobledee-gook that says nothing, and is repititious and loops back to nothingness. Why can't you do or say something useful that normal people can understand and use.

Leon Stambler

Another one that stung, perhaps more than it should have, was a recent description of Libart as having been written by a "monkey with a PHD in low level programming." When I started Libart, I had very warm, fuzzy feelings about solving problems of 2D graphics rendering for the benefit of the free software world. Now, while I'm pleased that some of the ideas of Libart seem to have made an impact, every time I see an app develop its own new graphics library, it bugs me a bit (xpdf's splash, sodipodi/inkscape's libnr, mozilla's gfx, qt's "arthur"), not to mention the fact that fontforge still doesn't do aa rendering, roughly five years after my gfonted prototype.

I'm not just trying to whine at the world here. I know I'm not great at coordinating distributed projects, especially with volunteers. What I am saying is that the free software community has a pretty serious process bug in that it's not very good at making use of my knowledge and skill at 2D graphics, nor at giving me back much in the way of "goodies" in return for my contributions.

So, looking forward at the professional work I contemplate, I find myself getting excited about the prospect of building a top-notch printer controller, to be sold as a part of real products. I do not find myself getting excited at all about solving, say, the printing problems of Linux.

That said, my feelings toward free software and the free software community are decidedly bittersweet rather than purely bitter. While I don't especially enjoy having my work NIH'd, I do appreciate and respect the culture of learning that so often leads people to code up their own solution to a problem rather than taking an existing solution off the shelf. Hell, I've been guilty of that enough times myself. Further, in the above I'm talking about my professional work. For other things, such as pure speculative research (including, most definitely, Ghilbert), it's impossible to imagine doing things in any other way than free software. Finally, as long as my professional work is Ghostscript, all of my code gets released under the GPL after a year in any case, so I could get as bitter and negative as I wanted, and everybody would still get the benefit of my work.

Apologies for whining. But I did want to get some of these thoughts off my chest, and I do think that writing about these issues is preferable to not writing at all, which is mostly what I've been doing.

ZF inconsistent?

fxn: I haven't gotten the chance to look at Brian Ford's paper, but it sure is intriguing.

If indeed ZF comes crashing down, that would really validate my decision to make proofs in Ghilbert truly portable with respect to axioms, rather than trying to pick a single axiom system that is powerful enough for the work at hand, yet not so powerful as to raise questions about soundness or problems porting the proofs over to other systems.

That said, the timing could have been a little better for me; as a demonstration of Ghilbert's portability, I picked a construction of HOL logic in ZF[C] set theory, and it's only a few proofs away from being done (of the core HOL axioms, all that's left is the law of the excluded middle (BOOL_CASES) and a theorem to introduce the iota operator). Oh well.

Busy, busy

It has been a busy month since the last time I blogged. Among other things, we had an Artifex staff meeting, and also tor stayed for a couple of weeks. We had a great time talking serious 2D graphics, color, and so on, plus hanging out with the family.

Kids

I'm proud of a couple photos I took on Alan's 8th birthday; the portrait of him, and another of the brothers posing with a penguin prop. Both were taken with my archaic Rapid Omega 100 on slow B/W film (6x7cm), then scanned on my brand new Epson 4870 scanner (highly recommended, btw).

Alan's photo page has unbelievably high Google-juice - looking at the referer logs, the most popular images are the ones that currently show up as #1 and #2 for a Google image search on happy. Even more exciting, a photo from last year of him playing his guitar is now on the cover of a book recently published by Wesleyan University Press.

Max, who just turned four last week, is developing quite a subtle sense of humor. A few weeks ago, when I was reading him Eating the alphabet, he asked if I would get him a Xigua melon. I answered that I didn't remember ever seeing one in a store, but that if I did see one, I'd get it for him, then we'd be able to tell if it's as good as a watermelon, or better (watermelon is one of Max's absolute favorites right now). Max replied that he knew that Xiguas are way better than watermelon. I asked him how he knew that, and he said: I can see inside people's minds. Even fruit. And that Xigua melon is thinking, "I'm way better than a watermelon".

More later

"More later" is one of Josh Marshall's favorite catchphrases, and people rib him for it. I've been doing quite a bit of deep thinking lately about the place of free software in the world, among other things, and I do want to write about it, but it will have to wait until I have a bit more time.

Birthday boy

I turned '"' today. I can't think of any particular meaning to that number. Perhaps it is interesting precisely because it is the first uninteresting number.

Letter quality displays

Robb Beal asked me a couple of weeks ago how to get support for letter quality displays in Linux. It's a good question, and it deserves an in-depth answer.

I want to go over the highlights here, though. There are really three parts to the question. First, what can users do to push things along? Second, what are the "configuration science" problems that need to be solved? Third, what are the imaging problems that need to be solved to get high resolution bitmaps to the display hardware?

As a user, probably the most important thing is to let your friendly local developers know that you're interested in having letter quality displays supported. There are a bunch of ways to do so. If I expand this into a full post, I'll have two sample letters. One says, briefly, "you suck, free software developers are commies, by the way why doesn't your software support LQ?" The other says, "I love your software and would love it even more if it had LQ support. By the way, do you have a paypal address or maybe an Amazon wishlist so I can give you a small token of my appreciation?"

Indeed, funding is a large part of the problem, not least because the prototype displays are still rather expensive. Of course, after Microsoft rolls out their LQ stuff, then the hardware will be available at commodity prices, and then as the gear filters into the hands of free software developers, there will naturally be more motivation to support it. It's a shame that we, as a community, suck so hard at organized fund-raising, or else we'd have already put a few of these displays in the hands of the most important developers, as of course Microsoft did years ago.

Yes, that's a whine. On to the configuration science part of the question. The primary configuration parameter that must be discovered by applications and negotiated between application and display subsystem is the "pixels per px" ratio (which I'll abbreviate here as pppx). The px is the fundamental unit in which user interfaces are constructed, and has no preset size. On current display hardware, the pppx ratio is locked in at 1. If you like, "LQ support" could well be defined as the support of pppx ratios other than 1.

Many people believe (misguidedly, in my humble opinion) that the "correct" configuration parameter is the display dpi. Indeed, it's fairly easy to come by values for this - the EDID standard provides a way for the display to tell it to the video card, and most OS's now provide at least some support to access the info (in X, DisplayWidth and DisplayWidthMM; in Windows, similar information can be obtained from GetDeviceCaps; in Mac OS X, CGDisplayScreenSize).

UI's are built in units of px, which, because of currently available display technologies, are identified with pixels. They could have been built out of some physical unit, for example points, so that getting a larger DPI value out of the EDID would cause applications to automatically draw their UI's with more pixels (indeed, for quite some time a display resolution of 72 dpi was so common that many people identified pixels with points). If the technology had evolved that way, then LQ support wouldn't have been especially difficult. However, technology didn't evolve that way. I don't see this as entirely a bad thing - there are good things about a px-based coordinate system, which I think I'll detail if this becomes a full-length article.

So, the negotiation has to happen something like this. The OS knows that a pppx value of, say, 2, is available. If the app is not LQ-aware, then when it asks for a display surface, the OS knows to give it one with 100 dpi "pixels", so that each pixel the app draws shows up on the hardware as a 2x2 square of device pixels. (One can imagine other choices here, for example drawing text at the full device resolution, but any of them will be a compromise, and the Real Solution is always to make the app LQ-aware.) The chunky pixels might not look that good, but at least they won't break anything or introduce bizarre new artifacts.

But if the app asks for the range of available pppx values, the OS will tell it that 2 is available, and then the app can ask for a drawing surface that matches the physical display resolution.
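None of this API exists yet, so the following is only a sketch of how the handshake might look; every name in it (DisplaySubsystem, query_pppx_choices, create_surface) is made up for illustration:

```python
class DisplaySubsystem:
    """Toy model of the server side of the negotiation (names invented)."""

    def __init__(self, device_dpi=200, pppx_choices=(1, 2)):
        self.device_dpi = device_dpi
        self.pppx_choices = pppx_choices

    def query_pppx_choices(self):
        return self.pppx_choices

    def create_surface(self, width_px, height_px, pppx):
        # A surface requested at less than the best available pppx gets
        # pixel-doubled by the server before it hits the hardware.
        return {
            "pixels_wide": width_px * pppx,
            "pixels_high": height_px * pppx,
            "needs_server_doubling": pppx < max(self.pppx_choices),
        }

server = DisplaySubsystem()

# Legacy app: never asks, implicitly gets pppx=1 and chunky 2x2 pixels.
legacy = server.create_surface(640, 480, pppx=1)

# LQ-aware app: asks what's available and draws at full device resolution.
best = max(server.query_pppx_choices())
aware = server.create_surface(640, 480, pppx=best)

print(legacy)  # needs_server_doubling: True
print(aware)   # needs_server_doubling: False
```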

Once we get to that point, then the app has to actually come up with the higher resolution pixels. But this is a relatively straightforward technical problem, and different applications can solve it each in their own best way. Certainly, having a good 2D graphics library under the hood would be nice, but there are a few around (even if one of the more well-known was written by a "monkey with a PhD in low level coding" :), and if none are satisfactory, you can write your own.

By contrast, configuration science problems have a way of not being readily solved. My favorite running example is what happens when you press the backspace key. After many years, Linux systems have pretty much settled down, but if I log in from my Mac OS X laptop to my Linux box and open a forwarded X11 terminal, I still find that pressing backspace sometimes prints '^H' rather than deleting the character to the left of the cursor. Let us pray that we can do a better job with LQ configuration.

Update at other blog

Not free software related.

More on letter quality displays

I finally got around to putting up my Electronic Imaging '04 paper on the relationship between display resolution and perceived contrast of text.

Also, Kevin Burton sent me a link to this Interview with Microsoft's typography master. If you're interested in these topics, you'll definitely want to read it.

I think Microsoft is on to something here: find smart people who know what they're talking about, give them money and toys, and listen to them. It seems to work well for them, but there's no reason for them, or even evil proprietary software companies in general, to have a monopoly on this business method. Who knows, maybe we ingenious free software types can someday find a way to adapt it to our universe.

Not that I'm personally complaining, mind you. I really enjoy my job working on Ghostscript, and get to play with lots of cool graphics toys - I don't have a 200 dpi monitor in my studio yet, but Miles has offered to pay half for one, so it's tempting. I have a gut feeling that a letter quality desktop panel will be available within a few months at consumer pricing, at which point I'll jump on it.

If some kind benefactor wanted to set the cause of the free desktop forward a few months or more, one of the most effective things they could do is spread a few of those panels around to the people in the free software community who can make the best use of them: X, Gnome, KDE, Mozilla, etc. (I'd happily accept such a donation but would be equally happy to see it go to others for whom it's a more pressing need).

Indeed, the imminent arrival of letter quality displays will present a very crisp test of the health of the various organizations responsible for generating the relevant software. It's a pretty safe prediction that Microsoft will not only get the software mostly right, but also nearly singlehandedly create the mass market for these displays. Mac OS faces a choice - Apple can either lead as they did with FireWire, 802.11 and DVD burners, or they can drag behind and let the Wintel world kick their butt for a while, as they did with raw CPU power up until their shipment of the IBM 970. Sun will probably manage to screw up Java support royally - during the transition, I'm sure you'll see teeny fonts, chunky pixels, and related artifacts for quite some time. Mozilla won't even start to get its act together until there's reasonably good support for letter quality on platforms other than Internet Explorer (although the W3C's sensible definition of the px will help them get to the goal once they really get started). And of course, it's reasonable to predict that the free software community will eventually get it right, but that it will take quite some time.

I think there are two other potential winners from the disruption to be caused by this technology: PDF and Flash. The win for PDF is pretty obvious; today's dot-matrix screens are just not quite good enough to display 8.5 x 11 inch pages with reasonable quality, in much the same way as early-'80s dot matrix printers were not quite good enough to render such pages on paper. The win for Flash is not quite as obvious, but I think just about as compelling. Once you get past Flash's reputation as the blink tag of the dot-com boom times, the underlying technology is actually pretty impressive. Existing Flash applications will immediately start looking good on letter quality displays once the client supports the devices, and Flash will continue to be one of the most painless ways to deliver such applications. Among other things, it's pretty darned cross-platform already, and that will probably just get better.

Interesting times ahead, that's for sure.

Advogato's DNS

I had DNS for advogato.org rather poorly configured. A power outage here this evening took out the primary, and none of the secondaries thought they were authoritative, so oops. I also had the timeouts set quite short because of the recent server move, so double oops. Everything should be a lot more robust now.

Choice thread

I've posted the choice thread on ghilbert.org. For those of you who are slavishly following the development of Ghilbert, or fans of the Axiom of Choice, it should offer a glimmer of enlightenment.

The New York Times reaches about 1.5 million people. This posting is possibly of interest to two dozen. But the difference between my blog and the NYT is that my post will reach those two dozen :)

Version control

As NTK says, No self-respecting Thinker Of Hard Thoughts these days is without their own Deep Theory Of How To Do Version Control. It's not surprising to see so much activity in this sphere now. CVS has been broken for a long time, and it's now clear that Subversion only solves some of the problems of CVS.

I haven't actually played with Codeville yet, but I look forward to it. When Bram puts his mind to something, it often turns out well. I was also very interested to see Ken Shalk's CodeCon presentation on Vesta, a project I've actually been following since its inception about a decade ago.

The bottom line is that I think Vesta gets a few things very right, but some of the design decisions are going to hold it back from hitting the big time. Vesta is a source repository, a configuration manager, and a build tool. If you buy into the Vesta way of doing things, all these pieces interact in a very nice way. For example, because you keep not only your source files but also the tools needed for building in the repository, you can always go back to a specific build, bit for bit. It uses some neat tricks to work - the files in the repository are exported through NFS, and, not so coincidentally, that's how the build knows what the dependencies are. If the file is accessed during the build, it's a dependency, otherwise not.

The biggest downside, I think, is that it's quite Unix-specific. It's not impossible to run NFS on Windows or Mac, but it's not exactly convenient either.

I think it is possible to take the best ideas from Vesta and put them in a portable framework. Rather than a build being a script which runs random commands and litters directories with temporary and result files, it should be a functional program from input to output. All intermediate results should be considered a cache. Indeed, I see no reason why you shouldn't be able to take a source package, run a simple command, and have it spit out a .deb for Debian, .rpm's of the various flavors for the Red Hat-based distros (including some intelligent analysis of how many variants are actually needed), a .pkg or .dmg or whatever the Mac people decided is the preferred way to distribute OS X apps, an InstallShield-like installer for Windows, and a .pdb for Palms. Throw in a couple flags, and the Unix build is instrumented to support debugging and profiling, or maybe gcc bounds checking. Better yet, have it run in an interpreter such as eic, so that you can debug runtime violations at a source level.
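To pin down what "a functional program from input to output" buys you, here's a toy sketch (mine, not anything from Vesta's actual machinery) of keying each build step by a hash of its inputs and treating the outputs as a pure cache:

```python
import hashlib

# Cache of build-step results, keyed by a hash of everything that could
# affect the output: the tool, its flags, and the contents of the inputs.
_cache = {}

def cache_key(tool, flags, inputs):
    h = hashlib.sha1()
    h.update(tool.encode())
    h.update(repr(flags).encode())
    for name, contents in sorted(inputs.items()):
        h.update(name.encode())
        h.update(contents)
    return h.hexdigest()

def run_step(tool, flags, inputs, build_fn):
    """Run (or reuse) one build step as a pure function of its inputs."""
    key = cache_key(tool, flags, inputs)
    if key not in _cache:
        _cache[key] = build_fn(flags, inputs)
    return _cache[key]

# Toy "compiler": concatenates its sources.  Rerunning with identical
# inputs is free; touching a source or a flag forces a rebuild.
compile_fn = lambda flags, inputs: b"".join(inputs[n] for n in sorted(inputs))

obj1 = run_step("cc", ("-O2",), {"a.c": b"int x;"}, compile_fn)
obj2 = run_step("cc", ("-O2",), {"a.c": b"int x;"}, compile_fn)  # cache hit
assert obj1 is obj2
```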

What exactly is standing in the way of such a thing? My guess is that the main thing is inertia.

6 Mar 2004 (updated 8 Mar 2004 at 01:48 UTC)
I voted touchscreen! I think

At least I think I voted. There's no way of knowing for sure, because it was on one of those fancy new Diebold machines.

The problem is, of course, familiar to those experienced in computer security. Because it's impossible to see security flaws, and very difficult, even for experts, to understand them, people get very complacent. Judges, for example, are wont to dismiss knowledge of such vulnerabilities as "speculative", rather than an "actual threat".

What changes the status quo is invariably an actual exploit. If people can see a voting machine being hacked, then they'll believe it. Fortunately, in this case it ought to be relatively easy.

I don't think it's time yet for large-scale civil disobedience about hackable voting. The most important thing, I think, is to spread the word about the dangers. As Avi Rubin points out, many election judges are elderly folk without a deep understanding of security flaws. I spoke with my election judges briefly. I told them that I was a computer scientist, and that maybe I knew the secret to cracking the smart card they gave me. When I gave it back to them and informed them that I hadn't, in fact, messed with it, they were visibly relieved.

I also liked Avi's idea of volunteering as a judge. That's probably the single best way to get the word out in a positive way.

In any case, after I voted I got a little sticker that says "I voted - touchscreen" with an American flag on it. Maybe for the election this November, we should hand out stickers with the slogan "I think I voted - touchscreen", and a flag with the 50 stars replaced by a BSOD.

Letter quality LCD's

I think this is going to be the year they go mainstream. They're showing up in more and more devices. It's also interesting that Tiger Direct has the IBM T221 for four grand, "while supplies last". It's possible, I think, that a newer model is in the pipeline.

Logic update

I got a very gratifying response from my last post, resulting in a very enlightening email correspondence with Michael Norrish, Bob Solovay, and Norm Megill. I'll post it soon, for those of you out there just dying to find out whether there really is a closed form expression for HOL's epsilon in ZFC.

CodeCon

CodeCon was great fun, and I'm really glad I went. The highlight was the Google party. How good was that? Let's put it this way: I was so absorbed with intense conversations that I completely missed Knuth crashing it.

Patents

elanthis: yes, IMHO of course. I've got one more in the pipeline, and very possibly some more coming.

Logic

Ghilbert was cited in a recent preprint by Carlos T. Simpson. One of the central themes of this paper is a preference for untyped, set-theory flavored math as opposed to the more type-flavored variety you see in systems such as HOL, NuPRL, and Coq. He's also not a big fan of the constructive flavor of the latter two systems (meaning that x || !x is not an axiom).

The main thing I'm doing in Ghilbert now is constructing HOL in set theory. It's slow and steady progress, and very satisfying. One issue I'm running into, though, is HOL's epsilon operator. Essentially, if f is a function from some type T to bool, then epsilon(f) chooses a value (of type T) for which f is true. This construction is very useful, but does pose some challenges.

In the special case where there is only one value for which f is true, there's no problem (in this case, the similar "iota" operator suffices). But it's not hard to come up with situations in which f might hold for infinitely many values, and choosing one out of the many is a problem. If you accept the Axiom of Choice, it tells you that it can be done, but it doesn't tell you which value to choose. So it doesn't provide much help for actually constructing HOL in set theory.
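To pin down what's being asked of epsilon, its characterizing property (essentially HOL's SELECT_AX, give or take notation) is the first line below; the iota operator only has to satisfy the much weaker second one:

```latex
% Epsilon (Hilbert choice): if anything at all satisfies P, the chosen value does.
\forall P.\ \forall x.\ P\,x \;\Rightarrow\; P\,(\varepsilon\,P)

% Iota only has to deliver the witness when it is unique, which is why it
% poses no comparable problem for a set-theoretic construction.
(\exists!\, x.\ P\,x) \;\Rightarrow\; P\,(\iota\,P)
```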

The HOL documentation adds further twists to the story. The HOL description document contains a paragraph (at the end of section 1.1) suggesting that the universe of HOL terms is actually a pretty small set (by the standards of set theory, anyway - it's still quite infinite), and that it should be possible to choose an element from any subset quite straightforwardly, even without assuming the axiom of choice. I am beginning to get a vague sense of what they're talking about, but don't really understand it. If anyone out there reading this could explain it to me, I'd really appreciate it. If not, I think I know who to ask.

CodeCon

I'll be at CodeCon this Friday and Saturday. I'm really looking forward to meeting up with my tribe. As an extra bonus, Heather and the kids are coming to pick me up Saturday afternoon, so it'll be a good opportunity for people in my tribe to meet my family.

Advogato maintenance

lkcl is absolutely right that I need to hand over the regular maintenance of the site. I'll start up a recruitment thread on virgule-dev after I get back from CodeCon.

The spammers are winning

When Paul Graham first published his Plan for Spam, I thought it fairly likely that his basic idea was sound, and that Bayesian-style classification would, if not eliminate spam, then at least make it manageable. Now I'm sure it won't work.

The problem is that there are two attacks spammers can mount, both of which are damned hard to do anything about. First, they can include bits of legitimate, high quality text in their messages [for example, I now see wikipedia text in Google-spam sites]. Second, by running as a virus-zombie, they can take over whatever authentication tokens are available for real mail. Note that this includes hashcash.

I still believe that a trust metric can be a part of a healthy balanced breakfast to end spam. But Google, the highest-profile deployment of a trust metric, seems to be experiencing a marked decline in quality, to the point where some people are questioning whether PageRank is such a good idea after all.

I don't know how to fix PageRank, but my current thinking, reinforced by my experiences here, is that negative recommendations are needed too. I feel quite empowered by the trust that Google places in me to rank a page up, but at the same time helpless to tell Google, "this site is pure spam".

Negative recs are not easy. For one, they don't really fit well in the PageRank model (this is the biggest difference between PageRank and the eigenvector-based tmetric used to power the diary ratings). Second, any simplistic approach (such as having an unauthenticated (or underauthenticated) "report this as spam" link) would be very vulnerable to DoS-type attacks, likely rendering the whole negative rec process unusable. There's a good technical reason to prefer monotonic trust metrics, and when I started my thesis project I concentrated on those, but I now think that their inability to use negative recs is crippling.
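For anyone who hasn't seen the eigenvector computation spelled out, it's just repeated averaging of rank over links. A bare-bones sketch (ignoring damping subtleties, dangling nodes, and anything Advogato-specific) makes it obvious why there's no natural slot for a negative vote:

```python
def pagerank(links, iters=50, d=0.85):
    """Toy power iteration over a link graph {node: [nodes it links to]}."""
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - d) / n for v in nodes}
        for v, outs in links.items():
            if not outs:
                continue
            share = d * rank[v] / len(outs)
            for w in outs:
                new[w] += share
        rank = new
    return rank

# Linking to a page can only ever add to its score; there is no way to
# cast a vote *against* one.
print(pagerank({"a": ["b"], "b": ["a", "c"], "c": ["a"]}))
```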

And, Zaitcev, I did notice that Orkut's trust model seemed quite primitive. I didn't kick its tires carefully, but I didn't see any obvious differences from Friendster's, which is simply based on the existence of a path of length no more than four. If Orkut is at CodeCon, I'm going to pin him down and see what he has to say for himself.

See you all at CodeCon!

ltnb

Wow, it's been a long time since my last post. I've been kinda in hermit-mode, spending more time with the family, and not feeling as productive as I should be with work. I think I'm emerging now. Things are going fairly well, and I have quite a few things I want to write about here.

OT

I also have a few things I want to write about which are more political and less related to free software. I finally did a bit of a brain-dump on my other blog. If you've been following the updates on my boys, then by all means don't miss the bit of improv radio theater we recorded a few days ago.

Life

It's been a busy few weeks - my mom, then my brother and his girlfriend came to visit, and somewhere in the middle of all that we had an Artifex staff meeting.

Things are quieting down now. We had a nice family evening, playing video games and doing a little papercraft. I tried the tiger, just on cheap paper and b/w laser printing, and it came out ok. Of course, Max then wanted to do one of the motorcycles, but I convinced him that we would some other day.

BitTorrent and RSS

There's a thread going around the net on the benefits of combining RSS with BitTorrent. I agree there's something there, but want to make a distinction between the "easy" combination, which is quite feasible right now, and one that requires a bit more rocket science (actually, Internet protocol design, but from what I know of both, the latter is more difficult to do well). In the "easy" combination, you have your whole RSS infrastructure exactly as it is now, but use BitTorrent to distribute the "attachments". People have been experimenting with RSS enclosures (for speech, music, video, and whatnot) for a while, but they're not hugely popular yet. One of the reasons is the difficulty and expense of providing download bandwidth for the large files that people will typically want to enclose. BitTorrent can solve that.

In fact, BitTorrent's strengths seem to mesh well with RSS. BT shines when lots of people want to download the same largish file at the same time - it's weaker at providing access to diverse archives with more random patterns of temporal access. Also, BT scales nicely with the number of concurrent downloaders - you get about equally good performance with a dozen or ten thousand. So if someone shoots a really cool digital video, posts it to their blog, then gets Slashdotted, it all still flows.

Integrating BT with a daemon that retrieves RSS feeds in the background has other advantages, as well. If the person opens the file a while after the download begins (which might be as soon as the RSS is updated), most or all of the latency of downloading that file can be hidden. Further, since the BT implementation is released under a near-public domain license, it should be relatively easy for people to integrate it into their blog-browsing applications.

An example of a blog that would work superbly with BT is Chris Lydon's series of interviews.

But Steve Gillmor's article isn't primarily about enclosures - it suggests that we can use BT to manage the RSS feed itself. I think there's something to the idea, but the existing protocol and implementation isn't exactly what's needed. BT is best at downloading large static files. You start with a "torrent" file, which is essentially a Merkle hash tree of the file packaged up with a URL where the "tracker" can be reached. All peers uploading and downloading the file register with the tracker, and get a list of other peers to connect with. Then, peers exchange blocks of the file with each other, using very clever techniques to optimize the overall throughput. After each block is transferred, its hash is checked against what's in the torrent file, and discarded if it doesn't match.
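Concretely, the piece hashes in today's torrent format are stored as a flat string of 20-byte SHA-1 digests, one per fixed-size piece, and the per-piece check amounts to something like this sketch:

```python
import hashlib

def piece_ok(piece_index, piece_data, pieces_field):
    """Check one downloaded piece against the torrent's 'pieces' field,
    which is the concatenation of 20-byte SHA-1 digests, one per piece."""
    expected = pieces_field[piece_index * 20:(piece_index + 1) * 20]
    return hashlib.sha1(piece_data).digest() == expected

# Build a fake two-piece torrent and verify against it.
data = [b"first piece....", b"second piece..."]
pieces_field = b"".join(hashlib.sha1(p).digest() for p in data)
assert piece_ok(1, data[1], pieces_field)
assert not piece_ok(0, b"corrupted", pieces_field)
```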

But RSS files themselves are relatively small, so it's unlikely that all that much bandwidth would be saved sending torrent files and running a tracker, as opposed to simply sending the RSS file itself. Further, the big performance problem with RSS is the tradeoff between polling the RSS feed infrequently, resulting in large latencies between the time the feed is updated and viewers get to see it, or polling it frequently and chewing up tons of bandwidth from the server. BT doesn't do much to help with this - you'd be polling the torrent file exactly as frequently as you're polling the RSS file now.

I believe, however, that the BitTorrent protocol could be adapted into one that solves the problem of change notification. The protocol is very smart, and already has much of the infrastructure that's needed. In particular, peers already do notify each other when they receive new blocks. That's not change notification because the contents of the blocks are immutable (and that's enforced by checking the hash), but it's not too hard to see how it could be adapted. At heart, you'd replace the static hash tree of the existing torrent file format with a digital signature. The "publisher" node would then send new digitally signed blocks into the network, where they'd be propagated by the peers. There'd be essentially no network activity in between updates, and, as in the existing BitTorrent protocol, the load on the publisher node would be about the same whether it was feeding a dozen or ten thousand listeners. I'd also expect latency to scale very nicely as well (probably as the log of the number of peers, and with fast propagation along the low latency "backbone" of the peer network).
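The peer-side acceptance rule for such a protocol might look like the following sketch. The signature scheme is deliberately left abstract: the verify callable is a stand-in, and the keyed hash at the bottom is only a placeholder to keep the sketch runnable, not a real public-key signature.

```python
import hashlib
import hmac

def make_peer(publisher_key, verify):
    """Toy peer state for signed, mutable 'torrents'.

    verify(key, message, signature) -> bool is supplied by whatever
    signature scheme the real protocol would standardize on.
    """
    state = {"seq": -1, "payload": None}

    def receive(seq, payload, signature):
        # Accept an update only if it verifies against the publisher's key
        # and is strictly newer than what we already hold; the return value
        # says whether to gossip it onward to our own peers.
        message = seq.to_bytes(8, "big") + payload
        if seq <= state["seq"] or not verify(publisher_key, message, signature):
            return False
        state["seq"], state["payload"] = seq, payload
        return True

    return state, receive

# Placeholder "signature": a keyed hash (anyone holding the key could forge
# updates, so a real design would use an actual public-key signature).
key = b"publisher-secret"
fake_verify = lambda k, msg, sig: hmac.compare_digest(
    hmac.new(k, msg, hashlib.sha1).digest(), sig)
sign = lambda seq, payload: hmac.new(
    key, seq.to_bytes(8, "big") + payload, hashlib.sha1).digest()

state, receive = make_peer(key, fake_verify)
assert receive(1, b"<rss>new</rss>", sign(1, b"<rss>new</rss>"))
assert not receive(1, b"<rss>old</rss>", sign(1, b"<rss>old</rss>"))  # stale seq
```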

I'd hate to see such a beautiful work of engineering restricted to just providing RSS feeds - ideally, it would be general enough to handle all sorts of different applications which require change notification. One such is the propagation of RPM or Debian package updates, which obviously has strong requirements for both scaling and robustness. The main thing that's keeping it from happening, I think, is the dearth of people who really understand the BitTorrent protocol.

Proof systems

I've been hacking a bit on my toy proof language. Aside from slowly bringing the verifier up to the point where it checks everything that should be checked, I'm also hacking up an implementation of the HOL inference rules constructed in ZF set theory.

It's immensely satisfying to construct proofs that are correct with high assurance, which is such a contrast from hacking code - any time you write nontrivial code, you know it's got lots of bugs in it, many of which no doubt can be exploited to create security vulnerabilities.
