Older blog entries for raph (starting at number 187)

20 May 2002 (updated 20 May 2002 at 20:18 UTC) »
Doctor Mommy

Heather's PhD graduation ceremony was today. She has two of three signatures, and still needs to submit an archival copy, but otherwise is basically done. I am so proud of her.

The valedictory speech by Daniel Hernandez (also known as the editor of the Daily Cal) was very good. Prof. Christ's speech was also good. Prof. Adelman's speech was not bad, but had a somewhat defensive tone regarding the importance of English vs other more ostensibly practical pursuits, including a potshot at computer science.

Heather's father and youngest sister were there, as were the kids and I. Also, Marty and Jan. Jan seemed offended that I shook her hand in parting (as opposed to a hug). Oh well.

Low tech trust vs. high tech trust

I had an interesting discussion with Roger Dingledine and Luke (hacker-name armitage) over lunch at ETCon about the importance of manual input into attack-resistant trust metrics. Advogato uses the explicit,

[shit - this part of my entry got deleted - anyone have a cached version? Add "retrieve previous version" to m/ wishlist.]

Outline follows:

Trust metrics need manual input. Advogato uses manually-entered trust links. Google uses hyperlinks, which are usually manually entered.

"Low tech trust" is a pattern of manual information flow across trust links. Example: links to recommended content, represented by "hot lists" in the early days of the Web, blogrolling today. You can find a well recommended blog by starting with one you know, and clicking blogroll links. Google's PageRank calculates something similar.
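The blogroll version of this can be sketched in a few lines. Here's a toy power-iteration PageRank over a made-up trust graph (all names and links are invented for illustration; this is the flavor of the computation, not Google's actual algorithm):

```python
# Toy PageRank-style computation over a tiny, hypothetical blogroll graph.
links = {
    "alice": ["bob", "carol"],
    "bob": ["carol"],
    "carol": ["alice"],
    "dave": ["carol"],   # dave links out to carol, but nobody links to dave
}

def pagerank(links, damping=0.85, iterations=50):
    nodes = list(links)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n, outs in links.items():
            if outs:
                share = damping * rank[n] / len(outs)
                for m in outs:
                    new[m] += share
        rank = new
    return rank

rank = pagerank(links)
# carol, with three inbound links, ends up ranked highest;
# dave, with none, ends up lowest.
```

Clicking blogroll links at random is exactly the random walk this iteration converges to.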

Example: personal music recommendations. Automated recommendation systems (Firefly, Amazon) don't work as well as just getting recommendations from friends.

Both examples are attack-resistant, borrowing the analysis from trust metrics. In blog rolls, trust graph is explicit. In music recommendations, trust graph is implicit in flow of recommendations (they only flow between friends).

Low tech trust is good. Automated computation is harder, but opens interesting doors. Example: manual clicking on links is good at finding high quality random site, but Google is good at finding high quality site relevant to search term (and quickly, too). Killer apps for trust metrics will probably follow the same pattern.


Why am I so fascinated by trees?

One of my pie-in-the-sky projects is Athshe, a somewhat nebulous data format for trees. The writeup isn't that good, apologies. In any case, I've been kicking the ideas around in my head for quite some time now, and I still think it's worthwhile.

I'll probably write about some of the more detailed pieces later, but this time I want to convey some sense of the large-scale picture.

I think formats for tree-structured data have many intertwingled aspects. It might help to consider XML and filesystems as two special cases.

Abstract tree language

This is basically the language of what's allowed. Obviously, there are trees. The children of a node may be primarily sequential (as in XML), or primarily mapped by name (as in filesystems). What's in a leaf is also important - Unicode text in XML, arbitrary binary in filesystems.

It's good for the abstract tree language to be simple. Often, if you want more complex stuff, you find you can map it into simpler types fairly easily.

Athshe has exactly four types in the language: list, map, bytes, and tag (similar to MIME Content-Type). It's an appealing set, but I'm not sure it's quite right. In particular, I get the feeling that references (perhaps analogous to the # part of URL's) may be important too.
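For concreteness, here's how the four types might map onto plain Python values. The Tag wrapper and all names here are my own invention for illustration, not anything from the Athshe writeup:

```python
# One possible in-memory representation of the four-type language,
# sketched as plain Python values:
#   list  -> Python list
#   map   -> Python dict (string keys)
#   bytes -> Python bytes
#   tag   -> a small wrapper pairing a MIME-style content type
#            with the tagged subtree

class Tag:
    def __init__(self, content_type, value):
        self.content_type = content_type  # e.g. "text/plain"
        self.value = value                # the tagged subtree

tree = {
    "readme": Tag("text/plain", b"hello, world"),
    "data": [b"\x00\x01", b"\x02\x03"],
}
```

Note how naturally both the XML-ish case (sequential children) and the filesystem-ish case (named children) fall out of list and map.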


Serialization

Once you have a tree in hand, you want to serialize it, so you can store it in a file or move it over the net as one chunk. Also, because parsing is so well understood, it's often more useful to talk about the serialization than the abstract tree language.

For a given tree language, there may be many different serializations. In the case of filesystems, there's tar, pax, zip, and many others. XML is mostly just XML, but there's stuff like wbxml too. ASN.1 is another example of an abstract tree language with many different encodings.
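To make the point concrete, here's one minimal, hypothetical encoding for a list/map/bytes/tag language, in the spirit of bencode. This is not a real Athshe wire format, just one of the many possible serializations of the same abstract tree:

```python
# A toy serialization for a list/map/bytes/tag tree language.
# Tags are represented as (content_type, value) tuples here.

def encode(node):
    if isinstance(node, bytes):        # bytes: length-prefixed
        return b"b" + str(len(node)).encode() + b":" + node
    if isinstance(node, list):         # list: 'e'-terminated sequence
        return b"l" + b"".join(encode(c) for c in node) + b"e"
    if isinstance(node, dict):         # map: sorted keys, for a canonical form
        body = b"".join(encode(k.encode()) + encode(v)
                        for k, v in sorted(node.items()))
        return b"m" + body + b"e"
    if isinstance(node, tuple):        # tag: content type, then value
        ctype, value = node
        return b"t" + encode(ctype.encode()) + encode(value)
    raise TypeError(node)

wire = encode({"doc": ("text/plain", b"hi")})
# -> b'mb3:doctb10:text/plainb2:hie'
```

Sorting map keys is the kind of decision that lives in the serialization, not the abstract tree language: it buys you a canonical byte stream at the cost of losing insertion order.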

Access API

If you want to actually process trees, you'll need an API to access the tree data. In the case of XML, the API of choice is DOM. In the case of filesystems, it's the traditional Unix open/read/write/close API, or a variation.

The access API has many implications for performance. For one, grain size is important. For another, it's important for nodes to be very, very cheap, but active node references (open file descriptors) can be much less cheap. DOM doesn't quite get this right.

Another important API consideration is how calls map across the process boundary. For files, most calls cross the process boundary, which generally implies that files are not good for very small grains. However, note that fread() and fwrite() manage to cache a goodly fraction of file accesses within the process.
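The fread() point can be illustrated with a toy buffered reader that counts boundary crossings. CountingSource stands in for a file descriptor, and all the names are mine:

```python
class CountingSource:
    """Stands in for a file descriptor; counts 'syscall' crossings."""
    def __init__(self, data):
        self.data, self.pos, self.calls = data, 0, 0
    def read(self, n):
        self.calls += 1
        chunk = self.data[self.pos:self.pos + n]
        self.pos += len(chunk)
        return chunk

def read_bytes(source, total, bufsize=4096):
    # fread-style: fetch big chunks, hand out tiny grains from a buffer
    buf, out = b"", []
    for _ in range(total):
        if not buf:
            buf = source.read(bufsize)
        out.append(buf[:1])
        buf = buf[1:]
    return b"".join(out)

src = CountingSource(b"x" * 10000)
read_bytes(src, 10000)
# 10000 one-byte reads cost only 3 crossings (ceil(10000/4096))
assert src.calls == 3
```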

This is a good place for complexity to set in. There are all kinds of important issues in real systems only incidentally related to the abstract data language, such as asynchrony (nonblocking I/O), integrity (see fsync), access control, and so on. Applications may also want generalized undo (versioning), copy-on-write, transactions, and more.

Finally, the access API is a reasonably good place for change notification to happen. Unix has been traditionally deficient in this area; it's a standard feature in NT. In the case of XML, DOM has it, but with flaws.


Protocol

It used to be enough to access a tree on local disk or in memory. Now, you usually want to do it over the network too. For this, you need a protocol.

Protocols for remote access to filesystems are a very rich, fertile, and increasingly mature area. People use this stuff heavily, though not quite yet universally. Performance is important, and there are lots of clever ideas. A major performance goal is to cut down on the number of network boundary crossings.

In the XML world, the most important protocol is HTTP, which basically gives you the entire tree as one chunk. You can also run DOM over CORBA, as described in the DOM spec, but the performance would be horrible. There's no good general way to access a large tree incrementally.

Random access file format

Filesystem trees typically don't get manipulated as tarfiles. Usually, the filesystem implementation stores the tree on disk in some random access format. Only a small part needs to be in RAM, and it's pretty quick to get to an arbitrary place in the tree. Again, this is a mature area. In the case of filesystems, it's almost a solved problem.

XML doesn't really have a random access file format. There are such things as "hierarchical databases", but they are not popular.

The main idea of Athshe is that a simple abstract data language makes the problem of building a high-performance suite of tools more tractable. The major performance theme is scalability. Operations on a small local tree should be almost as fast as a pure in-memory implementation. On the other end of the scale, it should be possible to access huge remote trees efficiently. The granularity of process and network boundary crossings should match the chunk size of the application; in particular, traversing a complete subtree should perform about as well as transferring it as a serialization.

I find it very difficult to communicate why this is such a good thing. But see a recent thread on wmf's discussion page, which reminded me of my Athshe ideas.


AaronSw's ETCon schedule with weblog links is useful and interesting. One day soon, most conferences will be annotated like this, probably in realish time.

17 May 2002 (updated 17 May 2002 at 23:07 UTC) »

I'll start by responding to a few threads here, namely simonstl on ASN.1, and sej on EPS/PDF.


ASN.1

simonstl recently brought up ASN.1 as a possible alternative to XML. It's worth looking at, if for no other reason than to be aware of history so you don't have to repeat it. The advantages are clear: it is much more compact than XML for binary-oriented data, and it is quite mature. However, the associated schema language isn't especially nice (schema language design is a very hard problem), and it carries quite a bit of historical baggage.

In particular, implementations have tended to be horrible. In the olden days, most implementations were "compilers" that autogenerated C code to marshal and unmarshal ASN.1 data in and out of C structures. This mostly-static approach loses a lot of flexibility, of course. Code quality is another issue. I am unaware of any general ASN.1 tool that approaches the quality of open source XML tools such as DV's excellent libxml (although it's possible things have improved since I last looked).

Another major piece of baggage is X.509, a truly horrid data format for public key certificates using ASN.1 as the binary syntax. Most sane people who have encountered it run away screaming. See Peter Gutmann's x.509 style guide for a more detailed critique.

So it's worth looking at ASN.1, but it's certainly no magic bullet.

Incidentally, ASN.1 is one of many attempts to define low-level binary data formats for tree structured data. I hear good things about bxxp/beep, but haven't studied it carefully enough to critique it. Off the top of my head, there's sexp, which is the binary data format for SDSI (the name reflects its kinship with Lisp S-expressions). Further, most RPC mechanisms define binary encodings. Don't forget IIOP (Corba), XDR (Sun RPC), and so on. I'm sure all have advantages and disadvantages.

What exactly are the goals? As I see it, the main problems with XML are bloat (for general data; XML bloat in marked up text is quite acceptable) and complexity (XML is only medium-bad here). Binary formats help a lot with the former, and may or may not with the latter. But there are a lot of other considerations, including:

  • Quality of implementations (XML wins big).

  • Associated schema languages (messy; importance varies).

  • Suitability for dynamic mutation (XML's DOM is a reasonable, if inefficient, API for local tree manipulation; most other formats don't have anything similar, and their more rigid nature would make it harder).

  • Scaling. XML itself doesn't scale particularly well to huge trees, but is usually implemented in conjunction with URI's, which do. A binary data format can either help or hinder this goal.
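For a feel of the bloat numbers, compare the same three integers as XML markup and as a fixed binary record. A deliberately crude comparison, but it shows the shape of the argument:

```python
# The same three small integers, as XML and as a packed binary record.
import struct

values = (17, 42, 99)
xml = "<point><x>17</x><y>42</y><z>99</z></point>"
binary = struct.pack("<iii", *values)   # three little-endian 32-bit ints

len(xml)     # -> 42 bytes of markup
len(binary)  # -> 12 bytes
```

For marked-up text the ratio is far less dramatic, which is exactly the point above: XML's bloat is acceptable for documents, painful for general data.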

Update: ASCII vs Binary, by David Reed.

Transparency, PostScript, PDF

sej brings up the desire to handle vector objects with transparency. Basically, PostScript can't do it, and probably never will. PDF 1.4, on the other hand, has very rich transparency capabilities. It's a bit more complex, and caution is needed when dealing with a standard controlled by a highly proprietary organization such as Adobe, but it has compelling technical advantages.

Here are two signs that the rest of the world is moving to PDF: pdflatex appears to be far more vital than dvips. Mac OS X uses PDF for the imaging and printing metafile.


I went to the O'Reilly Emerging Technology Conference (formerly known as the P2P conference) on Wednesday. It was great fun. I met a lot of people, some of whom I've known for years (such as Jim McCoy). Others I've known online, but met for the first time in person (Kevin Burton, Aaron Swartz, Dave Winer).

I particularly enjoyed hanging out with Jim, Roger Dingledine, Bram Cohen, Zooko, and Wes Felter. It is such a luxury to be able to interact with people on such a deep level. In my opinion, the best people to associate with are those you can teach and learn from. Thanks all of you for making my Wednesday a day of intense teaching and learning.

A lot of people have heard of my work, and are reading my thesis. I think Roger is largely to blame for this; a number of people mentioned hearing about it from him. I had great conversations with the MusicBrainz and nocatauth folks, both of which would be killer applications for attack-resistance.

I have this uneasy feeling there's something important that I left out. Also, one risk of naming names is that people might infer something from a name being left out. Not to worry - I genuinely enjoyed meeting everybody I met at ETCon.


Reading David McCusker's log fills me with sadness. The personal troubles he's going through resonate deeply for me, as I've been through similar things. I'm in a much happier space now, for which I consider myself very fortunate.

His technical work is also quite compelling. When we meet, one of the main things I want to talk about is whether to join forces on some of our pie-in-the-sky dream projects. Collaboration can be a nightmare when the other person doesn't understand what you're trying to do, or particularly share your goals. I get the feeling that working with him might be otherwise.

David talks about being busy and not having much time for things. I can certainly relate to that. He does, of course, have time to meet with me, whether he realizes it or not. Other things will of course slide, so it's worth considering their importance in the long run.

11 May 2002 (updated 11 May 2002 at 23:00 UTC) »

I've been thinking quite a bit about runtimes recently. It's an interesting topic in general, but I also have a more specific interest: I'm designing Fitz, a next-generation 2D graphics library, and I want this library to be as useful as possible. So I want to avoid gratuitous runtime incompatibility.

Bertrand Meyer's Polyglot Programming is a very insightful discussion of some of the consequences of Microsoft's Common Language Runtime (CLR). Here's a particularly interesting quote:

The language openness of .NET is a welcome relief after the years of incessant Java attempts at language hegemony. For far too long, the Sun camp has preached the One Language doctrine. The field of programming language design has a long, rich history, and there is no credible argument that the alpha and omega of programming, closing off any future evolution, was uttered in Silicon Valley in 1995. Microsoft's .NET breaks this lock.

Absolutely. What the CLR provides instead is a runtime hegemony. It seems to be a fairly high quality attempt, and is well on its way to achieving huge market share. The idea of a runtime hegemony provides almost all the advantages of a language hegemony, with the added advantage of providing a much easier migration path for components written in non-native languages to be assimilated into the runtime. Do note the echoes between Bertrand's discussion of libraries and Paul Graham's in Being Popular.

It remains to be seen, I believe, whether it's worth writing new CLR code in any language other than C#. It's a nice enough language, and you have the advantage of not having to worry about the language-to-runtime mapping Bertrand mentions towards the end of his essay. But other languages might turn out to provide compelling advantages over C# for this environment, and the CLR will, as Bertrand argues, support them.

There is now a flowering of thought, design, and implementation of runtimes on the free software side. This world is resistant to hegemony. Instead of one dominant runtime, we will see lots of different approaches. In the short term, this is a serious disadvantage compared with CLR, because it doesn't provide a good story for people (such as myself) who just want to write software. In the long term, I think, it could lead to much stronger working knowledge about how to knit together systems out of disparate components.

I'll list here a few selected projects I find interesting, with some comments focussed on runtime and integration.

  • CPython. The CPython runtime is carefully layered on top of the C runtime, supporting the highly dynamic Python, while allowing full access to the wealth of C libraries. In fact, CPython + C can be seen as an "aggregate language", actually a fairly compelling platform. Other language implementations, such as Ruby, fall into this class as well.

  • Pyrex. Pyrex is essentially a hybrid language with syntax and semantics fairly similar to Python, but able to call C directly. Thus, we essentially have a 3-language aggregate, consisting of CPython + Pyrex + C.

  • Lisp. Lisp is a mature language, with mature, high quality implementations. Most Lisp implementations provide a Foreign Function Interface (FFI), which is capable of directly calling C. However, the details of the FFI tend to differ from implementation to implementation, so you can't really have a Lisp + C aggregate, but rather a family of CMUCL + C, AutoLisp + C, etc. See Design Issues for Foreign Function Interfaces for a much deeper discussion. Also note Arc, which could well become the most compelling Lisp dialect.

  • Parrot. See this interview with Dan Sugalski for much more info on why Parrot is interesting.

  • Ruby/Python. This project attempts to knit Ruby and Python together. I'm including it largely because it illustrates some technical difficulties in integrating runtimes: the last posted version integrates Python 1.5 (which uses reference counting) with Ruby (which is fully gc'ed). It's relatively straightforward to integrate N runtimes when only one is gc'ed - it is the master, and all others are slaves. It's far harder to have more than one, and Ruby/Python seems not to have grappled with the problem yet.

  • JVM (Java) + JNI. This is still a reasonable approach, but free implementations are still not very good. They will get there in time. Note also Jython, which knits the Python language into the JVM runtime.

  • Swig, an "interface language" for easy integration of C and scripting languages. One of the most fascinating aspects of Swig is that a single interface can generate wrappers for many different scripting languages.

  • Corba. Most people these days use Corba to autogenerate low quality network protocols, layered on top of IIOP. However, one of the original goals of Corba was to facilitate integration between a number of different languages. It is a mature standard, and fairly well implemented even in the free world, but is fairly painful in practice.

  • Mono and DotGNU, free clones of Microsoft's CLR.

These projects have widely varying goals and ambitions. Some will no doubt succeed, others fail. I expect our community to learn a lot in the process. And perhaps, some day, programmers can sit down and write code without facing difficult decisions about which runtimes to be compatible with, and which to be incompatible with.

Update: Stacking them up: a Comparison of Virtual Machines by John Gough. A well-written, technical comparison of JVM and CLR.

A poem on awakening

Twelve people emerge from sleep camp
Untroubled by dreams
Completely oblivious

3 May 2002 (updated 4 May 2002 at 00:45 UTC) »

I think it was Zooko who pointed me to Kademlia. It's good. In particular, it's a lot less fragile than Chord because of their XOR metric.
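The XOR metric itself is tiny. Here's a sketch of its key properties (this is the idea from the Kademlia paper, not their code):

```python
# Kademlia's XOR metric: the distance between two node ids is their
# bitwise XOR, read as an integer.

def xor_distance(a, b):
    return a ^ b

assert xor_distance(0b1010, 0b1010) == 0          # zero iff equal
assert xor_distance(3, 9) == xor_distance(9, 3)   # symmetric
# unidirectional: for any id a and distance d, exactly one b is at
# distance d, namely b = a ^ d
assert xor_distance(5, 5 ^ 7) == 7
```

That unidirectionality is a big part of why lookups converge so robustly: all queries for a given target close in on the same nodes, regardless of where they start.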

Of course, it's not attack resistant yet. I'm sending mail to them to point to Chapter 7 of my thesis, which is actually very similar in a lot of ways to what they're doing. My problem is that I haven't gotten my ideas published yet.


I see more and more recognition that email is becoming unusable because of spam. The recognition is a very good thing, largely because it funds and motivates research into spam-resistant messaging. There's now a ton of work on the problem, exploring many different directions. Also encouraging is the fact that both free software and commercial products and services are well represented.

Making email resistant to spam is a hard problem. I believe that attack resistant trust is a very good approach. In particular, I believe that the stamp trading network in Chapter 7 of my thesis is likely to do a good job filtering out the spam without trimming the legitimate messages. Of course, the only way to test this hypothesis is to build a prototype. I don't have the time right now to do so myself, and so far I haven't been able to interest anyone else in it either.

Until quite recently, I'd believed that email would have to be completely replaced by a better system. The existing email infrastructure, of course, not only lacks any kind of attack resistance, but also lacks any form of authentication, which is needed in the next level down. Thus, I reasoned, to solve these problems, you have to build something new.

But a few days ago, in a conversation with Jeff Darcy, I realized that this is not necessarily the case. Highly sophisticated email filtering clients will, almost inevitably, implement a p2p network with good trust properties. Vipul's Razor is already a significant step in the right direction.

Such a client could then easily implement a real attack resistant trust metric. If both the sender and receiver clients participate in compatible p2p networks, then approval for the message could be expedited. Otherwise, the message would have to take its chances.

Assuming that this trust-based message approval works well, then in time it will probably get a lot of penetration. When that happens, you could probably just turn off untrusted smtp delivery.

But how do you know in advance whether a trust system will work well for email? Building it and fielding it in the existing email infrastructure is really hard. Plus, there are lots of different ideas on how to do this (my stamp trading network is but one). How do you know which one is best?

To me, the answer is clear: build a prototype. Such a prototype would not have to work with the existing email infrastructure, or attract a huge user base (just large enough to test whether it is spam resistant). Once the prototype proves the ideas, then the difficult work of integrating into the existing email network can begin.

Subpixel positioning

I have a hacked up version of Ghostscript that does unhinted antialiasing with subpixel positioning. See before and after screenshots.

Now that I've posted the screenshots, I'll have to make the patch really work. I have code for making the cache subpixel-aware, but it's not 100% yet (the "after" screenshot was made with caching disabled). Also, I'll need to add a few configurable parameters, especially to turn off hinting by default in aa mode (right now, it's compile time).


Denver was fun. We went up to Winter Park on Monday, and the kids got to play in the snow a bit.

Alan tested, as we expected, highly gifted. Less expected was the large discrepancy between verbal skills (off the scale) and visual/spatial skills (mazes and the like), at which he is basically average. They recommended an optometry consult. That will be interesting.

When we got back, I think Max made his first pun. He said, "miss spider", which either meant that he missed his favorite spider book, or that he was referring to more of the title words. Of course, it's possible I'm just reading the pun into what he said, but even so I wouldn't be surprised if it were for real. All the people around him love language and delight in puns, and he's a quick study.


Desmond Tutu: Apartheid in the Holy Land. This is one of the sanest things I've read, about an insane conflict.


David McCusker posted a very nice maze recently. I solved it this morning, and it took a gratifyingly long time - I'm used to solving mazes in my head.

Someone posted a link to Olin Shivers's maze page. Olin's and David's mazes are both mazes, but there the similarity ends. David's maze, hand-drawn, has a distinctly organic flavor. More than that, it has an architecture. The different locations in the maze are places. In Olin's mazes, they're just grid coordinates.

In any case, David posted a challenge to me, to come up with an image processing algorithm to represent images in the line widths. I think the right algorithm is Ostromoukhov's Artistic Screening. I don't have an implementation handy, but I faked something similar in the Gimp. The results weren't that satisfying - I was hoping for more crispness in the image. Here's the source image, for reference.


I fly tomorrow to meet the kids in Denver. Kinda surprising how much I miss them after only a couple of days apart.

I'm happy these days. I'll write up and post a bit about my philosophy of "humble elitism" which I think is partly responsible. An example will do for now: plain old elitism is reading Slashdot, being irritated at how stupid the comments are, and feeling all superior. Humble elitism is reading better blogs, such as Hack the Planet.


I spent a good part of the day going over potential references in my queue. Here are my conclusions:

Sybil attack

The author presents a bad trust metric, shows that it is not attack resistant, and seems to imply that a centralized identity service (a la VeriSign) is needed. Obviously, I don't agree. He gets a brief mention in Chapter 1.

Self-Organization and Identification of Web Communities

Interesting. Their community-finding algorithm is very similar to Advogato's, so much so that it really feels like convergent evolution. However, they're not trying to make it attack resistant, and, indeed, it's not. I wrote a page or so at the end of Chapter 3 describing the differences and their effects on attack resistance.

Poblano (Jxta)

The paper they have up there is white, but with writing on the pages (see alancoxonachip). They present a trust metric in fairly vague terms, but there's no reason to believe it's any good. No cite.

The new text is, as always, available in the thesis snapshot.

27 Apr 2002 (updated 27 Apr 2002 at 13:24 UTC) »

Robert Leslie is upset because Vorbis doesn't have a published spec yet.

He definitely has a point. Vorbis needs a spec. But his attitude (echoed by a number of people posting comments) is troublesome to an extreme. Monty and crew have made a tremendous gift to the world by doing Vorbis. This guy seems to believe that somehow obligates xiph.org to hand him a polished spec on a silver platter.

The real problem is that writing specs takes time and energy. For patented technologies, there's plenty of financial incentive for corporations to fund that work. For work done as a pure public good, the incentive disappears.

The xiph people are doing their work as a labor of love. They deserve our moral support back. Robert Leslie should take an active role in the process of writing a Vorbis spec, on an unpaid volunteer basis. If he does, he will be participating in free software the way it is meant to be. If not, he's just whining.


David McCusker's note on dict btrees is entertaining reading, as always. As he suggests, it's not directly relevant to my Fitz needs. But I appreciate him writing it anyway, largely because I'm still fairly ignorant when it comes to known database practice.

The idea of unique, stable node id's is interesting. If the tree is stored on a disk, then you maintain the mapping from id's to file offsets as an additional invariant, for example by using the node id as a btree key to the actual node contents. The big advantage is that clients don't have to worry about a node id ever changing. Also, letting go of a node reference doesn't have to be a significant event. Thus, implementing DOM-compatible semantics is easier.
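The invariant can be sketched with a dict standing in for the btree index. All the names here are my own, not David's:

```python
# Stable node ids: clients hold small integer ids, and a separate
# index (a dict here, a btree on disk) maps each id to the node's
# current location. Nodes can then move on disk without ever
# invalidating a client's reference.

class TreeStore:
    def __init__(self):
        self.next_id = 0
        self.location = {}   # node id -> "file offset"

    def allocate(self, offset):
        nid = self.next_id
        self.next_id += 1
        self.location[nid] = offset
        return nid

    def relocate(self, nid, new_offset):
        # e.g. after a btree page split; the client's id is unaffected
        self.location[nid] = new_offset

    def resolve(self, nid):
        return self.location[nid]

store = TreeStore()
nid = store.allocate(offset=512)
store.relocate(nid, new_offset=8192)
assert store.resolve(nid) == 8192   # the id survives the move
```

The cost, of course, is one extra indirection on every access, which is exactly the server-side complexity traded for client-side simplicity.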

I don't think I'll use this idea for Fitz, though. In most databases, you're generally willing to make the server a lot more complex in order to make life easier for clients. I'm not dealing with any such asymmetry. Maybe there will be a little more work to keep the keys for the partially rendered fragment cache updated, but it doesn't scare me too much. The Fitz client interface will be subject to reference counting discipline, so I can easily keep track of all "active" node references from the client.

But of course none of this is set in stone yet, and I could easily change my mind. The part that's least well thought through is the mutation event propagation. Anything that makes that easier counts as a compelling argument.


Dave Winer posted a screenshot (from Tim Bray) of Chimera, a Web browser for Mac OS X that uses the native Quartz fonts.

There are a few notable things about the font rendering. Obviously, the fonts are antialiased. Less well known is the fact that Quartz font rendering tends to be completely unhinted. This screenshot also demonstrates a fanatical degree of subpixel precision. Obviously, subpixel precision is important for quality, but I'm not convinced that taking it to this extreme is a worthwhile tradeoff. My experiments show that 1/4 pixel horizontal positioning is indistinguishable from "perfect" precision unless you're looking really hard. Quantizing to quarter pixels certainly makes glyph caching more effective. So that's almost certainly what I'm going to do for Ghostscript.
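A quarter-pixel quantizer is nearly a one-liner. A sketch with my own naming (and it glosses over the carry into the integer pixel when a phase near 1 rounds up):

```python
# Snap the subpixel phase of a glyph origin to the nearest 1/4 pixel,
# so the cache only ever holds four horizontal variants of each glyph
# instead of one per exact position.

def quantize_phase(x, steps=4):
    """Split x into an integer pixel and a phase snapped to 1/steps."""
    phase = round((x - int(x)) * steps) % steps
    return int(x), phase / steps

# 12.30 and 12.37 land in the same cache bucket; 12.60 does not
assert quantize_phase(12.30) == quantize_phase(12.37) == (12, 0.25)
assert quantize_phase(12.60) == (12, 0.5)
```

The (pixel, phase) pair then becomes part of the glyph cache key, alongside the font and size.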

I've looked at OS X font rendering on several monitors. The 15" CRT in Alan's iMac is acceptable but far from great. I run it in 1024x768, but the default is 800x600. One of the odd things about CRT's is that cranking up the resolution actually decreases contrast if you don't scale the input signal (and most people don't). The 14" 1024x768 LCD on this ThinkPad 600 is also ok, but not great. Rendering looks a little globby and uneven.

However, on my 15" 1600x1200 ThinkPad A22p, this screenshot looks really, really good. Running OS X on that hardware would be very compelling. Too bad you can't buy an Apple with a screen that good.

Part of the reason the screenshot looks so good is the choice of font - Lucida Grande is very robust for low-res antialiased rendering. In my experience, most fonts in the general ballpark of Frutiger will have this property. I chose antialiased Frutiger for my first GUI about 15 years ago. Hopefully, I'll be able to use it for my main desktop (laptop?) reasonably soon.


Considerable progress on Chapter 7 (the stamp trading network design). Update: the latest thesis snapshot contains a reasonably fleshed-out design for the stamp trading network; the chapter is now 9 pages. Not much in the way of analysis yet, though.

I also found out that the easiest way to get the Bluesky TeX fonts is to use pdflatex instead of dvips. The latter uses bitmap fonts, which are a lose. The main hitch is that pdflatex can't handle included eps files. However, it handles included pdf's just fine. Fortunately, epstopdf (which is part of the tetex-bin package on debian, and is a wrapper around Ghostscript) converted my eps's without difficulty.

So it looks like I'll be abandoning PostScript, and going straight to PDF. It's amusing that it's taken me until now to figure this out.

26 Apr 2002 (updated 26 Apr 2002 at 01:36 UTC) »

We're going to the Gifted Development Center to get Alan tested. That should be fun.

Thanks to a recommendation from Pat Eyler, I got Kitchen Table Chemistry and Crash and Burn Chemistry for Alan. He absolutely loves them - surprising even me. We also spent some time going over the beautiful Trematode results. The deformed frogs have been a disturbing mystery for a while. It's absolutely wonderful to see science providing some answers. I think we managed to gross Heather out a bit.

There are new pictures up on the homepages of Alan and Max. I recently got a request from the ccsd.ca webmaster to use some of the photos of the kids - it really made my day.

Web Services

I've only been following the Web Services soap opera from a distance, but it looks interesting. Thanks, simonstl, for posting your summaries and links. I think the controversy is a damned good thing, as having things discussed out in the open tends to improve the process of standards bodies immeasurably.

There seem to be deep political issues, as well as deep technical issues, here. REST-vs-SOAP has a distinct David and Goliath flavor to it. From what I can see, Roy Fielding is a person who really knows his stuff, and is a compelling writer to boot. I am glad to see someone of his caliber in the fray.

One of the main technical issues, as I see it, is whether to use URI's to identify the objects involved (REST), or to decouple objects from URI's, so that you have to understand the request to identify the object. That's an interesting distinction.

The other major technical issue is, of course, complexity. SOAP is a fairly complex spec, and the verbosity of the XML is a sign of that complexity. The argument is whether any of this complexity is actually needed. I get the sense that much of the added complexity is motivated by the desire to decouple the request from the server that will actually process it, for example, if there is a "cloud" of servers. This kind of thing seems to be big in the enterprise middleware world.

I get the sense that a big part of the political discussion is whether Web services should live on port 80, which primarily has consequences for firewall policies. Opening SOAP on your firewall is about the same as opening CORBA, something that most sane people would never think of doing. If Web Services get to live on port 80, then they'll start out being open by default, with sysadmins scrambling to close the holes, while being subjected to intense pressure from users who just want their Web Service thingies to work. If they get kicked off of port 80, then everything will be off by default, and there will be an uphill battle to get the stuff adopted at all.

I could easily be wrong about this; these are just the impressions I'm getting.


sye: Thanks for the tata link. I haven't looked at it in great detail yet, but I'm not sure it addresses the problems I'm most interested in right now. As I understand it, tree automata are extensions of regular expressions (or, equivalently, finite state machines), to trees. DOM, by contrast, doesn't actually care about the contents of the trees at all, it just gives you a programmatic interface to access them.

David McCusker writes about using navigational paths as node id's. This idea is quite similar to the sliding DOM I was playing with a few years ago. It's very seductive, because it requires no additional storage in the tree, and it works well for the read-only and append-only cases. It's also not too bad when you have a relatively small number of active id's you need to keep track of.

However, I believe my latest design beats path-based id's. In many cases, one would expect mutations to the tree to affect only a small number of active node id's. In the read-only and append-only cases, id's are completely stable (as with paths).

One further insight. Assuming you're storing your tree on a disk, the node id's are basically equivalent to file offsets. If you don't move nodes around much on the disk, then your id's will be relatively stable, as well. Thus, analysis of disk traffic speaks directly to the cost of updating the nodes.

In any case, I (fortunately) have a very specific and concrete application for all this abstraction: Fitz. Ghostscript will be the premier client for Fitz. Ghostscript also has a more-or-less append-only pattern of mutations to the tree. It builds up the display list, scanning through the source PostScript or PDF file, then takes another pass to render it. Efficient processing of random tree mutations will matter primarily for other, interactive applications (all of which are quite speculative now).

Maintenance of node id's is only one aspect to efficient implementation of a DOM-like tree protocol. The other big one is change notification.

Fitz will implement a Model/View architecture for keeping the screen updated with respect to tree mutations. When you change the tree, Fitz will keep track of which parts of the screen need redrawing, and later redraw just those areas. Again, this mechanism is primarily of interest for interactive applications. In the context of DOM, we're talking about EventListeners for MutationEvents. Note that DOM has serious technical flaws that make it quite painful to implement Model/View, especially when there are multiple Views.

In any case, the major axis is how many different listeners you have attached to the tree. At one extreme, you just have one attached to the root. Any change to the tree calls the associated callback, which schedules the entire tree for redrawing. The simplicity is appealing, but the performance wouldn't be.

At the other extreme (and this is basically what Gill did), you have a listener attached to every graphical object in the tree. When an object changes, the callback schedules just that object for redrawing. This is certainly nice for minimal redraws, but the cost of having all those listeners, and suitably propagating the events up and down, is crushing.

I believe that the Right Thing is to consider the listeners to be a cache. For nodes that have recently changed, you have listeners attached directly to those nodes. When the cache fills, you start evicting listeners for specific nodes, and letting listeners located higher in the tree handle the mutations. Thus, a change to a specific node will, in general, cause a subtree to be redrawn.
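A toy sketch of the listener-as-cache idea (all names are hypothetical; this is not Fitz code). Recently-mutated nodes carry their own listeners; when one is evicted, the nearest remaining ancestor listener handles the change, redrawing a whole subtree:

```python
from collections import OrderedDict

class ListenerCache:
    """Cache of per-node redraw listeners; evicts toward ancestors."""

    def __init__(self, capacity, parent_of):
        self.capacity = capacity
        self.parent_of = parent_of      # function: node -> parent (None at root)
        self.listeners = OrderedDict()  # node -> callback, in LRU order

    def attach(self, node, callback):
        self.listeners[node] = callback
        self.listeners.move_to_end(node)
        if len(self.listeners) > self.capacity:
            # Evict the least-recently-used listener; changes under that
            # node will now be handled by a listener higher in the tree.
            self.listeners.popitem(last=False)

    def notify(self, node):
        # Walk up until some node has a listener; the root always should.
        n = node
        while n is not None:
            if n in self.listeners:
                self.listeners.move_to_end(n)
                self.listeners[n](n)  # schedules the subtree rooted at n
                return n
            n = self.parent_of(n)
        return None
```

The LRU policy here is just a placeholder; any eviction policy works, as long as a listener always remains at the root.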

There are still some nontrivial issues to be resolved; primarily, how to stop propagation up the tree correctly, so that if a leaf node handles a change, it doesn't trigger a callback to a listener higher up the tree. But I have confidence I can solve these.

In any case, the implementation plan is now a lot clearer. Given the clients I expect, there are probably two simple tree implementations worth doing early. One is heavily optimized for the append-only case, and essentially serializes the tree out to a disk file, using (the moral equivalent of) file offsets as node id's. The other is essentially the same as DOM, using an in-memory object for each node. In the second implementation, node id's are pointers to these objects.

If the API is general enough to handle these two implementations, then I have confidence that it will also handle a third, considerably more complex implementation, based on the btree ideas I've been fantasizing about. This implementation, I believe, would combine the space efficiency of the serialized implementation with the nimble response to change of the node-and-pointer implementation. Thus, I would expect it to be particularly efficient for updates to very large display lists.

It's clear that this API will need to treat node id's as indirect references to the tree, possibly subject to updating, rather than direct pointers to tree nodes. I will probably have a "node id" object, which only has meaning in conjunction with a tree object. The DOM concept of Node can probably be simulated by bundling a node id and a reference to the tree, but I probably won't bother. In any case, the most fundamental departure from DOM is that releasing a node id becomes a significant event.

I am really looking forward to fleshing out this design and then implementing it. I believe I can make a component of very high quality, with lots of useful applications. I'm absolutely thrilled that my paying job lets me hack on this stuff and release it all under GPL.

20 Apr 2002 (updated 22 Apr 2002 at 17:49 UTC) »

I've been thinking about trees a lot lately, and yesterday I had an epiphany about why: I'm actually going to need a high-tech tree implementation for Fitz (the merger of Libart and the Ghostscript graphics library). This tree will hold the graphics objects to be displayed or printed. It's known in the graphics field as a "display list". The PDF 1.4 imaging model is inherently tree structured, so a tree is its most natural representation.

The goals are simple. I want a nice, abstract interface, insulated from the implementation details. This interface should support DOM-like tree navigation and mutation. Each node will also have a bounding box. The most important query on a tree node is to iterate through all children intersecting a given bounding box.
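The bounding-box query is simple enough to sketch directly (the node representation and names here are made up for illustration):

```python
def bbox_intersects(a, b):
    """Axis-aligned boxes as (x0, y0, x1, y1); True if they overlap.

    Strict inequalities, so boxes that merely touch don't intersect.
    """
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 < bx1 and bx0 < ax1 and ay0 < by1 and by0 < ay1

def children_in_bbox(node, query):
    """Yield the children of node whose bounding box intersects query."""
    for child in node["children"]:
        if bbox_intersects(child["bbox"], query):
            yield child
```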

Also, I want the tree to support a nice Model/View architecture. Mutating the tree (the "model") notifies the view that something has changed. The view then queues a region of the screen for redisplay. A background thread (or, more likely, idle handler) empties this queue, rerendering these regions and drawing them to the screen. To the client, the effect is that the screen updates automagically.
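A minimal sketch of the redraw queue (hypothetical names; a real implementation would keep a region list rather than collapsing everything to one bounding rect):

```python
class RedrawQueue:
    """Mutations enqueue dirty rectangles; an idle handler drains them."""

    def __init__(self, render):
        self.render = render   # callback taking a rect (x0, y0, x1, y1)
        self.dirty = []

    def invalidate(self, rect):
        self.dirty.append(rect)

    def on_idle(self):
        # Coalesce the queue into one bounding rect and rerender it.
        if not self.dirty:
            return None
        x0 = min(r[0] for r in self.dirty)
        y0 = min(r[1] for r in self.dirty)
        x1 = max(r[2] for r in self.dirty)
        y1 = max(r[3] for r in self.dirty)
        self.dirty.clear()
        self.render((x0, y0, x1, y1))
        return (x0, y0, x1, y1)
```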

Behind that interface, I want an efficient implementation. The main criterion is RAM usage, because this will go into things like printers, where RAM is far from free. Using an in-memory object per tree node is too heavy, as my experience with both DOM and the Gnome Canvas proved. Of course, it also has to be fast, so things like linear scanning through a serialization are out.

I now have a design that I believe will meet these goals. Here is the broad outline.

The API is fairly similar to DOM, but with one subtle and important difference: instead of Node objects for each tree node, I plan to use "node ids". These node ids are transient, even if the node itself is persistent. When the client is done with a node id, it explicitly releases its reference. DOM, by contrast, generally relies on the Java garbage collector. I expect that the number of active node ids (i.e., those for which the client holds a reference) is small compared to the size of the tree.
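What that discipline might look like, as a toy sketch (all names are hypothetical): the tree issues ids and the client releases them explicitly, so the tree always knows the presumably small set of ids it must keep valid across mutations:

```python
class NodeIdTable:
    """Explicitly released node ids, mapped to internal positions."""

    def __init__(self):
        self.active = {}     # id -> internal position, however encoded
        self.next_id = 0

    def acquire(self, position):
        node_id = self.next_id
        self.next_id += 1
        self.active[node_id] = position
        return node_id

    def release(self, node_id):
        # The "significant event": after this call, the tree no longer
        # needs to track this node through mutations.
        del self.active[node_id]

    def remap_all(self, fn):
        # On mutation, only the active ids need updating.
        for node_id, pos in self.active.items():
            self.active[node_id] = fn(pos)
```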

Obviously, this API admits a DOM-style implementation. In this case, there is a one-to-one mapping between node id's and node objects. Admitting such an implementation is a good thing, because it should be relatively simple to code up and thus useful as a prototype.

However, the real goal is to admit a more compact implementation. In this implementation, a serialization of the tree is held in a B-tree. Thus, there are two trees; their structures need not bear any relation to each other, though.

The alphabet of this serialization consists of left parenthesis, right parenthesis, and atoms. Let's simplify things a lot by treating atoms as if they are one byte in size. That way, counting is easy, and we also don't need to worry about splitting an atom across more than one (fixed size) B-tree block. Thus, a complete depth-2 binary tree is represented by this serialization: "((AB)(CD))"

For each B-tree node, we store a little bit of extra information about the parentheses. Starting at 0, incrementing for each '(', and decrementing for each ')', we record the minimum and ending values. For example, if our B-tree block size is 4, then our blocks are "((AB", ")(CD", and "))". The summary tuples are (0, 2), (-1, 0), and (-2, -2), respectively. These summaries are stored for both leaf and interior B-tree nodes.
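The per-block summaries are cheap to compute. A sketch, treating atoms as single characters as above:

```python
def summarize(block):
    """Return the (minimum, ending) nesting-depth deltas for a block.

    Starting at 0, '(' increments the depth and ')' decrements it;
    atoms leave it unchanged.
    """
    depth = minimum = 0
    for ch in block:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
        minimum = min(minimum, depth)
    return (minimum, depth)
```

Running it on the example blocks reproduces the tuples above.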

Now, there are two notions of node id's, which I will christen "sliding" and "stable". A "sliding node id" is essentially the tuple of a B-tree (leaf) node identifier and an offset within the node. Thus, in our example, if the id's of the three B-tree leaf nodes are x, y, and z, then the root node of our tree is (x, 0), its first child is (x, 1), and its second child is (y, 1).

Given a sliding node id, it is pretty easy to see how the standard DOM methods of first child, next sibling, and previous sibling can be implemented efficiently. In particular, you can jump over irrelevant B-tree nodes without having to dive into them. The details are left as an exercise for the reader.
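Here is one such exercise worked, as a toy sketch over a flat list of blocks rather than a real B-tree (names are made up): finding the position just past a subtree, which is exactly the "next sibling" step. Whole blocks are skipped via their summary tuples whenever the depth cannot drop back to zero inside them:

```python
def advance(blocks, b, off):
    """Step one character forward, crossing block boundaries."""
    off += 1
    while b < len(blocks) and off >= len(blocks[b]):
        b, off = b + 1, 0
    return b, off

def subtree_end(blocks, summaries, b, off):
    """Position just past the subtree starting at (block b, offset off).

    blocks are strings; summaries[i] is the (min, end) depth tuple for
    blocks[i]. A lone atom is its own subtree; otherwise we scan from
    the '(' to its matching ')'.
    """
    if blocks[b][off] != '(':
        return advance(blocks, b, off)       # a single atom
    depth = 0
    while True:
        if off == 0 and depth + summaries[b][0] > 0:
            depth += summaries[b][1]         # skip the whole block
            b += 1
            continue
        ch = blocks[b][off]
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth == 0:
                return advance(blocks, b, off)
        b, off = advance(blocks, b, off)
```

With the example blocks x, y, z as blocks 0, 1, 2, this reproduces the sliding id (y, 1) for the root's second child.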

Finding the parent of a node is a little trickier, but only a little. Basically, you walk down from the root node, and remember where you were. If this operation is important, then it probably makes sense to have a cache. You can also populate this cache when you do a "first child" operation, using the result (the child) as the cache key.

This concept of a sliding node id works well if the tree is read-only, or even in the (interesting) special case where you're appending to the end. But I'm also interested in the general case where you are mutating the tree.

The basic idea is to maintain a mapping from stable node id's to sliding node id's. When the tree is mutated, update this mapping. For example, assume that we have a node id pointing to "B". The sliding node id is (x, 3). Now, let's insert an "E" between "A" and "B", resulting in the following B-tree leaf nodes: x = "((A", w = "EB", y = ")(CD", and z = "))". We update the sliding node id in the stable-to-sliding mapping to (w, 1). Note that the node id for the root's second child stays at (y, 1). In fact, we expect most node id's to remain stable unless that neighborhood of the tree is being mutated.
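A sketch of the stable-to-sliding bookkeeping for the block-split case (hypothetical names; the real code would also fold in the shift from the inserted bytes themselves):

```python
class StableIds:
    """Map stable node ids to sliding (block, offset) ids.

    Only mappings into the mutated block are touched; ids pointing
    into other blocks stay untouched, which is the point of the scheme.
    """

    def __init__(self):
        self.sliding = {}   # stable id -> (block id, offset)
        self.next_id = 0

    def issue(self, block, offset):
        stable = self.next_id
        self.next_id += 1
        self.sliding[stable] = (block, offset)
        return stable

    def note_split(self, old_block, split_at, new_block):
        """old_block was split: offsets >= split_at moved to new_block,
        with their offsets shifted down by split_at."""
        for stable, (b, off) in self.sliding.items():
            if b == old_block and off >= split_at:
                self.sliding[stable] = (new_block, off - split_at)
```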

This design is quite a bit less ambitious than, say, IronDoc. I'm not concerned with updating over a network, doing transactions, or generalized undo.

Messaging without email

David McCusker has also been thinking a lot about trees. Reading over his site has provided much of the inspiration for the above design.

I realized recently that he and I are conducting a full scale dialog through our respective weblogs. This has traditionally been the domain of email.

I feel like I've found a kindred soul. We've spent a lot of time thinking in the same space, and come up with parallel conclusions. I can also relate to much of the more personal info that David blogs - for example, when we were trying to live with another couple in an extended family, we had money issues similar to those that David describes. For us, those ended when we split back into our respective nuclear families.

I get the feeling that I would enjoy meeting David and hanging out with him. The normal thing to do would be to email him, but I'm a bit reluctant to break the blog-not-email pattern.


Yesterday, Roger Dingledine, who's in town from Boston, invited me to go to dinner with him. On the spur of the moment, I hopped over to San Francisco (where the cfp2002 conference is being held). It was great fun. In addition to Roger, I got to meet a number of people I hadn't seen in a while, including adam and Paul Syverson. I also got to meet some people I knew only online, including Len Sassaman and Lance Cottrell. Also, meeting George Danezis was a special treat. Bram stopped by late, but I had to leave to catch BART by midnight.

These personal connections are vitally important. I'm happy that I was able to do this.


Someone (sorry I don't remember who) recommended Forth. I'm happy that he likes it, but my experience is otherwise. PostScript is actually one of the more powerful Forth dialects (lists and maps as first-class objects, etc), but I avoid it as much as possible.

It's awesome that projects like Parrot and Psyco are underway. Unfortunately, I fear that the goal of getting good performance may impose subtle but deep changes on the languages implemented. Python, in particular, is radically dynamic, much more so than a typical Lisp.

I also think it's great that we have different competing approaches. We don't yet know the Best Way to efficiently implement highly dynamic languages. My personal guess is that Arc will kick the ass of its competitors, by virtue of drawing on the deep knowledge well of efficiently implementing Lisp, but I would be happy to be proved wrong.

My main frustration, as I've written before, is my lack of good choices among stable languages. But this will inevitably happen in time. If there are free software values (in analogy to the extreme programming values), then I'm sure that patience is among them.
