26 Apr 2002 raph   » (Master)

Kids

We're going to the Gifted Development Center to get Alan tested. That should be fun.

Thanks to a recommendation from Pat Eyler, I got Kitchen Table Chemistry and Crash and Burn Chemistry for Alan. He absolutely loves them - surprising even me. We also spent some time going over the beautiful Trematode results. The deformed frogs have been a disturbing mystery for a while. It's absolutely wonderful to see science providing some answers. I think we managed to gross Heather out a bit.

There are new pictures up on the homepages of Alan and Max. I recently got a request from the ccsd.ca webmaster to use some of the photos of the kids - it really made my day.

Web Services

I've only been following the Web Services soap opera from a distance, but it looks interesting. Thanks, simonstl, for posting your summaries and links. I think the controversy is a damned good thing, as having things discussed out in the open tends to improve the process of standards bodies immeasurably.

There seem to be deep political issues, as well as deep technical issues, here. REST-vs-SOAP has a distinct David and Goliath flavor to it. From what I can see, Roy Fielding is a person who really knows his stuff, and is a compelling writer to boot. I am glad to see someone of his caliber in the fray.

One of the main technical issues, as I see it, is whether to use URI's to identify the objects involved (REST), or to decouple objects from URI's, so that you have to understand the request to identify the object. That's an interesting distinction.

The other major technical issue is, of course, complexity. SOAP is a fairly complex spec, and the verbosity of the XML is a sign of that complexity. The argument is whether any of this complexity is actually needed. I get the sense that much of the added complexity is motivated by the desire to decouple the request from the server that will actually process it, for example, if there is a "cloud" of servers. This kind of thing seems to be big in the enterprise middleware world.

I get the sense that a big part of the political discussion is whether Web services should live on port 80, which primarily has consequences for firewall policies. Opening SOAP on your firewall is about the same as opening Corba, something that most sane people would never think of doing. If Web Services get to live on port 80, then they'll start out being open by default, with sysadmins scrambling to close the holes, while being subjected to intense pressure from users who just want their Web Service thingies to work. If they get kicked off of port 80, then everything will be off by default, and there will be an uphill battle to get the stuff adopted at all.

I could easily be wrong about this, it's just impressions I'm getting.

Trees

sye: Thanks for the tata link. I haven't looked at it in great detail yet, but I'm not sure it addresses the problems I'm most interested in right now. As I understand it, tree automata are extensions of regular expressions (or, equivalently, finite state machines), to trees. DOM, by contrast, doesn't actually care about the contents of the trees at all, it just gives you a programmatic interface to access them.

David McCusker writes about using navigational paths as node id's. This idea is quite similar to the sliding DOM I was playing with a few years ago. It's very seductive, because it requires no additional storage in the tree, and it works well for the read-only and append-only cases. It's also not too bad when you have a relatively small number of active id's you need to keep track of.

However, I think my latest thinking beats path-based ID's. In many cases, one would expect mutations to the tree to affect only a small number of active node id's. In the read-only and append-only cases, id's are completely stable (as with paths).

One further insight. Assuming you're storing your tree on a disk, the node id's are basically equivalent to file offsets. If you don't move nodes around much on the disk, then your id's will be relatively stable, as well. Thus, analysis of disk traffic speaks directly to the cost of updating the nodes.

In any case, I (fortunately) have a very specific and concrete application for all this abstraction: Fitz. Ghostscript will be the premier client for Fitz. Ghostscript also has a more-or-less append-only pattern of mutations to the tree. It builds up the display list, scanning through the source PostScript or PDF file, then takes another pass to render it. It will be primarily other interactive applications (all of which are quite speculative now) for which efficient processing of random tree mutations is important.

Maintenance of node id's is only one aspect to efficient implementation of a DOM-like tree protocol. The other big one is change notification.

Fitz will implement a Model/View architecture for keeping the screen updated with respect to tree mutations. When you change the tree, Fitz will keep track of which parts of the screen need redrawing, and later redraw just those areas. Again, this mechanism is primarily of interest for interactive applications. In the context of DOM, we're talking about EventListeners for MutationEvents. Note that DOM has serious technical flaws that make it quite painful to implement Model/View, especially when there are multiple Views.

In any case, the major axis is how many different listeners you have attached to the tree. At one extreme, you just have one attached to the root. Any change to the tree calls the associated callback, which schedules the entire tree for redrawing. The simplicity is appealing, but the performance wouldn't be.

At the other extreme (and this is basically what Gill did), you have a listener attached to every graphical object in the tree. When an object changes, the callback schedules just that object for redrawing. This is certainly nice for minimal redraws, but the cost of having all those listeners, and suitably propagating the events up and down, is crushing.

I believe that the Right Thing is to consider the listeners to be a cache. For nodes that have recently changed, you have listeners attached directly to those nodes. When the cache fills, you start evicting listeners for specific nodes, and letting listeners located higher in the tree handle the mutations. Thus, a change to a specific node will, in general, cause a subtree to be redrawn.

There are still some nontrivial issues to be resolved; primarily, how to stop propagation up the tree correctly, so that if a leaf node handles a change, it doesn't trigger a callback to a listener higher up the tree. But I have confidence I can solve these.

In any case, the implementation plan is now a lot clearer. Given the clients I expect, there are probably two simple tree implementations worth doing early. One is heavily optimized for the append-only case, and essentially serializes the tree out to a disk file, using (the moral equivalent of) file offsets as node id's. The other is essentially the same as DOM, using an in-memory object for each node. In the second implementation, node id's are pointers to these objects.

If the API is general enough to handle these two implementations, then I have confidence that it will also handle a third, considerably more complex implementation, based on the btree ideas I've been fantasizing about. This implementation, I believe, would combine the space efficiency of the serialized implementation with the nimble response to change of the node-and-pointer implementation. Thus, I would expect it to be particularly efficient for updates to very large display lists.

It's clear that this API will need to treat node id's as indirect references to the tree, possibly subject to updating, rather than direct pointers to tree nodes. I will probably have a "node id" object, which only has meaning in conjunction with a tree object. The DOM concept of Node can probably be simulated by bundling a node id and a reference to the tree, but I probably won't bother. In any case, the most fundamental departure from DOM is that releasing a node id becomes a significant event.

I am really looking forward to fleshing out this design and then implementing it. I believe I can make a component of very high quality, with lots of useful applications. I'm absolutely thrilled that my paying job lets me hack on this stuff and release it all under GPL.

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!