I've been thinking about trees a lot lately, and
epiphany about why: I'm actually going to need a high-tech
implementation for Fitz (the merger of Libart and the
graphics library). This tree will hold the graphics objects
displayed or printed. It's known in the graphics field as a
list". The PDF 1.4 imaging model is inherently tree
it is most natural for it to be a tree.
The goals are simple. I want a nice, abstract interface,
from the implementation details. This interface should
dom-like tree navigation and mutation. Each node will also
bounding box. The most important query on a tree node is to
through all children intersecting a given bounding box.
Also, I want the tree to support a nice Model/View
Mutating the tree (the "model") notifies the view that
changed. The view then queues a region of the screen for
background thread (or, more likely, idle handler) empties
rerendering these regions and drawing them to the screen. To
client, the effect is that the screen updates automagically.
Behind that interface, I want an efficient
criterion is RAM usage, because this will go into things
printers, where RAM is far from free. Using an in-memory
tree node is too heavy, as my experience with both DOM and
Canvas proved. Of course, it also has to be fast, so things
linear scanning through a serialization are out.
I now have a design that I believe will meet these
Here is the
The API is fairly similar to DOM, but with one subtle
difference: instead of Node objects for each tree node, I
plan to use
"node ids". These node ids are transient, even if the node
persistent. When the client is done with a node id, it
go its reference. DOM, by contrast, generally relies on the
garbage collector. I expect that the number of active node
those for which the client holds a reference) is small
compared to the
size of the tree.
Obviously, this API admits a DOM-style implementation.
there is a one-to-one mapping between node id's and node
Admitting such an implementation is a good thing, because it
relatively simple to code up and thus useful as a prototype.
However, the real goal is to admit a more compact
this implementation, a serialization of the tree is held in
Thus, there are two trees. The structure of these two trees
bear any relation to each other, though.
The alphabet of this serialization consists of left
parenthesis, and atoms. Let's simplify things a lot by
as if they are one byte in size. That way, counting is easy,
also don't need to worry about splitting an atom across more
(fixed size) B-tree block. Thus, a complete depth-2 binary
represented by this serialization: "((AB)(CD))"
For each B-tree node, we store a little bit of extra
the parentheses. Starting at 0, incrementing for each '(',
decrementing for each ')', we record the minimum and ending
For example, if our B-tree block size is 4, then our blocks
"((AB", ")(CD", and "))". The summary tuples are (0, 2),
(-1, 0), and
(-2, -2), respectively. These summaries are stored for both
interior B-tree nodes.
Now, there are two notions of node id's, which I
"sliding" and "stable". A "sliding node id" is essentially
of a B-tree (leaf) node identifier and an offset within the
node. Thus, in our example, if the id's of the three B-tree
are x, y, and z, then the root node of our tree is (x, 0),
child is (x, 1), and its second child is (y, 1).
Given a sliding node id, it is pretty easy to see how
methods of first child, next sibling, and previous sibling
implemented efficiently. In particular, you can jump over
B-tree nodes without having to dive into them. The details
are left as
an exercise for the reader.
Finding the parent of a node is a little trickier, but
Basically, you walk down from the root node, and remember
were. If this operation is important, then it probably makes
have a cache. You can also populate this cache when you do a
child" operation, using the result (the child) as the cache
This concept of a sliding node id works well if the tree
or even in the (interesting) special case where you're
the end. But I'm also interested in the general case where
mutating the tree.
The basic idea is to maintain a mapping from stable node
sliding node id's. When the tree is mutated, update this
example, assume that we have a node id pointing to "B". The
node id is (x, 3). Now, let's insert an "E" between "A" and
resulting in the following B-tree leaf nodes: x = "((A", w =
"EB", y =
")(CD", and z = "))". We update the sliding node id in the
stable-to-sliding mapping to (w, 1). Note that the node id
root's second child stays at (y, 1). In fact, we expect most
to remain stable unless that neighborhood of the tree is
This design is quite a bit less ambitious than, say,
IronDoc. I'm not
concerned with updating over a network, doing transactions,
Messaging without email
David McCusker has also been thinking a lot about trees.
provided much of the inspiration for the above design.
I realized recently that he and I are conducting a full
through our respective weblogs. This has traditionally been
I feel like I've found a kindred soul. We've spent a lot
thinking in the same space, and come up with parallel
can also relate to much of the more personal info that David
for example, when we were trying to live with another couple
extended family, we had money issues similar to those that
describes. For us, those ended when we split back into our
I get the feeling that I would enjoy meeting David and
with him. The normal thing to do would be to email him, but
bit reluctant to break the blog-not-email pattern.
Yesterday, Roger Dingledine, who's in town from Boston,
invited me to
go to dinner with him. On the spur of the moment, I hopped
over to San
Francisco (where the cfp2002
conference is being held. It was great fun. In addition to
got to meet a number of people I hadn't seen in a while,
adam and Paul Syverson. I also
got to meet some
knew only online, including Len Sassaman and Lance Cottrell.
meeting George Danezis was a special treat. Bram
by late, but I had to leave to catch Bart by midnight.
These personal connections are vitally important. I'm
was able to do this.
Someone (sorry I don't remember who) recommended Forth.
he likes it, but my experience is otherwise. PostScript is
one of the more powerful Forth dialects (lists and maps as
objects, etc), but I avoid it as much as possible.
It's awesome that projects like Parrot and Psyco are
Unfortunately, I fear that the goal of getting good
impose subtle but deep changes on the languages implemented.
in particular, is radically dynamic, much more so than a
I also think it's great that we have different competing
We don't yet know the Best Way to efficiently implement
languages. My personal guess is that Arc will kick the ass
competitors, by virtue of drawing on the deep knowledge well
efficiently implementing Lisp, but I would be happy to be
My main frustration, as I've written before, is my lack
choices among stable languages. But this will
in time. If there are free software values (in analogy to
programming values), then I'm sure that patience is