14 Feb 2003 cuenca   » (Master)

Trees

I've been reading about David's blobs in more deep these days. I still don't fully understand them, but I think that I understand them well enough to start comparing them to a piece table with a tree.

The two structures are nearly identical. The only difference is on the leafs of the b-tree. From what I've read, it seems that a blob will store the text directly on the leafs, while a piece table will only store a pointer to the real text. This pointer is composed of the buffer (the read-only or the append-only one, see my previous diary entry), the offset and the size of the piece of text.

This extra indirection has several advantages.

First, the number of leafs is no more dependent on the size of the text, but on the number of pieces in your piece table. That's in theory dependent only on the number of changes that the piece table has suffered since the file was first read. In the real world, people likes to (over?) use the pieces to store the format of the text, so the pieces have the additional restriction of being contiguous chunks of text with the same format. But even with this additional restriction, there should be less nodes in the tree.

Second, the text is never, ever, really deleted. I don't find the delete algorithms of blobs, so maybe that's also a blob's feature, but I don't think so (my guess is only funded on the blob's data structure). That may seem at first like bloat. After all, why do you need to keep around even the deleted text? To be able to undo. In fact, the piece table enforces an infinite undo as a trivial feature.

The piece table also shares the strong points of blobs. Copy-on-write, for instance, can also be implemented more or less the same way as with blobs.

In short, I'm maybe missing something, but I still prefer a piece table with a auto-balanced tree than a blob.

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!