21 May 2002 raph (Master)

Trees, again

A general suite of format, protocol, and API for trees such as Athshe is a pretty complicated piece of code. McCusker's blob spec alone is daunting in its complexity. I'm trying to get a sense of where the complexity is coming from. Obviously, some of it is optimization: for example, trying to maximize leaf occupancy, or to minimize the number of mutated nodes in a copy-on-write case.

McCusker asks what I mean by network protocols and file formats. His observation that disk access and network latencies are comparable is spot-on. I'm not really talking about performance tuning, though; my main concern is concurrency. A network protocol must deal with locking and change notification. Transactions would be nice too.

Consider the read-only case first. Concurrency becomes a non-issue. In fact, a perfectly good protocol is simply to request blocks from a B-tree file format. If the block size divided by the bandwidth is comparable to the request latency, then you're keeping the connection reasonably full (I'm not worried about small factors).
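To make that concrete, here is a back-of-the-envelope sketch in Python. The block size, bandwidth, and latency figures are invented for illustration, not measured:

    # Is block_size / bandwidth comparable to the request latency?
    # All numbers below are illustrative, not measurements.
    block_size = 16 * 1024        # bytes per B-tree block
    bandwidth = 1_000_000         # bytes per second the link sustains
    latency = 0.02                # seconds per request round trip

    transfer_time = block_size / bandwidth   # time spent actually moving data
    ratio = transfer_time / latency          # near 1 means the link stays busy
    print(f"transfer {transfer_time * 1000:.1f} ms, "
          f"latency {latency * 1000:.1f} ms, ratio {ratio:.2f}")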

If you do care, there are a number of ways to hide the latency, prefetching among them. If the client's task is to fetch a complete subtree, it can pipeline several requests, hiding almost all of the latency without the risk of fetching unneeded blocks. However, this requires asynchrony in the client API, which is a source of complexity. Such is the tradeoff.
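Here is a rough sketch of that pipelining using Python's asyncio. fetch_block() is a stand-in for one request round trip and max_in_flight is an invented knob; the point is only that several requests are outstanding at once:

    import asyncio
    from collections import deque

    async def fetch_block(block_id):
        # Stand-in for one network round trip (hypothetical protocol call).
        await asyncio.sleep(0.02)                 # pretend 20 ms of latency
        return {"id": block_id, "children": []}   # leaves only, in this toy

    async def fetch_subtree(root_id, max_in_flight=8):
        # Fetch every block in a subtree, keeping several requests in flight.
        queue = deque([root_id])    # blocks we need but haven't requested yet
        in_flight = set()           # outstanding requests
        blocks = {}
        while queue or in_flight:
            while queue and len(in_flight) < max_in_flight:
                in_flight.add(asyncio.create_task(fetch_block(queue.popleft())))
            done, in_flight = await asyncio.wait(
                in_flight, return_when=asyncio.FIRST_COMPLETED)
            for task in done:
                block = task.result()
                blocks[block["id"]] = block
                queue.extend(block["children"])   # discover more work as we go
        return blocks

    # asyncio.run(fetch_subtree("root"))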

I'm not going to go into the details of adding network writes tonight. A simple locking discipline, with caching in the kernel, might work well in a concurrent local disk case. The client drills down from the root every time it accesses the tree, but these requests will usually hit in the cache. If you did something similar over the network, latency could kill you.
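The drill-down pattern looks roughly like this sketch; read_block() stands in for the slow path (a disk read locally, a network request remotely), and the block layout is a toy, not a real B-tree format:

    CACHE = {}

    def read_block(block_id):
        # Hypothetical slow path; a real version would parse bytes from
        # disk or from a server reply.
        return {"id": block_id, "leaf": True, "entries": [], "children": []}

    def get_block(block_id):
        if block_id not in CACHE:
            CACHE[block_id] = read_block(block_id)   # miss: pay the latency
        return CACHE[block_id]

    def lookup(root_id, key):
        # Drill down from the root on every access; interior blocks hold
        # (separator_key, child_id) pairs sorted by key.  The blocks near
        # the root are fetched once and then hit in the cache afterwards.
        block = get_block(root_id)
        while not block["leaf"]:
            child_id = next(c for sep, c in block["children"] if key <= sep)
            block = get_block(child_id)
        return dict(block["entries"]).get(key)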

One approach is for the server to invalidate blocks that might be in the client's cache. The response to a write-lock request guarantees that all such invalidations have already been sent (we assume message ordering). The bandwidth for the invalidations is trivial compared with the blocks themselves; the main cost is the round-trip latency for the write lock. Also, contention for the write lock could be a serious problem.
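The client side of that scheme might look like the following sketch. conn is a hypothetical ordered, blocking message connection; the property relied on is that by the time WRITE_LOCK_GRANTED arrives, every INVALIDATE the server meant for this client has already been delivered:

    def acquire_write_lock(conn, cache):
        conn.send({"type": "WRITE_LOCK_REQUEST"})
        while True:
            msg = conn.recv()                      # in-order delivery assumed
            if msg["type"] == "INVALIDATE":
                cache.pop(msg["block_id"], None)   # drop the stale block
            elif msg["type"] == "WRITE_LOCK_GRANTED":
                return                             # cache now consistent with server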

A much more sophisticated approach is to optimistically send write requests under the assumption that the relevant blocks are still valid. The server checks, and only commits if they are. Invalidations are still useful as a way of improving the chances of a valid commit, but they are not needed for correctness. Cleaning up when the optimism turns out to be unwarranted is a challenge, though.
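A sketch of that optimistic path: each write is tagged with the version of the block the client believes it is modifying, and the server commits only if those versions are still current. The message names and the conn object are assumptions, not a real API:

    def optimistic_write(conn, writes, cached_versions):
        # writes: {block_id: new_bytes}; cached_versions: {block_id: version}
        conn.send({
            "type": "OPTIMISTIC_WRITE",
            "writes": writes,
            "assumed_versions": {b: cached_versions[b] for b in writes},
        })
        reply = conn.recv()
        if reply["type"] == "COMMITTED":
            return True
        # The cleanup case: the server saw stale blocks, so forget them and
        # let the caller redo its mutation against fresh data.
        for block_id in reply["stale_blocks"]:
            cached_versions.pop(block_id, None)
        return False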

Cache invalidation is related to change notification, which is the stronger property of actually notifying the application when a node has changed. Change notification is essential for the Model/View paradigm, as it is what triggers recomputation of the View. Doing it generally and efficiently is a hard problem, but one that has been studied in detail. Many new designs include some ad hoc form of change notification. Polling is very simple and robust, but it is not very efficient, especially if you care about notification latency. Thus, you see crude things such as the <cloud> interface in RSS (and OPML). The prospect of solving the problem well is a good motivation for Athshe.
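A minimal in-process sketch of push-style notification, the kind of thing Model/View wants; over a network the callback list would become a subscription table on the server, but the shape is the same. This is illustrative, not the Athshe design:

    class Node:
        def __init__(self, value=None):
            self.value = value
            self._watchers = []            # views interested in this node

        def watch(self, callback):
            self._watchers.append(callback)

        def set(self, value):
            self.value = value
            for callback in self._watchers:
                callback(self)             # triggers the View's recomputation

    # A view re-renders only when told to, instead of polling for changes.
    node = Node("old")
    node.watch(lambda n: print("recompute view:", n.value))
    node.set("new")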

I'll write about these ideas in more detail later.
