29 May 2002 raph   » (Master)

Linked

I'm reading Linked: The New Science of Networks, by Barabasi. I'll have quite a bit more to say when I'm finished reading it, but in the meantime, if you're interested in networks, I can highly recommend it. In particular, if you're trying to do p2p, then run, do not walk, to your friendly local independent bookstore.

Advogato and community

anselm: you raise good questions. With luck, Advogato can become more vital without sacrificing its thoughtful tone.

In any case, I think the secret to success in online community is for it to be firmly based on real, human community. That's sometimes tricky in the highly geographically dispersed world of free software, but worth cultivating.

Why Athshe will not be based on XML

Athshe is a (completely vaporware) suite of tools for tree-structured data, including an API for tree access, a simple serialized file format, a more sophisiticated btree-based random access file format, and a network protocol for remote tree access and update. Everybody is doing trees these days, because XML is hot. Yet, I do not plan to base Athshe on XML. Why?

In short, because it makes sense for Athshe to work with a simpler, lower level data language. The simplicity is a huge win because Athshe will take on quite a bit of complexity trying to optimize performance and concurrency (transactions). Also, XML is weak at doing associative arrays, and I think Athshe should be strong there. Lastly, the goals of Athshe are focussed on performance, which is a bit of an impedance mismatch with most of XML community.

I hardly feel I have to justify the goal of simplicity - it's such an obvious win. So I won't.

The children of an XML element are arranged in a sequence. However, in a filesystem, the children of a directory are named; the directory is an associative array mapping names to child inodes. I believe this to be an incredibly useful and powerful primitive to expose. Many applications can use associative arrays to good advantage if they are available. One example is mod_virgule, which leverages the associative array provided by the filesystem to store the acct/<username> information.

XML (or, more precisely, DOM) actually does contain associative array nodes. Unfortunately, these are Attribute nodes, so their children are constrained to be leaves. So you get the complexity without the advantages :)

A very common technique is to simulate an associative array by using a sequence of key/value pairs. This is essentially the same concept as the property list from Lisp lore. XPath even defines syntax for doing plist lookup, for example child::para[attribute::type="warning"] selects all para children of the context node that have a type attribute with value warning. However, mandating this simulation has obvious performance problems, and may also make it harder to get the desired semantics on concurrent updates. In particular, two updates to different keys in an associative array may not interfere, but the two updates to the plist simulation very likely will.

Nonetheless, this concept of simulation is very powerful. I believe it is the answer to the "impedance mismatch" referenced above. Athshe's language consists of only strings, lists, associative arrays, and simple metadata tags. It's not at all hard to imagine mapping "baby XML" into this language. In Python syntax, you'd end up with something like:

<p>A <a href="#foo">link</a> and some <b>bold</b> text.</p>
becomes:
['p', {}, 'A ', ['a', {'href': '#foo'}, 'link'], ' and some ', ['b', {}, 'bold'], ' text.']

With a little more imagination, you could get DTD, attributes, entities, processing instructions, and CDATA sections in there.

In fact, mappings like this are a common pattern in computer science. They resemble the use of an intermediate language (or virtual machine) in compilers. These types of mappings also tend to be interfaces or boundaries between communities. Often, the lower-level side has a more quantitative, performance-oriented approach, while the higher-level side is more concerned with abstraction. Cisco giveth, and the W3C taketh away :)

Credit where credit is due

My recent piece on link encryption drew deeply on an IRC conversation with Roger Dingledine and Bram, and Zooko provided the OCB link.

A good application for an attack-resistant trust metric

A best-seller on Amazon.

Note especially how the shill reviews have very high "x out of y people found the following review helpful" ratings. A perfect example of a system which is not attack resistant. Reminds me of the /. moderation system :)

Thanks to Will Cox for the link.

Ghostscript

I've been busy hacking on Well Tempered Screening. I've got it random-access, and I've coded up the enumeration so you can use PostScript code to define the spot function. I still have to grapple with the "device color" internals, which intimidates me.

On a parallel track, the DeviceN code tree, which up to now has lived in a private branch, is shaping up fairly nicely, at least using regression testing as a metric. We should be able to get it checked into HEAD soon.

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!