25 Dec 2003 raph   » (Master)

Life

It's been a busy few weeks - my mom, then my brother and his girlfriend came to visit, and somewhere in the middle of all that we had an Artifex staff meeting.

Things are quieting down now. We had a nice family evening, playing video games and doing a little papercraft. I tried the tiger, just on cheap paper and b/w laser printing, and it came out OK. Of course, Max then wanted to do one of the motorcycles, but I convinced him that we would do that some other day.

BitTorrent and RSS

There's a thread going around the net on the benefits of combining RSS with BitTorrent. I agree there's something there, but I want to draw a distinction between the "easy" combination, which is quite feasible right now, and one that requires a bit more rocket science (actually, Internet protocol design, but from what I know of both, the latter is more difficult to do well). In the "easy" combination, you have your whole RSS infrastructure exactly as it is now, but use BitTorrent to distribute the "attachments". People have been experimenting with RSS enclosures (for speech, music, video, and whatnot) for a while, but they're not hugely popular yet. One of the reasons is the difficulty and expense of providing download bandwidth for the large files that people will typically want to enclose. BitTorrent can solve that.

In fact, BitTorrent's strengths seem to mesh well with RSS. BT shines when lots of people want to download the same largish file at the same time - it's weaker at providing access to diverse archives with more random patterns of temporal access. Also, BT scales nicely with the number of concurrent downloaders - you get about equally good performance with a dozen or ten thousand. So if someone shoots a really cool digital video, posts it to their blog, then gets Slashdotted, it all still flows.

Integrating BT with a daemon that retrieves RSS feeds in the background has other advantages, as well. If the person opens the file a while after the download begins (which might be as soon as the RSS is updated), most or all of the latency of downloading that file can be hidden. Further, since the BT implementation is released under a near-public domain license, it should be relatively easy for people to integrate it into their blog-browsing applications.
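As a rough sketch of the "easy" combination, a feed-reading daemon could scan each RSS item for an enclosure that points at a torrent file and hand that URL to a BitTorrent client in the background. The sample feed, the example.com URL, the function name, and the choice to recognize torrents by the MIME type application/x-bittorrent are all assumptions made for illustration, not something any particular aggregator is known to do.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: the URL and the enclosure type are illustrative assumptions.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example interviews</title>
    <item>
      <title>Interview one</title>
      <enclosure url="http://example.com/one.mp3.torrent"
                 length="20480" type="application/x-bittorrent"/>
    </item>
    <item>
      <title>Text-only post</title>
    </item>
  </channel>
</rss>"""


def torrent_enclosures(feed_xml):
    """Return (item title, torrent URL) pairs for items with a torrent enclosure."""
    root = ET.fromstring(feed_xml)
    found = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        for enclosure in item.findall("enclosure"):
            if enclosure.get("type") == "application/x-bittorrent":
                found.append((title, enclosure.get("url")))
    return found


if __name__ == "__main__":
    for title, url in torrent_enclosures(SAMPLE_FEED):
        # A real daemon would start the BitTorrent download here.
        print(title, url)
```

Starting the download as soon as the feed is fetched is what hides the latency: by the time the reader clicks, the file is mostly there.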

An example of a blog that would work superbly with BT is Chris Lydon's series of interviews.

But Steve Gillmor's article isn't primarily about enclosures - it suggests that we can use BT to manage the RSS feed itself. I think there's something to the idea, but the existing protocol and implementation aren't exactly what's needed. BT is best at downloading large static files. You start with a "torrent" file, which is essentially a list of SHA-1 hashes, one per block of the file, packaged up with a URL where the "tracker" can be reached. All peers uploading and downloading the file register with the tracker, and get a list of other peers to connect with. Then, peers exchange blocks of the file with each other, using very clever techniques to optimize the overall throughput. After each block is transferred, its hash is checked against what's in the torrent file, and the block is discarded if it doesn't match.
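The hash check at the end of that description can be sketched in a few lines. This assumes a flat list of per-block SHA-1 hashes as the published metadata. The function names and the tiny block size are made up for the example, and the real client does a great deal more (peer selection, the wire protocol, and so on).

```python
import hashlib

BLOCK_SIZE = 4  # tiny, for illustration only; real torrents use far larger blocks


def block_hashes(data, block_size=BLOCK_SIZE):
    """Publisher side: compute one SHA-1 digest per fixed-size block."""
    return [hashlib.sha1(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def accept_block(index, block, hashes):
    """Downloader side: keep a received block only if its hash matches."""
    return hashlib.sha1(block).digest() == hashes[index]


if __name__ == "__main__":
    original = b"hello world!"
    published = block_hashes(original)
    print(accept_block(1, b"o wo", published))  # genuine block
    print(accept_block(1, b"XXXX", published))  # corrupted block, to be discarded
```

Because the hashes are fixed when the torrent file is made, a block can never legitimately change afterwards, which is exactly why this scheme suits static files.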

But RSS files themselves are relatively small, so it's unlikely that all that much bandwidth would be saved by sending torrent files and running a tracker, as opposed to simply sending the RSS file itself. Further, the big performance problem with RSS is the tradeoff between polling the feed infrequently, which results in large latencies between the time the feed is updated and the time viewers get to see it, and polling it frequently, which chews up tons of bandwidth from the server. BT doesn't do much to help with this - you'd be polling the torrent file exactly as frequently as you're polling the RSS file now.
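To put rough numbers on that tradeoff, here is a back-of-the-envelope sketch. It assumes every poll fetches the whole feed (no conditional GET or compression) and that updates land at a uniformly random point within the polling interval, so the average wait is half the interval. The subscriber count and feed size are invented figures.

```python
SECONDS_PER_DAY = 86400


def polling_cost(subscribers, feed_bytes, poll_interval_s):
    """Return (server bytes per day, average update latency in seconds)."""
    polls_per_day = SECONDS_PER_DAY / poll_interval_s
    bytes_per_day = subscribers * feed_bytes * polls_per_day
    average_latency_s = poll_interval_s / 2
    return bytes_per_day, average_latency_s


if __name__ == "__main__":
    # Hypothetical feed: 1000 subscribers, 20 kB per fetch.
    for interval in (3600, 60):
        total, latency = polling_cost(1000, 20000, interval)
        print(interval, total, latency)
```

Under these assumptions, going from hourly to once-a-minute polling cuts the average latency from 30 minutes to 30 seconds but multiplies the server's traffic by 60, and swapping the RSS file for a torrent file leaves that arithmetic unchanged.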

I believe, however, that the BitTorrent protocol could be adapted into one that solves the problem of change notification. The protocol is very smart, and already has much of the infrastructure that's needed. In particular, peers already do notify each other when they receive new blocks. That's not change notification because the contents of the blocks are immutable (and that's enforced by checking the hash), but it's not too hard to see how it could be adapted. At heart, you'd replace the static block hashes of the existing torrent file format with a digital signature. The "publisher" node would then send new digitally signed blocks into the network, where they'd be propagated by the peers. There'd be essentially no network activity in between updates, and, as in the existing BitTorrent protocol, the load on the publisher node would be about the same whether it was feeding a dozen or ten thousand listeners. I'd expect latency to scale very nicely as well (probably as the log of the number of peers, and with fast propagation along the low latency "backbone" of the peer network).
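The logarithmic guess can be illustrated with an idealized push model, which is only a toy and not how any existing client behaves. It assumes that in each round every node already holding the update forwards it to a fixed number of nodes that don't have it yet, with no failures, no duplicate sends, and no differences in link speed. Signature checking is left out entirely, and the fanout of 2 is arbitrary.

```python
def rounds_to_reach(listeners, fanout):
    """Rounds until every listener holds the update in an idealized push model."""
    total = listeners + 1  # the listeners plus the publisher
    holders = 1            # only the publisher has the update at first
    rounds = 0
    while holders < total:
        # Every current holder forwards the update to `fanout` new nodes.
        holders = min(total, holders * (1 + fanout))
        rounds += 1
    return rounds


if __name__ == "__main__":
    for listeners in (12, 10000):
        print(listeners, rounds_to_reach(listeners, fanout=2))
```

In this model the publisher uploads the same handful of copies no matter how large the audience is, and growing the audience from a dozen to ten thousand only adds a few rounds of delay.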

I'd hate to see such a beautiful work of engineering restricted to just providing RSS feeds - ideally, it would be general enough to handle all sorts of different applications which require change notification. One such is the propagation of RPM or Debian package updates, which obviously has strong requirements for both scaling and robustness. The main thing that's keeping it from happening, I think, is the dearth of people who really understand the BitTorrent protocol.

Proof systems

I've been hacking a bit on my toy proof language. Aside from slowly bringing the verifier up to the point where it checks everything that should be checked, I'm also hacking up an implementation of the HOL inference rules constructed in ZF set theory.

It's immensely satisfying to construct proofs that are correct with high assurance, which is such a contrast with hacking code - any time you write nontrivial code, you know it's got lots of bugs in it, many of which no doubt can be exploited to create security vulnerabilities.
