10 Nov 2004 titus   » (Journeyer)

Tuple spaces

It's good to see tuple spaces gaining some exposure. Patrick Logan mirrored my instinctive reaction to the Amazon Queue service beta when he said he wished they'd provided a tuple space implementation (notwithstanding the ease of building a tuple space on top of a queue, yes).

I first ran across tuple spaces when I implemented one without really knowing that it was a tuple space. My batchqueue implementation for Cartwheel is based on the tuple space concept, although as implemented it behaves more like a queue. Essentially, "producers" (usually the Cartwheel Web site, nicknamed 'canal') dump job requests ("tuples") into a PostgreSQL 'request' table. "Consumers", which are queue-processing programs running on compute nodes, monitor the table for new requests and extract one whenever it becomes available. Results are returned to the database and linked to the request table.
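To make the producer/consumer arrangement concrete, here is a minimal sketch, not the actual Cartwheel code. It assumes SQLite (via Python's standard sqlite3 module) as a stand-in for PostgreSQL so that it runs on its own. The column names, the 'status' values, and the functions make_space, add, take and put_result are all hypothetical.

```python
import sqlite3


def make_space():
    """Create in-memory stand-ins for the 'request' table and a linked result table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE request ("
        " id INTEGER PRIMARY KEY,"
        " payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'new')"
    )
    db.execute(
        "CREATE TABLE result ("
        " request_id INTEGER NOT NULL REFERENCES request(id),"
        " output TEXT NOT NULL)"
    )
    return db


def add(db, payload):
    """Producer side: drop a job request (a 'tuple') into the request table."""
    cur = db.execute("INSERT INTO request (payload) VALUES (?)", (payload,))
    db.commit()
    return cur.lastrowid


def take(db):
    """Consumer side: claim the oldest unclaimed request, or return None."""
    row = db.execute(
        "SELECT id, payload FROM request WHERE status = 'new' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    # Only claim the row if it is still 'new', so that two consumers
    # polling at the same moment cannot both walk away with the same job.
    cur = db.execute(
        "UPDATE request SET status = 'taken' WHERE id = ? AND status = 'new'",
        (row[0],),
    )
    db.commit()
    if cur.rowcount != 1:
        return None
    return row


def put_result(db, request_id, output):
    """Store a result linked to its request and mark the request as done."""
    db.execute(
        "INSERT INTO result (request_id, output) VALUES (?, ?)",
        (request_id, output),
    )
    db.execute("UPDATE request SET status = 'done' WHERE id = ?", (request_id,))
    db.commit()
```

A consumer on a compute node would call take() in a loop, sleeping briefly whenever it returns None, and call put_result() once the job finishes.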

When I developed the first implementation of Cartwheel, the main goal was to avoid executing "os.system" calls from the Web server. At the time I was using AOLserver/PyWX, a high-performance threaded Web server running my/our Python embedding, and it seemed like a bad idea to do os.system calls from within a threaded app! A side benefit of implementing the queue processing as a tuple space on top of PostgreSQL was that jobs could be distributed across multiple computers. Now, it's a major feature of the thing ;). (And, since I've switched to Quixote/SCGI, os.system still seems like a bad idea but it's less of an issue.)

While my tuple space implementation on top of PostgreSQL isn't well suited for speedy turnaround (picking up a job typically takes up to 1 second), it was absolutely trivial to implement: literally, something like 5 lines of code. You can see it in my pyzine article (search for "tuple_space.add"). Once you add comments, error handling (so that, e.g., CTRL-C returns the job to the tuple space rather than giving up on it), and some simple reporting functions, it adds up to a couple hundred lines of code. All in all, I'd stack tuple spaces up against any other parallel processing technique for simplicity of implementation.
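The CTRL-C handling mentioned above can be sketched as follows. This is an illustration under stated assumptions rather than the code from the article: the space is a plain in-memory list, and only the name add echoes the "tuple_space.add" call. TupleSpace, take, put_back and process_one are hypothetical.

```python
class TupleSpace:
    """A toy in-memory tuple space, just enough to show the error handling."""

    def __init__(self):
        self._jobs = []

    def add(self, job):
        """Append a new job to the space."""
        self._jobs.append(job)

    def take(self):
        """Remove and return the oldest job, or None if the space is empty."""
        return self._jobs.pop(0) if self._jobs else None

    def put_back(self, job):
        """Return an unfinished job to the front so it is picked up next."""
        self._jobs.insert(0, job)

    def __len__(self):
        return len(self._jobs)


def process_one(space, handler):
    """Take one job and run handler on it; return False if there was no job.

    If the user hits CTRL-C while the handler is running, the job goes
    back into the space before the interrupt is re-raised, so another
    consumer can still pick it up.
    """
    job = space.take()
    if job is None:
        return False
    try:
        handler(job)
    except KeyboardInterrupt:
        space.put_back(job)
        raise
    return True
```

A database-backed version would do the same thing by resetting the request's status in the except branch instead of reinserting into a list.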

One recurring idea has been to reimplement Google's MapReduce technique on top of Cartwheel (or some other system) to produce a highly scalable system for whole-genome motif searching. Naturally, the first thing to do is to come up with a name for the system: that's much more important than an implementation! I've been thinking of "Motiefer", along the lines of my FamilyJewels project. (So much less obvious than some dumb acronym like "ParMotSear"... but hmm, "SAR" would be kind of amusing. We'll see.)

Huh. Well, I was going to write something specific about Python for the purpose of proving to Ryan Phillips that this blog should be on PlanetPython, but ... I guess I did. OK.
