Older blog entries for Xorian (starting at number 10)

CodeCon musings

Aside from being a lot of fun and exposing me to new work people are doing, CodeCon gave me the opportunity to have interesting conversations about Vesta and consider how it relates to other projects.

While listening to Walter Landry's presentation on ArX, I compiled a list of a few good things about Vesta which I haven't included in presentations about it before (some stolen directly from Walter's presentation as they apply to Vesta as well):

  • Disconnected hacking. You can make a local branch when not connected to the network. [I do this on my laptop when traveling, and this is in fact how I was working when visiting Microsoft Research.]
  • Strangers can make changes without the main developers giving permission. With only read access to a central repository, you can make first class branches in your own repository. [This is related to the previous point, in that it all happens locally after replicating whatever version you're basing your changes on.]
  • It doesn't use timestamps for anything. Like many modern build systems, Vesta does not use timestamps for dependencies. Unlike most modern competitors, it doesn't depend on timestamps as a versioning system either. Since it is its own filesystem, it knows whether you have changed a file. (See the sketch after this list.)
  • It doesn't intermix its data with your source files. Most versioning systems store some meta-data about what you're versioning in files/directories intermixed with your sources. Because Vesta is its own filesystem it can store that meta-data in special directory attributes. This keeps its data out of your way.
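
To make the timestamp point concrete, here's content-based change detection in miniature: compare file fingerprints against ones recorded earlier, so touching a file without editing it doesn't count as a change. This is just a Python illustration of the idea; Vesta gets this information for free by being the filesystem, and none of these names come from Vesta.

    import hashlib
    import json
    import os

    STATE_FILE = "digests.json"   # hypothetical place to remember fingerprints

    def digest(path):
        """Return a fingerprint of a file's contents (not its timestamp)."""
        h = hashlib.sha1()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        return h.hexdigest()

    def changed_files(root):
        """List files whose contents differ from the recorded fingerprints.

        Touching a file without editing it does not show up here, which is
        the property timestamp-based tools can't give you.
        """
        try:
            with open(STATE_FILE) as f:
                old = json.load(f)
        except FileNotFoundError:
            old = {}

        new, modified = {}, []
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                if name == STATE_FILE:
                    continue
                path = os.path.join(dirpath, name)
                new[path] = digest(path)
                if old.get(path) != new[path]:
                    modified.append(path)

        with open(STATE_FILE, "w") as f:
            json.dump(new, f)
        return modified

    if __name__ == "__main__":
        print(changed_files("."))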

I talked with Ross Cohen (who works on Codeville) about merge algorithms. He told me stories of crazy repeated merge cases that he thought would never come up in practice, and then did. I asked him if he'd be offended if I ripped off his merge code for the pluggable merge architecture I've been designing, and he said he wouldn't. (I'm trying to avoid getting into the business of writing a new merge algorithm.)
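
To be clear about what I mean by "pluggable": something like a registry that maps algorithm names to merge functions, so Ross's code could be dropped in alongside a trivial default. The sketch below is purely illustrative; the names and interface are mine, not the actual design.

    from typing import Callable, Dict, List

    # A merge algorithm takes the common ancestor and the two derived
    # versions (as lists of lines) and returns the merged result.
    MergeFn = Callable[[List[str], List[str], List[str]], List[str]]

    MERGE_ALGORITHMS: Dict[str, MergeFn] = {}

    def register(name: str):
        """Register a merge algorithm under a user-visible name."""
        def wrap(fn: MergeFn) -> MergeFn:
            MERGE_ALGORITHMS[name] = fn
            return fn
        return wrap

    @register("take-theirs")
    def take_theirs(base, mine, theirs):
        """Trivial placeholder: always prefer the other side's version."""
        return theirs

    def merge(base, mine, theirs, algorithm="take-theirs"):
        """Dispatch to whichever registered algorithm was asked for."""
        return MERGE_ALGORITHMS[algorithm](base, mine, theirs)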

I talked to Nick Mathewson and Roger Dingledine from the Free Haven project about securing Vesta. They suggested I write an RFC-style protocol spec if I want anyone with a security background to help. They also confirmed my concern that the mastership transfer protocol is the most problematic part, as it uses TCP connections in both directions between the peer repositories. To a lesser extent, replicating from a satellite repository back to a central one when checking in after a remote checkout has the same problem. If we could find a way to make these active connections passive, it would also help people behind firewalls.

Kevin Burton from rojo.com recommended including more information in the RSS feed of latest versions in the pub.vestasys.org repository. I've been having my feed generator trim the checkin comments, but he said the RSS client should do that. He also suggested including a diff in the feed.
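
For what it's worth, generating a richer item is pretty cheap. The sketch below builds one RSS item carrying the untrimmed checkin comment plus a unified diff in the description; the layout and the example version path are just a plausible guess, not what pub.vestasys.org currently emits.

    import difflib
    from xml.sax.saxutils import escape

    def feed_item(version, comment, old_lines, new_lines):
        """Build one RSS <item> carrying the full checkin comment plus a diff."""
        diff = "\n".join(difflib.unified_diff(
            old_lines, new_lines,
            fromfile="previous", tofile=version, lineterm=""))
        return ("<item>\n"
                f"  <title>{escape(version)}</title>\n"
                f"  <description>{escape(comment)}\n\n{escape(diff)}</description>\n"
                "</item>")

    print(feed_item("/vesta/example.org/foo/3", "Fix off-by-one in the parser.",
                    ["count = n"], ["count = n - 1"]))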

Zooko and I talked briefly about merging. Specifically, we talked about how "smart" merge algorithms need more information than a common ancestor and two versions descended from it. They typically take into account more about the steps between the common ancestor and those two versions. I said that I thought it should be possible to use the trigger mechanism I've been working on to record the kind of extra information such algorithms would need. He contended that without spending time using and studying a system with smart merging, I won't know quite how to design to enable it. While I have read through the Darcs "theory of patches", I don't see myself having time to spend really getting a lot of experience with another system.

Andy Iverson (?) and I talked about Vesta's approach of storing complete copies of all versions. He brought up the example of how small a BitKeeper repository is with the entire Linux kernel history. I asked, "Is having that entire history really interesting?" He contended that it is, citing the example of searching through the history to find when some feature/variable/function was introduced.

This probably has to do with the fact that the design decision to store complete copies was made early in Vesta's design (in the late '80s), before open source really took off. Basically the argument was "disk is cheap, programmers aren't typing any faster." However, open source projects can scale to a much larger number of programmers making a much larger number of changes. The Vesta developers couldn't have predicted this effect when they made that design decision.

We could do something to add compression of versions, but I'm wary of doing that for performance reasons, at least at the moment. One of Vesta's selling points is O(1) checkout/checkin/branch and access to existing versions, which we would lose with compression. Also, adding compression right now would put load on a central resource (the repository server). I have some ideas on splitting up the system in ways that would make it possible to distribute this load, but I don't expect to make progress on that in the immediate future. Lastly, the Linux kernel clearly has a higher number of contributors (and probably a higher rate of change) than most projects, so maybe this is only an issue in extreme cases. I should probably spend some time measuring the storage requirements for the history of different free software projects.
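
As a first pass at that measurement, something like the sketch below would do: walk a tree of stored versions, total the raw bytes, and compare with what per-file gzip would give. The path is hypothetical, and per-file compression is only a floor; real delta compression between versions would do much better.

    import gzip
    import os

    def history_sizes(root):
        """Total raw vs. per-file-gzipped bytes for everything under root."""
        raw = packed = 0
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                with open(os.path.join(dirpath, name), "rb") as f:
                    data = f.read()
                raw += len(data)
                packed += len(gzip.compress(data))
        return raw, packed

    if __name__ == "__main__":
        raw, packed = history_sizes("/tmp/some-project-versions")  # hypothetical path
        if raw:
            print(f"raw: {raw} bytes, gzipped: {packed} bytes "
                  f"({packed / raw:.0%} of original)")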

12 Feb 2005 (updated 13 Feb 2005 at 23:51 UTC)

While at CodeCon, wmf told me about Electric Cloud.

I've been reading through their technology description, and I have to say it sounds like they've just re-implemented a small portion of Vesta. Their dependency detection mechanism sounds almost exactly like Vesta's (and for that matter ClearCase's). The only thing which sounds new to me is that they fire off many build pieces in parallel without knowing whether they're doing them in the right order. If it turns out that they did some build steps in the wrong order, they re-execute those portions. This approach sounds dumb to me. Why not just do it in the right order every time (which is what Vesta does)?
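
Doing it in the right order isn't exactly rocket science: if you know each step's inputs, a topological sort gives you an order in which nothing ever has to be redone. A toy sketch (nothing to do with either tool's actual implementation):

    from graphlib import TopologicalSorter

    def run_in_order(steps, deps):
        """Run build steps so that every step's inputs are built first.

        steps: dict mapping step name -> callable that performs it
        deps:  dict mapping step name -> set of step names it depends on
        """
        for name in TopologicalSorter(deps).static_order():
            steps[name]()

    if __name__ == "__main__":
        # Hypothetical three-step build: compile two objects, then link.
        steps = {
            "foo.o": lambda: print("compile foo.c -> foo.o"),
            "bar.o": lambda: print("compile bar.c -> bar.o"),
            "app":   lambda: print("link foo.o bar.o -> app"),
        }
        deps = {"app": {"foo.o", "bar.o"}}
        run_in_order(steps, deps)

You can still run independent steps in parallel from the same structure; you just never start a step before the things it reads exist.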

Even worse is that they call their tool "plug compatible with make". So they're working really hard trying to take an old, known broken model and beat it into something scalable without redesigning it at all. Great idea.

Now how do I tell the USPTO that I have prior art for their "patent-pending automatic conflict detection and correction technology"...

On my last day visiting Microsoft Research, I wore my O'Really "Writing Word Macro Viruses" t-shirt. I was highly amused, and nobody at MS seemed to notice.

I feel I should start this entry by saying: I am not making this up.

I've spent the last two days, and I will also spend tomorrow, having what I consider to be a unique and somewhat strange experience. I've been visiting Microsoft Research. What's remarkable about this is that I was invited there by employees of Microsoft Research to work with them on the free software project I work on. Yes, that's right, Microsoft invited me to visit them so that they could help improve a piece of free software.

But wait, it gets better. This piece of free software is, at least in its current form, UNIX only. (chroot plays a key role in the implementation, and AFAIK Windows has no analogous function.) In other words, Microsoft invited me to visit them so that they could help improve a piece of free software which doesn't even run on Windows.

Now don't worry, the world has not gone mad; there is a logical explanation. The software in question was developed as a research project at the (now defunct) Digital/Compaq Systems Research Center. Some of the original developers (including the team leader) now work at Microsoft Research. So really they have just offered to spend a few days helping me understand and improve a few tricky parts of the implementation of something they built before they worked for Microsoft, which just happens to have become free software.

But still, when I actually think about what I've been doing the past two days, it seems rather improbable.

I'm flying from snowy Massachusetts to sunny California tomorrow. After working through the middle of the week, I'll be going to this year's CodeCon, which I'm looking forward to.

Before heading out, I decided to write the code needed to get the Vesta repository mounting on FreeBSD. Luckily, it was far easier than what I had to do for Solaris.

I saw a presentation on Pin yesterday. It's pretty neat.

I've been working on instrumenting multi-threaded server applications to do performance analysis for the project I work on, but it's a bit of a pain. It would be nice to be able to dynamically attach the instrumentation to a running server without rebuilding or restarting it. It looks like Pin would be able to do that, though its multi-threading support is still in beta.

Earlier this year at CodeCon 2004, it became evident to me that the lack of a source-only distribution for Vesta was a significant impediment to its acceptance.  Though Vesta is free software, only binary kits have been available because Vesta is a build system that one uses instead of make, and is built using itself (which makes for a classic chicken-and-egg problem).  We simply didn't have any other way to build it.

I decided at that time that it was important to put together make build instructions for Vesta. I would have liked to get it done six months ago, but as is the way of things, other work kept me from having enough time to devote to it. However, there is finally a make-based source kit for Vesta available for download.

I had a great time last weekend at CodeCon 2004. I learned many new and interesting things from the presentations. I enjoyed meeting Bram, raph, Zooko and many others. And, of course, I had fun giving my own little talk.

Random cool things from CodeCon:

I've got to stop reading Advogato diary entries late at night. They keep me up following links and thinking when I should be sleeping!

Last night I came home and read dyork's entry pointing to a discussion about running a personal Wiki on a USB Disk. That got me thinking about the possibility of running a Wiki on my Linux handheld. It could be a very powerful way of both recording notes and keeping them readily accessible. My brain's been spinning on this all day, when I really should be hacking on my main project.

For pretty much my entire professional career, I've had a habit of writing notes as I work. I just use Emacs with a few personal macros. For a while I've wanted to do something that allows me to record a little more semantic content in my notes. (For example, tagging areas of text as source code, program output, etc.) I've hesitated for a couple of reasons:

  • Using a markup system for formatting could be trouble long-term if whatever engine reads it dies off. ASCII text is pretty safe.
  • I've developed my own way of doing this. I'm reluctant to try to adapt my way of working to the paradigm of an existing piece of software. At the same time, I'm not sure I want to invest the time in writing something new that works the way I want.

But maybe I'm underestimating the software that's out there. SnipSnap looks pretty close to what I'd want, but it also looks too hefty to run on my PDA. Maybe I could adapt one of the Python Wiki engines, perhaps even give it a PyQt front end, since I'm running a Qt-based GUI on my PDA.

I guess the bigger problem would be finding a system that would allow me to automatically merge changes made to multiple copies of my journal/knowledge base. (Or extending an existing one to allow this.) I would want to be able to make changes on my PDA or my workstation or my laptop. Ideally, changes made on any one would merge into a "main" data store without any manual effort.
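
The least ambitious version of that I can imagine is treating each note as its own file and folding per-device changes into the main store, flagging anything edited in two places for hand merging. A sketch (all names hypothetical, and it punts on real three-way merging):

    import hashlib
    import shutil
    from pathlib import Path

    def fingerprint(path: Path) -> str:
        """Content fingerprint of one note file."""
        return hashlib.sha1(path.read_bytes()).hexdigest()

    def sync_into_main(device_dir, main_dir, last_sync):
        """Fold changed notes from one device's copy into the main store.

        last_sync maps note filename -> fingerprint recorded at the previous
        sync with this device.  Returns the updated map.
        """
        device, main = Path(device_dir), Path(main_dir)
        for note in sorted(device.glob("*.txt")):
            target = main / note.name
            dev_fp = fingerprint(note)
            old_fp = last_sync.get(note.name)
            main_fp = fingerprint(target) if target.exists() else None

            if dev_fp == old_fp:
                continue                        # nothing new on this device
            if main_fp in (old_fp, None):
                shutil.copy2(note, target)      # only this copy changed: fast-forward
                last_sync[note.name] = dev_fp
            elif main_fp == dev_fp:
                last_sync[note.name] = dev_fp   # both sides made the same edit
            else:
                print(f"conflict: {note.name} changed on both sides")
        return last_sync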

Maybe this is more trouble than it's worth. But on the other hand, all my notes for the last 5+ years take up less than 4M in their current text format. I could easily fit that on a CF card in my PDA, and it would be awfully handy to have them all in my pocket.

There's nothing like burning half a day on something stupid.

Last night I tried to test some changes to a program I hadn't re-built in a while, and when I ran it I got the following unhelpful message:

no such file or directory: /tmp/Verify_Cache

I scratched my head for a bit, then worked my way around to the idea that it could have to do with shared libraries, so I ran ldd on it and got:

/usr/bin/ldd: /tmp/Verify_Cache: No such file or directory

Some time later, a co-worker suggested using the LD_DEBUG environment variable. I tried that, and got the same error message as when I tried to run it the first time.

Much head scratching later, I started to look at the contents of the binaries with various other inspection tools. That's when I noticed that while "readelf --program-headers" on working binaries showed:

[Requesting program interpreter: /lib/ld-linux.so.2]

On my broken one it showed:

[Requesting program interpreter: /usr/lib/libc.so.1]

That file doesn't exist on any of my reasonably modern Linux systems. I made it a symlink to /lib/ld-linux.so.2, and my broken binary suddenly worked.

It turns out that the build instructions had gotten a little stale, and this binary ended up with both "-static" and shared objects on its link line. Garbage in, garbage out, I guess. Still, it seems like there should be a big "attention jackass: you told me to do something stupid so I did" warning in this case.
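
For next time, a ten-line check would have caught this immediately: ask readelf which program interpreter the binary requests and see whether that file exists. This assumes readelf is on the PATH; it's just a convenience wrapper, nothing Vesta-specific.

    import os
    import re
    import subprocess
    import sys

    def check_interpreter(binary):
        """Print the ELF interpreter a binary requests, and whether it exists."""
        headers = subprocess.run(["readelf", "--program-headers", binary],
                                 capture_output=True, text=True, check=True).stdout
        match = re.search(r"Requesting program interpreter: ([^\]]+)", headers)
        if not match:
            print(f"{binary}: no program interpreter requested (static binary?)")
            return
        interp = match.group(1)
        status = "ok" if os.path.exists(interp) else "MISSING"
        print(f"{binary}: requests {interp} [{status}]")

    if __name__ == "__main__":
        for path in sys.argv[1:]:
            check_interpreter(path)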

P.S. I'm wearing my "It must be user error" t-shirt today. I obviously picked the right one, I just didn't know it was referring to me.

