Aside from being a lot of fun and exposing me to new work people are doing, CodeCon gave me the opportunity to have interesting conversations about Vesta and consider how it relates to other projects.
While listening to Walter Landry's presentation on ArX, I compiled a list of a few good things about Vesta which I haven't included in presentations about it before (some stolen directly from Walter's presentation as they apply to Vesta as well):
- Disconnected hacking. You can make a local branch when not connected to the network. [I do this on my laptop when traveling, and this is in fact how I was working when visiting Microsoft Research.]
- Strangers can make changes without the main developers giving permission. With only read access to a central repository, you can make first class branches in your own repository. [This is related to the previous point, in that it all happens locally after replicating whatever version you're basing your changes on.]
- It doesn't use timestamps for anything. Like many modern build systems, Vesta does not use timestamps for dependencies. Unlike most modern competitors, it doesn't rely on timestamps for versioning either. Since it is its own filesystem, it knows whether you have changed a file.
- It doesn't intermix its data with your source files. Most versioning systems store some meta-data about what you're versioning in files/directories intermixed with your sources. Because Vesta is its own filesystem it can store that meta-data in special directory attributes. This keeps its data out of your way.
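To make the timestamp point concrete, here is a minimal sketch of why a modification time is a poor signal that a file changed. It is not how Vesta works: Vesta knows about changes because it is the filesystem, whereas this sketch swaps in content hashing as a stand-in. The helper names (`fingerprint`, `changed_by_mtime`, `changed_by_content`, `demo`) are hypothetical.

```python
import hashlib
import os
import tempfile


def fingerprint(path):
    """Return a SHA-256 digest of the file's bytes (hypothetical helper)."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def changed_by_mtime(path, recorded_mtime_ns):
    """Timestamp-based check: reports a change whenever the mtime differs."""
    return os.stat(path).st_mtime_ns != recorded_mtime_ns


def changed_by_content(path, recorded_digest):
    """Content-based check: reports a change only if the bytes differ."""
    return fingerprint(path) != recorded_digest


def demo():
    """Touch a file without editing it and compare the two checks."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "source.c")
        with open(path, "w") as f:
            f.write("int main(void) { return 0; }\n")
        mtime_ns = os.stat(path).st_mtime_ns
        digest = fingerprint(path)
        # Move the mtime forward by ten seconds; the contents are untouched.
        later = mtime_ns + 10 * 10**9
        os.utime(path, ns=(later, later))
        return changed_by_mtime(path, mtime_ns), changed_by_content(path, digest)
```

Calling `demo()` touches the file without editing it, so the timestamp check reports a change while the content check does not.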
I talked with Ross Cohen (who works on Codeville) about merge algorithms. He told me stories of crazy repeated merge cases that he thought would never come up in practice, and then did. I asked him if he'd be offended if I ripped off his merge code for the pluggable merge architecture I've been designing, and he said he wouldn't. (I'm trying to avoid getting into the business of writing a new merge algorithm.)
I talked to Nick Mathewson and Roger Dingledine from the Free Haven project about securing Vesta. They suggested I write an RFC-style protocol spec if I want anyone with a security background to help. They also confirmed my concern that the mastership transfer protocol is the most problematic part, as it uses TCP connections in both directions between the peer repositories. To a lesser extent, replicating from a satellite repository back to a central one when checking in after a remote checkout has the same problem. If we could find a way to make these active connections passive, it would also help people behind firewalls.
Kevin Burton from rojo.com recommended including more information in the RSS feed of latest versions in the pub.vestasys.org repository. I've been having my feed generator trim the checkin comments, but he said the RSS client should do that. He also suggested including a diff in the feed.
Zooko and I talked briefly about merging. Specifically we talked about how "smart" merge algorithms need more information than a common ancestor and two versions descended from it. They typically take into account more about the steps between the common ancestor and those two versions. I said that I thought it should be possible to use the trigger mechanism I've been working on to record the kind of extra information such algorithms would need. He contended that without spending time using and studying a system with smart merging, I won't know quite how to design to enable it. While I have read through the Darcs "theory of patches", I don't see myself having time to spend really getting a lot of experience with another system.
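As a rough illustration of what a merge sees when it only has a common ancestor and two descendants, here is a minimal three-way merge sketch over key/value snapshots. It is not Codeville's, Darcs's, or Vesta's algorithm; the function name `three_way_merge` and the dictionary representation are assumptions made for the example.

```python
def three_way_merge(base, left, right):
    """Merge two snapshots using only their common ancestor.

    Each snapshot is a dict mapping a name to its content. Returns the
    merged dict and a list of names that conflict.
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(left) | set(right)):
        b, l, r = base.get(key), left.get(key), right.get(key)
        if l == r:
            result = l          # both sides agree (or both deleted it)
        elif l == b:
            result = r          # only the right side changed it
        elif r == b:
            result = l          # only the left side changed it
        else:
            conflicts.append(key)  # both changed it, in different ways
            continue
        if result is not None:
            merged[key] = result
    return merged, conflicts


# Left's history was 30 -> 60 -> 30: someone tried 60 and reverted it.
# The merge only sees the endpoints, so left looks unchanged.
base = {"timeout": "30"}
left = {"timeout": "30"}
right = {"timeout": "60"}
merged, conflicts = three_way_merge(base, left, right)
```

Here the merge quietly takes `60` from the right side with no conflict, even though the left side tried that value and backed it out. Whether that should count as a conflict is debatable, but an algorithm can only raise the question if the intermediate steps were recorded somewhere.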
Andy Iverson (?) and I talked about Vesta's approach of storing complete copies of all versions. He brought up the example of how small a BitKeeper repository is with the entire Linux kernel history. I asked "Is having that entire history really interesting?" He contended that it is, bringing up the example of searching through the history to find when some feature/variable/function was introduced.
The difference in perspective probably has to do with the fact that the decision to store complete copies was made early in Vesta's design (in the late '80s), before open source really took off. Basically the argument was "disk is cheap, programmers aren't typing any faster." However, open source projects can scale to a much larger number of programmers making a much larger number of changes. The Vesta developers couldn't have predicted this effect when they made that design decision.
We could add compression of versions, but I'm wary of doing that for performance reasons, at least at the moment. One of Vesta's selling points is O(1) checkout/checkin/branch and access to existing versions, which we would lose with compression. Also, adding compression right now would put load on a central resource (the repository server). I have some ideas on splitting up the system in ways that would make it possible to distribute this load, but I don't expect to make progress on that in the immediate future.
Lastly, the Linux kernel clearly has a higher number of contributors (and probably a higher rate of change) than most projects, so maybe this is only an issue in extreme cases. I should probably spend some time measuring the storage requirements for the history of different free software projects.
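A first cut at that kind of measurement could look like the sketch below, which compares the bytes needed to keep every version whole against keeping the first version plus line diffs. It is a toy under stated assumptions: `make_history` generates a synthetic history rather than reading a real project, unified diffs from Python's `difflib` stand in for whatever delta format a real system would use, and all function names are hypothetical.

```python
import difflib


def full_copy_bytes(versions):
    """Bytes needed to store every version in full."""
    return sum(len(v.encode("utf-8")) for v in versions)


def delta_bytes(versions):
    """Bytes needed to store the first version plus a diff for each step."""
    total = len(versions[0].encode("utf-8"))
    for old, new in zip(versions, versions[1:]):
        diff = difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            n=0,  # no context lines, to keep each delta small
        )
        total += len("".join(diff).encode("utf-8"))
    return total


def make_history(lines=200, revisions=20):
    """Build a synthetic history where each revision edits one line."""
    text = [f"line {i}\n" for i in range(lines)]
    versions = ["".join(text)]
    for rev in range(1, revisions):
        text[rev % lines] = f"line {rev} changed in revision {rev}\n"
        versions.append("".join(text))
    return versions


versions = make_history()
full = full_copy_bytes(versions)
delta = delta_bytes(versions)
```

On this synthetic input the delta form is much smaller, which is no surprise when each revision touches a single line. The cost is on the read side: getting a late version back out of the delta form means applying every diff in the chain, which is where the O(1) access to existing versions would be lost.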