Aside from being a lot of fun and exposing me to new work people are doing, CodeCon gave me the opportunity to have interesting conversations about Vesta and consider how it relates to other projects.
While listening to Walter Landry's presentation on ArX, I compiled a list of a few good things about Vesta which I haven't included in presentations about it before (some stolen directly from Walter's presentation as they apply to Vesta as well):
I talked with Ross Cohen (who works on Codeville) about merge algorithms. He told me stories of crazy repeated merge cases that he thought would never come up in practice but then did. I asked him if he'd be offended if I ripped off his merge code for the pluggable merge architecture I've been designing, and he said he wouldn't. (I'm trying to avoid getting into the business of writing a new merge algorithm.)
I talked to Nick Mathewson and Roger Dingledine from the Free Haven project about securing Vesta. They suggested I write an RFC-style protocol spec if I want anyone with a security background to help. They also confirmed my concern that the mastership transfer protocol is the most problematic part, as it uses TCP connections in both directions between the peer repositories. To a lesser extent, replicating from a satellite repository back to a central one when checking in after a remote checkout has the same problem. If we could find a way to make these active connections passive, it would also help people behind firewalls.
Kevin Burton from rojo.com recommended including more information in the RSS feed of latest versions in the pub.vestasys.org repository. I've been having my feed generator trim the checkin comments, but he said the RSS client should do that. He also suggested including a diff in the feed.
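Kevin's two suggestions could look something like the following sketch, which builds an RSS 2.0 `<item>` whose description carries the untrimmed checkin comment plus a unified diff against the previous version. It is a hypothetical illustration, not the actual pub.vestasys.org feed generator: `rss_item`, its parameters, and the "previous"/"this version" diff labels are made up, and it assumes the old and new contents of a changed file are available as lists of lines.

```python
import difflib
from xml.sax.saxutils import escape


def rss_item(title, link, comment, old_lines, new_lines):
    """Build an RSS 2.0 <item> with the full checkin comment and a unified diff.

    Hypothetical helper: old_lines/new_lines are the previous and new
    contents of a changed file, as lists of lines without newlines.
    """
    diff = difflib.unified_diff(
        old_lines, new_lines,
        fromfile="previous", tofile="this version",
        lineterm="",  # input lines carry no trailing newline
    )
    # Leave the comment untrimmed; trimming is left to the RSS client.
    body = comment + "\n\n" + "\n".join(diff)
    # RSS descriptions carry entity-encoded HTML: first HTML-escape the text
    # inside <pre>, then XML-escape the whole HTML fragment.
    html = "<pre>" + escape(body) + "</pre>"
    return (
        "<item>\n"
        f"  <title>{escape(title)}</title>\n"
        f"  <link>{escape(link)}</link>\n"
        f"  <description>{escape(html)}</description>\n"
        "</item>"
    )


if __name__ == "__main__":
    print(rss_item("example checkin 42", "https://example.com/42",
                   "Fix off-by-one in the parser", ["a", "b"], ["a", "c"]))
```

The double escaping is deliberate: a `<` in a checkin comment has to survive both the XML layer of the feed and the HTML the reader renders.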
Zooko and I talked briefly about merging. Specifically, we talked about how "smart" merge algorithms need more information than a common ancestor and two versions descended from it; they typically take into account more about the steps between the common ancestor and those two versions. I said that I thought it should be possible to use the trigger mechanism I've been working on to record the kind of extra information such algorithms would need. He contended that without spending time using and studying a system with smart merging, I won't know quite how to design Vesta to enable it. While I have read through the Darcs "theory of patches", I don't see myself having time to get much hands-on experience with another system.
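For contrast, here is a minimal sketch of the kind of merge that only looks at a common ancestor and two descendants: a simplified diff3-style, line-based three-way merge built on Python's `difflib`. It is not Codeville's algorithm, Darcs's patch theory, or anything in Vesta; the names `merge3`, `changes`, and `apply_changes` and the conflict-marker format are assumptions for illustration. Everything it knows comes from the three inputs, so information about the intermediate steps, which the smarter algorithms rely on, is simply unavailable to it.

```python
from difflib import SequenceMatcher


def changes(base, other):
    """Regions where `other` differs from `base`: (start, end, new_lines),
    with start/end indexing into base."""
    sm = SequenceMatcher(None, base, other, autojunk=False)
    return [(i1, i2, other[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]


def apply_changes(base, start, end, side_changes):
    """Return one side's version of base[start:end], given its changes there."""
    out, pos = [], start
    for s, e, new_lines in side_changes:
        out.extend(base[pos:s])
        out.extend(new_lines)
        pos = e
    out.extend(base[pos:end])
    return out


def merge3(base, a, b):
    """Merge descendants a and b of base; return (merged_lines, conflict_count)."""
    tagged = ([(s, e, new, "a") for s, e, new in changes(base, a)] +
              [(s, e, new, "b") for s, e, new in changes(base, b)])
    tagged.sort(key=lambda c: (c[0], c[1]))
    out, conflicts, pos, k = [], 0, 0, 0
    while k < len(tagged):
        # Grow a cluster of changes whose base ranges overlap or touch.
        start, end = tagged[k][0], tagged[k][1]
        cluster = [tagged[k]]
        k += 1
        while k < len(tagged) and tagged[k][0] <= end:
            end = max(end, tagged[k][1])
            cluster.append(tagged[k])
            k += 1
        out.extend(base[pos:start])  # unchanged lines before the cluster
        side_a = [c[:3] for c in cluster if c[3] == "a"]
        side_b = [c[:3] for c in cluster if c[3] == "b"]
        va = apply_changes(base, start, end, side_a)
        vb = apply_changes(base, start, end, side_b)
        if not side_b or va == vb:  # only a changed, or both changed identically
            out.extend(va)
        elif not side_a:            # only b changed
            out.extend(vb)
        else:                       # both changed differently: conflict
            conflicts += 1
            out += ["<<<<<<< a", *va, "=======", *vb, ">>>>>>> b"]
        pos = end
    out.extend(base[pos:])
    return out, conflicts
```

This sketch conservatively treats changes from the two sides that overlap or merely touch as a conflict, while identical changes on both sides merge cleanly. Repeated merges between the same branches are exactly where it falls short, since it cannot tell which changes were already merged last time.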
Andy Iverson (?) and I talked about Vesta's approach of storing complete copies of all versions. He brought up the example of how small a BitKeeper repository containing the entire Linux kernel history is. I asked, "Is having that entire history really interesting?" He contended that it is, citing the example of searching through the history to find when some feature, variable, or function was introduced.
Our difference in perspective probably has to do with the fact that the decision to store complete copies was made early in Vesta's design (in the late '80s), before open source really took off. The argument then was basically "disk is cheap, and programmers aren't typing any faster." However, open source projects can scale to a much larger number of programmers making a much larger number of changes, an effect the Vesta developers couldn't have predicted when they made that decision. We could add compression of versions, but I'm wary of doing that for performance reasons, at least for now. One of Vesta's selling points is O(1) checkout, checkin, branching, and access to existing versions, which we would lose with compression. Adding compression right now would also put load on a central resource (the repository server). I have some ideas on splitting up the system in ways that would make it possible to distribute this load, but I don't expect to make progress on them in the immediate future. Lastly, the Linux kernel clearly has more contributors (and probably a higher rate of change) than most projects, so maybe this is only an issue in extreme cases. I should probably spend some time measuring the storage requirements for the history of different free software projects.
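To make the trade-off concrete, here is a toy comparison between storing complete copies and storing a chain of forward deltas. It is an illustration under simplifying assumptions, not how Vesta, BitKeeper, or any real system stores versions: the classes, `make_delta`, `apply_delta`, and the line-count size measure are all hypothetical. Full copies give constant-time access to any version; the delta chain stores far less but has to replay every delta up to the version requested.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Encode `new` against `old` as ("copy", i1, i2) / ("insert", lines) ops."""
    ops = []
    sm = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif j2 > j1:  # "replace" or "insert": the new lines must be stored
            ops.append(("insert", new[j1:j2]))
        # a pure "delete" needs no op: those old lines are just not copied
    return ops


def apply_delta(old, ops):
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(old[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


class FullCopyStore:
    """Every version stored whole: any version is one lookup away."""
    def __init__(self):
        self.versions = []

    def add(self, lines):
        self.versions.append(list(lines))

    def get(self, n):
        return self.versions[n]

    def stored_lines(self):
        return sum(len(v) for v in self.versions)


class DeltaChainStore:
    """First version whole, later ones as forward deltas from their predecessor."""
    def __init__(self):
        self.base = None
        self.deltas = []
        self._latest = None  # cache used only to compute the next delta

    def add(self, lines):
        lines = list(lines)
        if self.base is None:
            self.base = lines
        else:
            self.deltas.append(make_delta(self._latest, lines))
        self._latest = lines

    def get(self, n):
        # Reconstructing version n replays n deltas: cost grows with history.
        v = self.base
        for ops in self.deltas[:n]:
            v = apply_delta(v, ops)
        return v

    def stored_lines(self):
        # Rough size measure: count only stored line content, not op overhead.
        return len(self.base) + sum(
            len(op[1]) for ops in self.deltas for op in ops if op[0] == "insert")


if __name__ == "__main__":
    full, chain = FullCopyStore(), DeltaChainStore()
    doc = []
    for i in range(100):
        doc = doc + [f"line {i}"]  # each checkin appends one line
        full.add(doc)
        chain.add(doc)
    assert chain.get(99) == full.get(99)
    print(full.stored_lines(), chain.stored_lines())  # prints "5050 100"
```

Real delta-based systems soften the replay cost with variations such as keeping the newest version whole or inserting periodic full snapshots, but some reconstruction work remains, and that work is what would land on the repository server. A measure like `stored_lines`, applied to real project histories, is one rough way to start on the storage measurements mentioned above.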