CodeCon musings
Aside from being a lot of fun and exposing me to the new work people
are doing, CodeCon gave me the opportunity to have interesting
conversations about Vesta and
to consider how it relates to other projects.
While listening to Walter Landry's
presentation on ArX, I compiled a
list of a few good things about Vesta which I haven't included in
presentations about it before (some stolen directly from Walter's
presentation as they apply to Vesta as well):
- Disconnected hacking. You can make a local
branch when not connected to the network. [I do this on
my laptop when traveling, and this is in fact how I was
working when visiting
Microsoft Research.]
- Strangers can make changes without the main
developers giving permission. With only read access to
a central repository, you can make first-class branches in
your own repository. [This is related to the previous
point, in that it all happens locally after replicating
whatever version you're basing your changes on.]
- It doesn't use timestamps for anything. Like many modern
build systems, Vesta does not use timestamps for dependencies. Unlike
most modern competitors, it doesn't depend on timestamps as a
versioning system either. Since it is its own filesystem, it knows
whether you have changed a file.
- It doesn't intermix its data with your source files. Most
versioning systems store some meta-data about what you're versioning in
files/directories intermixed with your sources. Because Vesta is its
own filesystem, it can store that meta-data in special directory
attributes. This keeps its data out of your way.
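To make the timestamp-free point concrete: a content fingerprint answers "did this file change?" without ever consulting mtimes. This is only a sketch with names of my own invention; Vesta, being its own filesystem, tracks modifications directly rather than hashing.

```python
import hashlib

def fingerprint(path):
    """Content hash standing in for a timestamp: two builds see the
    same fingerprint exactly when the bytes are the same, no matter
    when the file was last touched."""
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()

def changed(path, recorded_fingerprint):
    """True only if the file's contents differ from what the last
    build recorded; touching the file without editing it is a no-op."""
    return fingerprint(path) != recorded_fingerprint
```

Touching a file (updating its timestamp) leaves the fingerprint alone, which is exactly the property a timestamp-based build system lacks.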
I talked with Ross Cohen (who works on Codeville) about merge algorithms.
He told me stories of crazy repeated merge cases that he thought would
never come up in practice, and then did. I asked him if he'd be
offended if I ripped off his merge code for the
pluggable merge architecture I've been designing, and he said he
wouldn't. (I'm trying to avoid getting into the business of writing a
new merge algorithm.)
I talked to Nick
Mathewson and Roger
Dingledine from the Free Haven
project about securing Vesta. They suggested I write an RFC-style
protocol spec if I want anyone with a security background to
help. They also confirmed my concern that the mastership transfer
protocol is the most problematic part, as it uses TCP connections in
both directions between the peer repositories. To a lesser extent,
replicating from a satellite repository back to a central one when
checking in after a remote checkout has the same problem. If we could
find a way to make these active connections passive, it would also
help people behind firewalls.
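The general shape of that fix can be modeled in a few lines: instead of the central repository dialing back into the peer (which a firewall would block), the peer makes all connections outbound and polls for pending work. This is a toy model with made-up names, not the actual mastership-transfer protocol:

```python
class CentralRepo:
    """Stands in for the repository that wants to hand off mastership."""
    def __init__(self):
        self.pending = []          # requests parked for the peer to fetch

    def request_transfer(self, package):
        # Rather than opening a TCP connection back to the peer,
        # queue the request where the peer's next poll will find it.
        self.pending.append(package)

class FirewalledPeer:
    """A repository that can only make outbound connections."""
    def __init__(self, central):
        self.central = central
        self.granted = []

    def poll(self):
        # The single outbound "connection" both fetches requests and
        # returns answers, so no inbound port is ever needed.
        while self.central.pending:
            self.granted.append(self.central.pending.pop(0))

central = CentralRepo()
peer = FirewalledPeer(central)
central.request_transfer("/vesta/example.org/pkg")
peer.poll()
print(peer.granted)   # ['/vesta/example.org/pkg']
```

The cost of going passive is latency: the peer only learns about a request on its next poll, rather than the instant the central repository asks.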
Kevin
Burton from rojo.com recommended
including more information in the RSS feed of
latest versions in the pub.vestasys.org repository. I've been
having my feed generator trim the checkin comments, but he said the
RSS client should do that. He also suggested including a diff in the
feed.
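Generating an item along those lines is straightforward. This sketch uses Python's difflib for the diff; the element layout and function name are my guesses, not what the pub.vestasys.org feed generator actually does:

```python
import difflib
from xml.sax.saxutils import escape

def feed_item(version, comment, old_lines, new_lines):
    """One RSS <item> carrying the untrimmed checkin comment plus a
    unified diff of the change, leaving any truncation to the client."""
    diff = "\n".join(difflib.unified_diff(
        old_lines, new_lines, "previous", version, lineterm=""))
    body = escape(comment + "\n\n" + diff)
    return ("<item><title>%s</title><description>%s</description></item>"
            % (escape(version), body))
```

Escaping the whole body means the diff's `<` and `&` characters survive inside the `<description>` element.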
Zooko and I
talked briefly about merging. Specifically we talked about how
"smart" merge algorithms need more information than a common ancestor
and two versions descended from it. They typically take into account
more about the steps between the common ancestor and those two
versions. I said that I thought it should be possible to use the
trigger mechanism I've been working on to record the kind of extra
information such algorithms would need. He contended that without
spending time using and studying a system with smart merging, I won't
quite know how to design for it. While I have read through the
Darcs "theory
of patches", I don't see myself having time to spend really
getting a lot of experience with another system.
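A toy example makes the limitation concrete. A plain three-way merge sees only three snapshots; this sketch assumes all versions have the same number of lines (real tools diff first), and it records nothing about the steps in between, which is exactly the extra information the "smart" algorithms want:

```python
def naive_three_way_merge(base, mine, theirs):
    """Toy line-by-line three-way merge over equal-length versions.
    Its entire input is (base, mine, theirs): three snapshots and
    nothing about how either side got there."""
    merged, conflicts = [], []
    for i, (b, m, t) in enumerate(zip(base, mine, theirs)):
        if m == t:                 # both sides agree (or neither changed)
            merged.append(m)
        elif m == b:               # only 'theirs' changed this line
            merged.append(t)
        elif t == b:               # only 'mine' changed this line
            merged.append(m)
        else:                      # both changed it differently: conflict
            merged.append("<<< %s ||| %s >>>" % (m, t))
            conflicts.append(i)
    return merged, conflicts
```

Two different edit histories that happen to arrive at the same three snapshots merge identically here, which is why snapshot-only merging can only get so smart.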
Andy Iverson (?) and I talked about Vesta's approach of storing
complete copies of all versions. He brought up the example of how
small a BitKeeper repository is with the entire Linux kernel
history. I asked "Is having that entire history really interesting?"
He contended that it is, bringing up the example of searching through
the history to find when some feature/variable/function was
introduced. Vesta's approach probably has to do with the fact that the
decision to store complete copies was made early in its design (in
the late '80s), before open source really took off. Basically the
argument was "disk is cheap, programmers aren't typing any faster."
However, open source projects can scale to a much larger number of
programmers making a much larger number of changes. The Vesta
developers couldn't have predicted this effect when they made that
design decision. We could do something to add compression of
versions, but I'm wary of doing that for performance reasons, at least
at the moment. One of Vesta's selling points is O(1)
checkout/checkin/branch and access to existing versions, which we
would lose with compression. Also, adding compression right now would
put load on a central resource (the repository server). I have some
ideas on splitting up the system in ways that would make it possible
to distribute this load, but I don't expect to make progress on that
in the immediate future. Lastly, the Linux kernel clearly has a
higher number of contributors (and probably a higher rate of change)
than most projects, so maybe this is only an issue in extreme cases.
I should probably spend some time measuring the storage requirements
for the history of different free software projects.
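A first cut at that measurement doesn't need repository access at all; given the bytes of each version of a file, jointly compressing them serves as a rough stand-in for delta storage, since content shared across versions compresses away. The function name and the synthetic data in the example are mine:

```python
import zlib

def storage_comparison(versions):
    """versions: list of bytes objects, one per version of a file.
    Returns (full-copy bytes, per-version-compressed bytes,
    jointly-compressed bytes). The last is only a crude proxy for
    delta storage: zlib's 32 KB window limits how much sharing it
    can find, so a real measurement would use each system's own
    delta format."""
    full = sum(len(v) for v in versions)
    per_version = sum(len(zlib.compress(v)) for v in versions)
    joint = len(zlib.compress(b"".join(versions)))
    return full, per_version, joint

# Ten near-identical synthetic versions of a 1 KB file.
history = [b"x" * 1000 + bytes([65 + i]) for i in range(10)]
full, per_version, joint = storage_comparison(history)
```

For histories of small, slowly-changing files the joint figure should come out far below the full-copy figure, which would put a number on how much the "disk is cheap" decision actually costs.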