23 May 2008 (updated 23 May 2008 at 08:06 UTC)
This week I've been at UDS in Prague, and looking at some
possible ways to deploy bzr for packaging (which is a hot
topic: developers don't want to change workflows without a
concrete benefit, and definitely don't want to pay a cost
for doing so - e.g. having to have all of history locally
just to make a trivial change).
One of the discussions inspired a scalability test for bzr -
not how we think we'd deploy bzr for Ubuntu developers, just
a test to understand how it would scale *if* we did it this
way.
Lars Wirzenius has a habit
of testing the capabilities of version control systems in
various ways, including importing the Debian/Ubuntu source
archive into them. He kindly ran a test using bzr, creating a
single shared repository with one branch in it per source
package.
This took a few hours to generate (I'm not sure of the exact
figure, as we forgot to time it, but it was started in the
afternoon and finished in the morning). The resulting
repository has 21GB in its .bzr/repository/packs directory,
and 500MB in its .bzr/repository/indices directory. There
are 30 pack files, the largest of which is 16GB, and the
smallest a few hundred kB.
In general VCS terms this repository has 16000 heads and
16000 commits (one commit per head, because we didn't import
deep archive history).
But what about performance? It's currently copying to a
machine where I can do some serious benchmarks using this
repository. I do have some quick and dirty figures though.
To branch a single package (libyanfs-java) from its branch
within the repository to a new standalone branch with a cold
cache took ~5 seconds. Branching again from the repository,
now that the needed data was in the page cache, took 0.6
seconds.
Branching from the newly created branch to another new
standalone branch took 0.3 seconds.
There is a clear slowdown occurring here. Including startup
costs, the time to make a new branch doubles (0.6 seconds
versus 0.3) when the source branch lives in the big
repository. However, as the repository is 16000 times the
size, the scaling factor (2/16000) is pretty darn
good. I'm stoked at this result, as I think it demonstrates
just what the underlying pack store is capable of. We are
working on streamlining the upper layers of bzr to make
better and better use of the underlying store. For instance,
John Meinel
has just done this for 'bzr missing' and 'bzr uncommit'.
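For anyone who wants to check the scaling arithmetic above, here is a
minimal sketch in Python. It only uses the warm-cache timings quoted in
this post; the variable names are mine, and treating "16000 branches
versus one" as the size factor is the same rough assumption made above,
not a measured byte ratio.

```python
# Quick-and-dirty timings quoted in the post, in seconds.
# Variable names are illustrative, not anything from bzr itself.
standalone_branch_s = 0.3  # branching from the new standalone branch
repo_branch_warm_s = 0.6   # branching from the shared repository, warm cache

# Assumption: the shared repository holds ~16000 branches versus one,
# so call it 16000 times the size.
repo_size_factor = 16000

# How much slower is branching out of the big repository?
slowdown = repo_branch_warm_s / standalone_branch_s

# Slowdown relative to the growth in repository size.
scaling = slowdown / repo_size_factor

print(f"slowdown: {slowdown:.1f}x")
print(f"scaling factor: {scaling:.6f}")
```

The cold-cache figure (~5 seconds) is left out on purpose, since it
mostly measures disk reads rather than anything bzr is doing.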
Now I must go, time for breakfast!
Woo!