Older blog entries for robertc (starting at number 76)

Thank you, lazyweb: a number of folk have written to me pointing out Netem. One in particular, Yusuf Goolamabbas, even provided a set of wrapper scripts for Netem that I'm going to be digging into next week.

Netem is built around various lower-level tools like tc, which is good (tc is what I was using years ago). I'm hopeful it will be really easy to use, and will blog something when I've used it in anger :)
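
In the meantime, here's a rough sketch of the kind of thing I expect such a wrapper to boil down to - a small Python stub shelling out to tc with invented numbers, needing root and the netem module. Netem itself provides the delay and loss; a tbf qdisc chained under it supplies the bandwidth cap:

    import subprocess

    # A rough sketch only (not Yusuf's scripts): ADSL-ish latency and loss via
    # netem, with a tbf qdisc underneath for the bandwidth cap. Needs root,
    # iproute2's tc and the sch_netem module; the numbers are invented.
    def tc(*args):
        subprocess.check_call(['tc'] + list(args))

    def emulate_adsl(dev='lo'):
        # Start from a clean slate (ignore the error if no qdisc is set yet).
        subprocess.call(['tc', 'qdisc', 'del', 'dev', dev, 'root'])
        # Latency and packet loss via netem as the root qdisc.
        tc('qdisc', 'add', 'dev', dev, 'root', 'handle', '1:0',
           'netem', 'delay', '20ms', 'loss', '0.5%')
        # Bandwidth limit via tbf attached under netem.
        tc('qdisc', 'add', 'dev', dev, 'parent', '1:1', 'handle', '10:',
           'tbf', 'rate', '256kbit', 'buffer', '1600', 'limit', '3000')

    emulate_adsl()

Per-direction asymmetry (so that traffic one way between two local addresses looks like ADSL upload and the other way looks like download) needs filters classifying on destination address on top of this - hopefully that's the part the wrapper scripts make painless.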

Dear lazyweb,

In bzr development we are now working primarily on network performance. One of the keys to being sure we have improved things is automated, repeatable benchmarks, and for those to be useful in networking environments we need to control latency, bandwidth and packet loss.

I know this isn't a new problem, but it was about 5 years ago that I last did this sort of thing. What are the best tools today (for Linux :))? Ideally I'd be able to bring up a bunch of local addresses like 127.0.1.1 or 127.0.0.2, with different properties - such that traffic from 127.0.0.1 to 127.0.0.2 will simulate being uploaded over ADSL, and 127.0.0.2 to 127.0.0.1 will simulate being downloaded over ADSL.

Flying to the US for UDS in Boston... and this time no dreaded AAAA's. It seems the US doesn't hate me quite so much; much more pleasant this time. Still, being fingerprinted on entering a country is rather irritating - back home we only do that to criminals.

When are two identical changes the same, and when aren't they? There's a little bit of debate, started by Andrew Cowie posting about unmixing the paint. Matt Palmer followed up with a claim that a particular technique used by Andrew is dangerous, and finally Andrew Bennetts makes the point that text conflicts are a small subset of merge conflicts.

That said, one critical task for a version control system is the merge command. Let's define merge at a human level as "reproduce the changes made in branch A in my branch B". There are a lot of taste choices that can be made without breaking this definition. For instance, a merge that combines all the individual changes into one - losing the individual commit deltas - meets this description. So does a merge which requires all text conflicts to be resolved during the merge command's execution, or one that does not give a human a chance to review the merged tree before recording it as a commit.

So if the goal of merge is to reproduce these other changes, then we are essentially trying to infer what the *change* was. For example, in an ideal world, merging a branch that changes all log messages of floating point values to 6-digit scale would know enough to catch all new log messages added in my branch, regardless of language, actual API used, and so on. But that is fantasy at the moment. The best we can do today depends on how we capture the change. For instance, Darcs allows some changes to be captured as symbol-changing patches, and others as regular textual diffs.

So the problem of whether arriving at the same result should conflict can be rephrased as 'when is arriving at the same result correct, and when is it incorrect?'

For instance, if I write a patch and put it up as plain text on a website, and two people developing $foo download it and apply it, they have duplicate changes - but it's clearly correct that a merge between them should not conflict on this.

On the other hand, the example Andrew Bennetts quotes in his post is a valid example of two people making the same change, but where the line needs a change during the merge to remain correct.

Here's another example, though. Suppose I commit something faulty to my branch, and you pull from me before I fix it. Then, while I fix the bug, you also fix it - the same way. That is another example of no-conflict being correct.

If it's possible for either answer - conflict, or do not conflict - to be correct, then what should a VCS author do?

There are several choices here:

  • Always conflict
  • Never conflict
  • Conflict based on a heuristic

I think that our job is to assess what the maximum harm from choosing the wrong default is, and the likelihood of that occurring, and then make a choice. Short of fantasy, no merge is, in general, definitely good or bad - your QA process (such as an automatic test suite) needs to run regardless of the VCS's logic. The risk of a bad merge is relatively low, because you should be testing, and if the merge is wrong you can just not commit it, or roll it back. So our job in merge is to make it as likely as possible that your test suite will pass when you have done the merge, without further human work. This is very different to trying to always conflict whenever we cannot be 100% sure that the text is what a human would have created. It's actually harder to take this approach than conflicting - conflicting is easy.
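
To make the 'identical changes should not conflict' case concrete, here's a toy sketch - nothing like bzr's actual merge implementation, just an illustration - of that decision for a single region of a file, given its text in the common base and in each branch:

    def merge_region(base, mine, theirs):
        """Return (merged_text, conflicted) for one region of a file."""
        if mine == theirs:
            # Both branches arrived at the same text (e.g. we both applied
            # the same patch from a website): nothing to reconcile.
            return mine, False
        if mine == base:
            return theirs, False  # only the other branch changed it
        if theirs == base:
            return mine, False    # only my branch changed it
        # Genuinely divergent edits: hand the decision back to a human.
        return (mine, theirs), True

    print(merge_region("buggy()\n", "fixed()\n", "fixed()\n"))
    # -> ('fixed()\n', False)
    print(merge_region("buggy()\n", "fixed()\n", "other()\n"))
    # -> (('fixed()\n', 'other()\n'), True)

The policy question in the posts above lives entirely in that first branch of the if.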

So we're here in sunny Vilnius sprinting on bzr. I thought I'd write up some of what we've achieved.

On Thursday Wouter got most of the nested-trees branch merged up with bzr.dev, but with about 500 tests failing ;). Jelmer has introduced a new parameter to the sprout API call on BzrDir called 'limit_to_revisions', which, if supplied, is a set of revisions forming a strict limit on the amount of data to be fetched. This is intended to be the API by which the desire for limiting history - to achieve shallow branches/history horizons - is communicated at the 'bzr branch' or 'bzr checkout' stage. This API is tested so everything passes, but nothing chooses to implement it yet. I was hacking on commit (somewhat indirectly) by working on the repository-wide indexing logic we will require if we wish to move to a more free-form database with semi-arbitrary keys mapping to opaque data (at the lowest level). I have a core GraphIndex up at this point. Once complete this should drop the amount of I/O we perform during commit by half. There was some other hacking going on too - Tim Hatch did some tweaks to bzr-hg, Jelmer some to bzr-svn, and so on.

On Friday we carried on with the same basic targets: Wouter fixed the 500+ failing tests so now it's only in the 5-10 range, and has been debugging them one by one since. Jelmer implemented the limit_to_revisions parameter all the way down to the knit level, for non-delta-compressed knits (e.g. revisions.knit), which made knit1 repositories support the parameter and got the 'supports' branches of the interface tests executed. I developed CombinedGraphIndex, sent the low-level Index stuff up for review, and followed that up by implementing a KnitIndex implementation that layers on the GraphIndex API to allow the use of .knit files without .kndx. This new index API allows us to switch out the indexing code and format much more easily than before - the knit index logic is decoupled from the generic indexing facilities. It also allows us to record arbitrary-parent compression chains. Finally, Jelmer implemented the wishlist item for bzr-gtk of a notification-area widget that can be used to enable bzr-avahi watching, to share commits with the local LAN and get notified of commits on the local LAN.
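
For the curious, the basic idea behind GraphIndex and CombinedGraphIndex - sketched here in toy form, which is not bzr's actual API - is keys mapping to an opaque value plus references to other keys, with a combining layer that consults several such indices in turn:

    class ToyGraphIndex:
        """Keys map to an opaque value plus references to other keys
        (e.g. revision or compression parents)."""

        def __init__(self):
            self._nodes = {}

        def add_node(self, key, value, references=()):
            self._nodes[key] = (value, tuple(references))

        def lookup(self, key):
            return self._nodes[key]  # (value, references)

    class ToyCombinedIndex:
        """Consult several indices in order - roughly what you want when a
        repository's data is spread over many physical index files."""

        def __init__(self, indices):
            self._indices = list(indices)

        def lookup(self, key):
            for index in self._indices:
                try:
                    return index.lookup(key)
                except KeyError:
                    pass
            raise KeyError(key)

    idx = ToyGraphIndex()
    idx.add_node('rev-2', 'record at offset 1200, length 340',
                 references=('rev-1',))
    print(ToyCombinedIndex([idx]).lookup('rev-2'))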

Today is the last day of the sprint, and I hope that we'll get the nested-trees branch passing all tests, the limit_to_revisions parameter causing delta-compressed knits to only copy data outside the listed revisions up to the first full text, and have an experimental repository format that uses the new index layer for at least a couple of its knits (e.g. revisions and inventory, or revisions and signatures).

Ran into an interesting thing a couple of days ago. It looks like Mercurial suffers from a race condition wherein a size-preserving change made within the same second as the last hg command's completion (or possibly the last time hg observed the file) will be ignored by hg.

This isn't particularly bad for regular human users (as long as you don't have your editor open while running commands in the background), but it's pretty harsh for scripting users - size-preserving changes are not *that* uncommon.
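
If you want to poke at it yourself, something like this is roughly what I mean (it assumes hg is on your PATH; whether it actually misfires depends on timing and on your hg version, so treat it as a sketch rather than a test):

    import os
    import subprocess
    import tempfile

    repo = tempfile.mkdtemp()
    os.chdir(repo)
    subprocess.check_call(['hg', 'init'])
    with open('file.txt', 'w') as f:
        f.write('aaaa\n')
    subprocess.check_call(['hg', 'add', 'file.txt'])
    subprocess.check_call(['hg', 'commit', '-m', 'initial', '-u', 'someone'])
    # A size-preserving change made within the same second as the commit:
    with open('file.txt', 'w') as f:
        f.write('bbbb\n')
    # If the race is hit, this prints nothing instead of 'M file.txt'.
    subprocess.check_call(['hg', 'status'])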

I'm immensely glad that we don't have this race within bzr (even on FAT filesystems where you only get last-modified to within 2 seconds!)

Just spent some time bringing bzr-avahi up to date so it plays nicely with current bzr. This gives it integration with bzr-dbus (and thus bzr lan-notify) and bzr commit-notify (in the bzr-gtk plugin).

I want to know when we will get interesting talks like this happening! Competing on the basis of speed.

28 Mar 2007 (updated 28 Mar 2007 at 09:10 UTC) »

I got an invite to mugshot today... but apparently *it* cannot accept the terms of use.

(Screenshot of the error message.)

23 Mar 2007 (updated 23 Mar 2007 at 13:06 UTC) »

I've finally hooked up bzr-dbus to bzr-gtk: when a local branch changes its HEAD, that notifies a background task, 'commit-notifier' (when the branch can be read by that task). You need to be running 'bzr commit-notify' and have bzr-dbus correctly installed (see its README). It's only just working now, so it will be tweaked and tuned a bit: having the commit-notify command started automatically by GNOME, and having it read remote branches (not done right now to avoid issues with needing a login box).

One nice thing is that this will notify on 'pull' commands too, so when a bzr branch or pull command completes, you get a notification.

