Older blog entries for robertc (starting at number 78)

23 Jan 2008 (updated 23 Jan 2008 at 17:48 UTC) »

Tracing python programs. Today, Evan Dandrea asked a general question: "Where is set -x for python?" A quick google for sys.settrace turned up some code snippets. I thought this was nice, but surely you want to be able to just trace an arbitrary program. So I present a 'quick hack' (5 minutes precisely :)) to do that, based on the final version from those snippets:


#!/usr/bin/env python

import linecache
import os
import os.path
import sys


def traceit(frame, event, arg):
    # Called by the interpreter for every trace event; we only
    # care about 'line' events.
    if event == "line":
        lineno = frame.f_lineno
        filename = frame.f_globals["__file__"]
        if (filename.endswith(".pyc") or
            filename.endswith(".pyo")):
            filename = filename[:-1]
        name = frame.f_globals["__name__"]
        line = linecache.getline(filename, lineno)
        print "%s:%s: %s" % (name, lineno, line.rstrip())
    return traceit


def main():
    search_path = os.environ.get('PATH', '').split(os.path.pathsep)
    argv = sys.argv[1:]
    if not argv:
        raise Exception("No command to trace supplied")
    args = argv[1:]
    command = argv[0]
    # Resolve bare command names against $PATH.
    if os.path.sep not in command:
        for path in search_path:
            if os.path.exists(os.path.join(path, command)):
                command = os.path.join(path, command)
                break
    # Make sys.argv look as it would to the traced program.
    del sys.argv[0]
    source = open(command, 'rt')
    exec_symbols = dict(globals())
    exec_symbols['__name__'] = '__main__'
    sys.settrace(traceit)
    exec source in exec_symbols, exec_symbols


main()
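
Save it somewhere on your PATH and make it executable (I'll call it tracepy here; the name is arbitrary). It prints every line as it executes, in the form module:lineno: source, much like set -x:

$ tracepy some-python-script --its --usual --options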

I send mail from my laptop via a local smarthost-with-auth install of exim4. Recently I got motivated to set up the SMTP submission port for this, as I got tired of borked hotel wifi intercepting SMTP, and was behind a firewall that allowed no SMTP out...

It was pretty simple. On my mail server, enable listening on port 587 by putting 'daemon_smtp_ports = smtp : 587' in before the local_interfaces line.

And on my laptop, edit the 'remote_smtp_smarthost' stanza to add 'port = 587'.
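
Putting the two together, the relevant bits look roughly like this (a sketch only: on Debian the smarthost transport lives in the exim4 split configuration, so exact file locations and surrounding options will vary):

# Server side, in the main configuration, before local_interfaces:
daemon_smtp_ports = smtp : 587

# Laptop side, in the smarthost transport stanza:
remote_smtp_smarthost:
  driver = smtp
  port = 587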

Yay to less mail headaches.

Thank you, lazyweb! A number of folk have written to me pointing out Netem. One in particular, Yusuf Goolamabbas, even provided a set of wrapper scripts for Netem that I'm going to be digging into next week.

Netem is built around various lower level tools like tc, which is good (tc is what I was using years ago). I'm hopeful it will be really easy to use, and will blog something when I've used it in anger :)
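
For anyone wanting to try the same thing before I report back, netem is driven through tc; a minimal sketch (device name and numbers are illustrative, and you'd need something like a tbf or htb qdisc as well for bandwidth limits):

# Add 200ms (+/- 40ms) of latency and 0.5% packet loss on eth0:
tc qdisc add dev eth0 root netem delay 200ms 40ms loss 0.5%

# Inspect, then remove when done:
tc qdisc show dev eth0
tc qdisc del dev eth0 root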

Dear lazyweb,

In bzr development we are now working primarily on network performance. One of the keys to being sure we have improved things is automated, repeatable benchmarks. And for those to be useful in networking environments we need to control latency, bandwidth and packet loss.

I know this isn't a new problem, but it was about 5 years ago that I last did this sort of thing. What are the best tools today (for Linux :))? Ideally I'd be able to bring up a bunch of local addresses like 127.0.1.1 or 127.0.0.2 with different properties, such that traffic from 127.0.0.1 to 127.0.0.2 will simulate being uploaded over ADSL, and 127.0.0.2 to 127.0.0.1 will simulate being downloaded over ADSL.

Flying to the US for UDS in Boston... and this time no dreaded AAAA's. Seems the US doesn't hate me quite so much; it was much more pleasant this time. Still, being fingerprinted on entering a country is rather irritating: back home we only do that to criminals.

When are two identical changes the same, and when aren't they? There's a little bit of debate started by Andrew Cowie posting about unmixing the paint. Matt Palmer followed up with a claim that a particular technique used by Andrew is dangerous, and finally Andrew Bennetts makes the point that text conflicts are a small subset of merge conflicts.

That said, one critical task for a version control system is the merge command. Let's define merge at a human level as "reproduce the changes made in branch A in my branch B". There are a lot of taste choices that can be made without breaking this definition. For instance, a merge that combines all the individual changes into one, losing the individual commit deltas, meets this description. So does a merge which requires all text conflicts to be resolved during the merge command's execution, or one that does not give a human a chance to review the merged tree before recording it as a commit.

So if the goal of merge is to reproduce these other changes, then we are essentially trying to infer what the *change* was. For example, in an ideal world, merging a branch whose change was "log floating point values at 6 digit scale" would know enough to catch all new log messages added in my branch, regardless of language, actual API used and so on. But that is fantasy at the moment. The best we can do today depends on how we capture the change. For instance, Darcs allows some changes to be captured as symbol-changing patches, and others as regular textual diffs.

So the question of whether arriving at the same result should conflict can be rephrased as 'when is arriving at the same result correct, and when is it incorrect?'

For instance, if I write a patch and put it up as plain text on a website, and two people developing $foo download it and apply it, they have duplicate changes, but it's clearly correct that a merge between them should not conflict on this.

On the other hand, the example Andrew Bennetts quotes in his post is a valid example of two people making the same change, but with the line needing a further change during the merge to remain correct.

Here's another example, though. If I commit something faulty to my branch, and you pull from me before I fix it, then while I fix the bug you also fix it, in the same way. That is another example of no-conflict being correct.

If it's possible for either answer (conflict, or do not conflict) to be correct, then what should a VCS author do?

There are several choices here:

  • Always conflict
  • Never conflict
  • Conflict based on a heuristic

I think that our job is to assess what the maximum harm from choosing the wrong default is, and the likelihood of that occurring, and then make a choice. Short of fantasy, no merge is, in general, definitely good or bad: your QA process (such as an automatic test suite) needs to run regardless of the VCS's logic. The risk of a bad merge is relatively low, because you should be testing, and if the merge is wrong you can just not commit it, or roll it back. So our job in merge is to make it as likely as possible that your test suite will pass when you have done the merge, without further human work. This is very different to trying to always conflict whenever we cannot be 100% sure that the text is what a human would have created. It's actually harder to take this approach than conflicting: conflicting is easy.
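
As a small concrete demonstration of the no-conflict case, even plain diff3 merges identical changes cleanly (a toy example):

$ echo broken > base
$ echo fixed > mine          # my fix
$ echo fixed > yours         # your independent, identical fix
$ diff3 -m mine base yours   # merges without conflict
fixed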

So we're here in sunny Vilnius sprinting on bzr. I thought I'd write up some of what we've achieved.

On Thursday Wouter got most of the nested-trees branch merged up with bzr.dev, but with about 500 tests failing ;). Jelmer introduced a new parameter to the sprout API call on BzrDir called 'limit_to_revisions'; if supplied, it is a set of revisions which forms a strict limit on the amount of data to be fetched. This is intended to be the API by which the desire for limiting history, to achieve shallow branches/history horizons, is communicated at the 'bzr branch' or 'bzr checkout' stage. This API is tested so everything passes, but nothing implements it yet. I was hacking on commit (somewhat indirectly) by working on the repository-wide indexing logic we will require if we wish to move to a more free-form database with semi-arbitrary keys mapping to opaque data (at the lowest level). I got a core GraphIndex up at this point. Once complete this should halve the amount of I/O we perform during commit. There was some other hacking going on too: Tim Hatch did some tweaks to bzr-hg, Jelmer some to bzr-svn, and so on.
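
To give a rough feel for the shape of that indexing layer (a toy sketch of the idea only, not bzr's actual GraphIndex API): tuple keys map to opaque values, with graph references stored alongside so parent/compression chains can be recorded.

# Toy sketch only; the real bzrlib index code differs.
class ToyGraphIndex(object):

    def __init__(self):
        self._nodes = {}

    def add_node(self, key, value, references=()):
        # key: a tuple of strings; value: opaque bytes;
        # references: keys of related nodes (e.g. parents).
        self._nodes[key] = (value, tuple(references))

    def iter_entries(self, keys):
        # Yield (key, value, references) for each requested key present.
        for key in keys:
            if key in self._nodes:
                value, references = self._nodes[key]
                yield key, value, references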

On Friday we carried on with the same basic targets. Wouter fixed the 500+ failing tests, so now it's only in the 5-10 range, and he has been debugging them one by one since. Jelmer implemented the limit_to_revisions parameter all the way down to the knit level for non-delta-compressed knits (e.g. revisions.knit), which made knit1 repositories support the parameter and got the 'supports' branches of the interface tests being executed. I developed CombinedGraphIndex and sent the low level index stuff up for review, then followed that up by implementing a KnitIndex implementation that layers on the GraphIndex API to allow the use of .knit files without .kndx. This new index API allows us to switch out the indexing code and format much more easily than before: the knit index logic is decoupled from the generic indexing facilities. It also allows us to record arbitrary-parent compression chains. Finally, Jelmer implemented the wishlist item for bzr-gtk of a notification area widget that can be used to enable bzr-avahi watching, to share commits with the local LAN and get notified of commits on the local LAN.

Today is the last day of the sprint, and I hope that we'll get the nested-trees branch passing all tests, get the limit_to_revisions parameter causing delta-compressed knits to copy only the data outside the listed revisions up to the first full text, and have an experimental repository format that uses the new index layer for at least a couple of its knits (e.g. revisions and inventory, or revisions and signatures).

Ran into an interesting thing a couple of days ago. It looks like Mercurial suffers from a race condition wherein a size-preserving change, made within the same second as the last hg command's completion (or possibly the last time hg observed the file), will be ignored by hg.

This isn't particularly bad for regular human users (as long as you don't have your editor open while running commands in the background), but it's pretty harsh for scripting users: size-preserving changes are not *that* uncommon.
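
An illustrative way to see it (not a guaranteed reproduction, since you have to win the one-second window):

$ hg status                        # hg records each file's size and mtime
$ sed -i 's/foo/bar/' tracked.py   # same-size edit within the same second
$ hg status                        # may report nothing changed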

I'm immensely glad that we don't have this race within bzr (even on FAT filesystems, where you only get last-modified to within 2 seconds!).

Just spent some time bringing bzr-avahi up to play nicely with current bzr. This gives it integration with bzr-dbus (and thus bzr lan-notify) and bzr commit-notify (in the bzr-gtk plugin).

I want to know when we will get interesting talks like this happening! Competing on the basis of speed.

