Older blog entries for mbp (starting at number 220)

distcc is coming along really well: across three machines it compiles ~2.7 times as fast. So (abusing maths slightly) this is like having a 4.6GHz machine, or jumping about 24 months along Moore's law. If you have to compile anything large (Mozilla? GNOME? Linux?) you should check it out.
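The abused maths, spelled out in a few lines of Python. The 1.7GHz base machine is inferred from the numbers above (4.6/2.7 ≈ 1.7), and the 18-month doubling period is the usual Moore's-law rule of thumb; both are assumptions, not measurements:

```python
import math

speedup = 2.7       # distcc across three machines
base_ghz = 1.7      # assumed single-machine clock speed

# "like having a 4.6GHz machine"
effective_ghz = base_ghz * speedup
print(round(effective_ghz, 1))        # → 4.6

# how many months of Moore's-law doubling gives a 2.7x speedup
months_ahead = 18 * math.log2(speedup)
print(round(months_ahead))            # → 26
```

which lands close to the two-year jump claimed above.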

In non-geeky news, the Blue Penguins are through into the semi-final of our netball competition. (Well, ok, the name is maybe a bit geeky.) We'll probably get slaughtered, but it's still a big improvement from last year.

I think I indirectly found a kernel bug in 2.2 about FIN_WAIT1 handling with TCP_CORK.

I am rewriting some inner-loop Python code into C to make it faster. The Python C API is really nice; similar to JNI and much easier than Perl.

Some time ago Zaitcev asked how I would map from this C code to exceptions.

int foo() {
    do_something();
    if ((x = bar()) == NULL)
        goto error_bar;
    if ((y = baz()) == NULL)
        goto error_baz;
    /* ... */
    return 0;

    /* later failure paths fall through into the unwind chain */
    undo_baz();
error_baz:
    undo_bar();
error_bar:
    undo_something();
    return -1;
}

I think this is an excellent example of something better handled using exceptions.

The intention of the code, assuming I'm reading it the way Zaitcev intended, is that we need to either have all operations succeed, or have them all rolled back. I think you can understand exceptions much better if you think about intentions.

That implies that on the failure case, we want to have all the appropriate undo operations called.

I'm going to assume that bar() and baz() raise exceptions rather than returning NULL to indicate errors, because that's generally a better pattern in a language with good exceptions. (C++ is not necessarily included in that class.) You could do it the other way.

One way you might write it is like this:

int foo() {
    Object x = null, y = null;

    do_something();
    try {
        x = bar();
        y = baz();
    } finally {
        /* if we didn't complete, roll back whatever did succeed */
        if (y == null) {
            if (x != null)
                undo_bar();
            undo_something();
        }
    }
    return 0;
}

I think this fits pretty well with the intent of finally, which is to do cleanup work. (Conversely, catch is for handling errors.) But you could write it either way.

Doing it this way has the advantage that the normal case, where no errors occur, appears as straight-line code uncluttered by error handling. Also, the error is always propagated back correctly, without needing an errno-style temporary variable to hold it, assuming you want more information than just "-1".
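The same shape comes out even more cleanly in Python. This is a sketch of the pattern, not anyone's real code: bar, baz, and the undo_* functions are stand-ins, and I'm assuming baz() never legitimately returns None:

```python
log = []

class OpError(Exception):
    pass

# stand-in operations that just record what ran
def do_something(): log.append("do_something")
def undo_something(): log.append("undo_something")
def bar(): log.append("bar"); return object()
def undo_bar(): log.append("undo_bar")
def baz(): raise OpError("baz failed")   # simulate a failure partway through

def foo():
    do_something()
    x = y = None
    try:
        x = bar()
        y = baz()
    finally:
        # rollback path: if we didn't complete, undo whatever succeeded,
        # in reverse order; the exception propagates without any help
        if y is None:
            if x is not None:
                undo_bar()
            undo_something()
```

Calling foo() here rolls back bar() and do_something(), then lets OpError propagate to the caller untouched.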

My main point is not that one is better than the other, but rather that they are isomorphic. If you are comfortable with good error-handling idioms in C, then you can quickly learn to use exceptions well.

Zaitcev's friend who said that finally almost always indicates a bug is probably correct; certainly I have found that to be true in some code. However, I think that's not because the language is poorly designed but because it's poorly understood. That's something I hope, eventually, to help with.

14 Aug 2002 (updated 14 Aug 2002 at 12:35 UTC) »
All semaphores will be lowered to half mast in memory of Edsger Dijkstra

don marti links to wise advice on dealing with Chinese spam.

Busy writing an essay on the DMCA. (Isn't everyone? Yes, but perhaps this will reach a new audience.)

Cold and rainy in Canberra.

One of the nicest but least obvious things about Python is that the C interface is really straightforward and easy to use. At work I'm moving a frequently-called Python function into C, and it's pretty easy, though it does turn out to be a bit more verbose than a plain C implementation.

6 Aug 2002 (updated 6 Aug 2002 at 16:16 UTC) »

I suspect you're bored of me whining about Clearcase, so I'll defer that at least for a couple of days. I do however have something potentially interesting to say on the topic of "a poor workman blames his tools."

joelspolsky recently linked to an entertaining transcript of bruce schneier talking about open source. A few comments:

* slashdot's group mind would get enormously fixated on hurling homophobic insults at joel. oh wow. nothing like slashdot to reacquaint you with the bottom end of the bell curve.

* something about this reminds me of watching performers drinking on stage: that arrogant, charming, dissociative, creative frame of mind that a few drinks bring on in a bright person.

* quoting out of context a few paragraphs from a drunk (or conference-speaking) person is in slightly bad taste. it reminds me of watching the Whitlams and worrying that Tim would spill his fifth glass of shiraz on the keyboard. (first child, too responsible, i know.)

* a good conference dinner speech ought to contain roughly equal amounts of flattery, humour, teasing, truth, exaggeration, rowdiness, ... logic is not particularly needed because you hope your audience will be as deep in their cups as the speaker. i am reminded of the poor bunny at Privacy By Design 2000 who was upstaged by (literally) the Miss Canada contestants filing through the room.

* certainly, in the end free software is futile, lame, derivative, whatever. everything is futile, at least from a certain point of view. everything you think in your life has been thought before (and more clearly, expressed more beautifully); everything you achieve has been done better. the exceptions, if they ever occur, are tainted by the compromises necessary to achieve them. the woman who burns your soul is nothing compared to Helen of Troy. in the end, that point of view, though objectively true, does not get you anywhere, and there is no point dwelling on it.

* flattering open source people by creating neat Philip Jose Farmer-esque images will get you everywhere

* that certainly applies to you too, Sterling: Gabriel said most of this, and better, before, in his essay on the "slightly bitter quality" of open source. and Gabriel was a pale imitation of Alexander.

* "terrorspace": I hesitated to tie my shoe while walking through SFO customs, and attracted the attention of the security system. i have never before felt more at risk of being shot.


And to bjf: relax, and enjoy it. As an unidentified sex therapist said, "if it doesn't feel good, you're not doing it right." If you think hacking on some free project will help you rediscover why you thought computers were a good idea in the first place, then go and do that as a matter of urgency. If going and riding a bike or reading a book or something completely nondigital seems like a good idea, then leave strictly at five and do that instead.

One thing I've certainly noticed in my career to date is that the correct answer varies enormously from week to week. Sometimes I just want to hack until 3am. Sometimes I can't stand seeing a computer.

As Richie said, just try to be lucky and everything will work out fine.

I went to the australian open source symposium in Sydney on the weekend. That was fun. More notes later; my slides are here.

vicious hairy mary, contd

(The following rant is sold by weight, not by volume. Don't take it too seriously.)

ClearCase's own diff tool gives you not one but two sucky diff formats: plain old Unix diffs (with no context), and a side-by-side diff that gives you a vague idea but truncates at about 40 columns, and can't be fed to patch or emacs.

You might think the obvious default for "diff" in a version control system is "what did I change?", or perhaps "what's different in the relevant branch tips", but oh no, that would be too simple:

  % ct diff Makefile
  cleartool: Error: At least two objects must be supplied to compare.

One of the potentially more pleasant hallucinations caused by ClearCase inhalation is that you can transparently see old versions of a file by using magic filenames.

So I thought I'd try using gnu diff to get the equivalent of "cvs diff -u Makefile".

One of several problems with doing versioning through the kernel is that the error-reporting interface is insufficient: clearcase's only means of reporting things to the application is through errno. So if you make a mistake:

  % diff -u  Makefile@@/main/clearlake/barracuda Makefile
  diff: Makefile@@/main/clearlake/barracuda: Input/output error

  % diff -u Makefile@@pred Makefile
  diff: Makefile@@pred: No such file or directory

  % diff -u Makefile@@/main Makefile
  diff: Makefile@@/main: Input/output error

Lovely. This is right up there with ed printing '?' in case of an error. Obviously the user knows what they did wrong, and they were just being intentionally naughty by using the wrong syntax. :-/

Those examples, by the way, are taken directly from the manual, as far as I can see.
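The obvious countermeasure is the wrapper-script route. A tiny Python sketch of one, where the script name and the /main/LATEST version selector are my assumptions rather than anything ClearCase documents for this purpose:

```python
import subprocess
import sys

def pred_diff_argv(path):
    """Build a gnu diff command comparing PATH against an older version
    via ClearCase's extended '@@' naming.  '/main/LATEST' is an assumed
    selector; adjust it for your own branch."""
    return ["diff", "-u", "%s@@/main/LATEST" % path, path]

if __name__ == "__main__":
    # e.g.  mydiff.py Makefile
    for name in sys.argv[1:]:
        subprocess.call(pred_diff_argv(name))
```

At least then the error messages are your own fault.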

Anybody who stumped up USD3000 for the full ClearCase S&M experience would presumably feel ripped off if they weren't limping the next morning.

virgin control, contd

I rediscovered some interesting slides about Microsoft's internal development processes, and their internal-use-only version control system, SourceDepot.

click here to begin

"We have one version control system we sell (Visual SourceSafe, which sucks), and one we use (SourceDepot). If our customers don't like it, we tell them to see Figure One."

6 Jul 2002 (updated 6 Jul 2002 at 14:26 UTC) »
dog food

auspex, the term "eat dog food" is not scatological, or at least not originally. In fact, a recent Economist article discussed the apparent origin of the term:

``SHE got me to buy Uncle Ben's rice,'' said Colin Powell, America's secretary of state, early last year as he defended his appointment of Charlotte Beers as chief spin-doctor. The 66-year-old Texan steel magnolia dresses her poodle in sweaters, flirts with company bosses and is lauded as the most powerful woman in advertising. ``There is nothing wrong with getting somebody who knows how to sell something,'' Mr Powell added. ``We are selling a product. We need someone who can rebrand American foreign policy, rebrand diplomacy.''

Charismatic and striking, Ms Beers certainly knows how to sell things. In a 40-year career, including stints running two top advertising agencies, Ogilvy & Mather and J. Walter Thompson, she conquered Madison Avenue with a mix of southern charm and sheer audacity. She ate dog food to woo product men at Mars; she wowed managers at Sears by casually dismantling and reassembling a power drill during her pitch.


With a bit of luck and a lot of money and persistence, Ms Beers may, in time, convince some non-Americans (and possibly even some Muslims) of her cause. But she is unlikely to be able to polish up America's image to the extent that both she and her boss would like. The days when Ms Beers's problems could be solved by eating a little dog food must seem increasingly appealing.

The term has grown into a general metaphor: it certainly helps sell something if you use it yourself. (Conversely, it looks rather silly for Microsoft to run Unix-bashing web sites on Unix.)

It also grew the additional meaning that if you want to make really good dog food, you ought to eat it yourself and see what it tastes like, or at least feed it to your own dog. I think this is one of the things that open source programmers tend to get right more often than not: the Subversion developers store Subversion's own code in Subversion; I upload rsync releases using rsync. One of the main print servers in the HP Roseville lab runs our Linux/Samba package, and is called ALPO to stress the point.

I guess it can be harder for commercial projects to do this, because they tend to be building things that the programmers cannot necessarily "use" by themselves -- people don't use ERP systems for fun. But you can still get close to it by doing frequent integrations and builds.

Things open source tends to get right in this area:

  1. It's easy to build: almost always, just ./configure && make; failing that read INSTALL. I can't remember ever seeing a proprietary project that you could build the first time without assistance or instruction from somebody else.
  2. There is a common "body of knowledge" about how projects ought to behave with respect to build and packaging (have a README, INSTALL, use autoconf, have "make clean", ...) Even within single companies I don't think things are standardized to nearly the same extent.
  3. In most cases, every developer or tester can make a fresh build whenever they want, without having to wait for the "build group" or "CM group" to do it. So people do build all the time.
  4. CVS, bless its black heart, is at least simple enough that people can easily see what changed recently, etc. Developers are empowered to participate in CM.
  5. ...

Things we could do better:

  1. Write systematic test suites, using PyUnit or something similar. Some projects (gcc, ..) are great; some people don't care at all.
  2. Have better traceability from change request/bug report through to code. (Debian listing bug #s in their ChangeLog is good, but it's patchy.)
  3. Keep control over all the dependencies of the source. Moving a large project from RH6.2 to RH7.2 can produce many random failures.
  4. Keep centralized logs of failures and successes. One CM paper I was reading recently pointed out that it would be nice to know about all the *successful* builds that have happened recently, so you can work backwards in the case of a mysterious failure.
  5. ....
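On the first point, a minimal PyUnit suite (the module survives as Python's stdlib unittest) is only a few lines. word_count here is just a stand-in for real project code:

```python
import unittest

def word_count(s):
    """Toy function standing in for real project code."""
    return len(s.split())

class WordCountTest(unittest.TestCase):
    def test_empty(self):
        self.assertEqual(word_count(""), 0)

    def test_simple(self):
        self.assertEqual(word_count("eat dog food"), 3)

if __name__ == "__main__":
    # argv/exit settings keep this friendly when run from another harness
    unittest.main(argv=["wordcount"], exit=False)
```

The barrier to entry really is that low, which makes the projects that skip it harder to excuse.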

Of course, dog food that tastes good to humans may not taste good to dogs; programs that taste good to geeks may not be so nice for other species.


I'm certainly guilty of complaining about auto*; on the other hand they're by far the best solution out there.

It seems to me that the problem domain is doubly hard because the tests are complex, and you need to translate into shell script / Make that will run on all machines. The world would be so much simpler if we could count on GNU Make everywhere.

If I were going to write one of these, I would make a Python tool that produced a shell script. Python is easy to learn and maintain, and sufficiently conservative sh will run everywhere. As with auto*, only developers would need to install Python and the code generator.
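A sketch of that idea: a Python function that emits a conservative Bourne-shell feature test. The function name, the HAVE_* convention, and the exact test are mine, not any real tool's; the point is just that the generator can be pleasant Python while the output sticks to constructs ancient /bin/sh implementations handle:

```python
def check_header_sh(header):
    """Emit conservative sh that tests whether HEADER compiles,
    setting HAVE_<HEADER>=1 or 0.  A sketch, not a real autoconf
    replacement."""
    var = "HAVE_" + header.upper().replace(".", "_").replace("/", "_")
    return """\
echo 'checking for %(h)s'
cat > conftest.c <<EOF
#include <%(h)s>
int main() { return 0; }
EOF
if ${CC-cc} -c conftest.c > /dev/null 2>&1; then
  %(v)s=1
else
  %(v)s=0
fi
rm -f conftest.c conftest.o
""" % {"h": header, "v": var}

print(check_header_sh("stdlib.h"))
```

Only the generated script ships; end users never see the Python.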

I feel down in a way perhaps similar to tromey when people ask me about rsync limitations that are historical but hard to fix.

Following a link, I stopped by the ANU library this afternoon to look at the IEE Proceedings - Software special issue on open source. (It's only available on paper, not on the web... how quaint :-)

The Asklund and Bendix paper I was originally looking for was pretty interesting: they have some insightful things to say about the way configuration management is done in open source, as compared to conventional development. For example, your father's CM textbook probably shows proposed changes going to a Change Control Board, and if they are approved they will be implemented, integrated, and QA'd. The open source way is generally to implement first, and then approve or reject. There are a few interesting observations along these lines, based on interviews with people from Mozilla, KDE, and the Linux kernel. Perhaps nothing earth-shattering, but interesting nonetheless.

Some of the other papers were really deeply disappointing though. I'm not talking about incorrect technical details about Linux -- that would be quite forgivable -- but gaping assumptions that ought to be obvious to anyone with some kind of scientific background. Off the top of my head I could name a handful of counter-theories that would equally well explain some of the results (either pro- or anti-open source.)

I hesitate to go into details because I don't have time to go over all of them carefully, but after all this is just a diary, so I'll go ahead: the "Trust and Vulnerability" paper is desperately in need of the thoughtful statistics-based assessment of program reliability that informs, for example, Ross Anderson's recent paper. It's completely missing; they analyze a single variable when there are obviously many more, and the result is completely unconvincing. The thing that makes the security-vs-obscurity question essentially hard is that you need a complex model of the various communities; they missed the point as far as I can see. The overall result is so poor as to be not even worth criticizing.

I don't think I've read that journal before, so I'm not sure how this compares to their usual standard. It really does seem a shame, because academic SE can be very worthwhile, but at least in this instance it seems disconnected from open source. The first derivative is good: people are seeing open source as serious, as having something to teach the rest of the world. But it still requires more work on both sides to build a good understanding.

virgin control

I spent most of today trying to wrap my head around ClearCase, which is the configuration management system for my work project. I'm going to try to avoid the standard temptation to bitch about expensive, proprietary, bloated software -- I'm enough of a SCM geek that I found it pretty interesting, even if there are some bad features.

A wise person (Joy?) said that in a good software system,

simple things should be simple, and complex things should be possible

As far as I can see, ClearCase fails on the first of them, and CVS fails on the second.

ClearCase is really quite different from most configuration management systems in two ways: rather than a version control system per se, it's a general database that can be programmed to do SCM; and secondly, it's normally accessed via kernel hooks that allow transparent filesystem operations.

So, for example, you can have a directory that always contains the most recent buildable version of the software; you can do hardlink-like tricks to combine different modules, and so on.

The down side is that this complexity generally seems to result in every team sacrificing one full-time programmer just to the care and feeding of ClearCase, and to writing Perl or sh wrappers to protect other developers from the rotating blades.

Putting version control into the kernel is one of those ideas that sounds immensely cool at first, but that anybody who's thought about it ought to realize is much more trouble than it's worth. Amongst other things:

  • Most of the time, you don't *want* files appearing and disappearing underneath you like silly putty!
  • Purely local operations (building, temporary saves, etc.) greatly outnumber operations that need to interact with the version control system, so it's silly to pay a performance price for all of them.
  • Putting it in the kernel means there's "no escape" if ClearCase crashes or has a bug, or if somebody's configured it wrongly.

For some reason, the people doing the Linux implementation weren't satisfied with just doing a pluggable filesystem, but instead they hooked the syscalls directly. So, a bug we're experiencing prevents you from unlinking files in /tmp, even though of course ClearCase shouldn't have anything to do with it.

Leaving aside the crack-inspired kernel hooks, I guess the general idea of providing "mechanism not policy" for VC is pretty good, at least if the development organization is big enough to care about inventing and implementing their own rules.

One person thought that teams controlled by QA or methodology people like ClearCase, and teams controlled by programmers abhor it. I don't think this is entirely because programmers are cowboys/girls and dislike SCM, but rather that they're not prepared to let the SCM tool's idea of process get in the way of getting work done.

The overall experience of reading manuals all day was rather like sitting in a university library trying to concentrate on reading textbooks. Very soporific.

Along the way I (re)discovered a few interesting papers on SCM. They're only short, and I think they're really worthwhile.

This evening I updated the distcc web site, partly based on the very pretty GNUpdate web site. HTML is kind of fiddly, but I'm happy with how this turned out.

bytesplit writes

Remember, flamacious comments deserve no apology.

That's bullshit. If you say something ignorant and rude, then you ought to apologize. Didn't your parents teach you that? You should, both because it's the decent thing to do, and because it'll make it more likely that people will help you in the future. This isn't slashdot and it isn't AOL; a higher standard of behaviour is expected.

I'm glad you're enjoying advogato and Debian. If you can just play nicely with other people, then I'm sure you'll have a great time. That implies just being a bit polite, and admitting when you screwed up.
