Older blog entries for mbp (starting at number 215)

virgin control, contd

I rediscovered some interesting slides about Microsoft's internal development processes, and their internal-use-only version control system, SourceDepot.

"We have one version control system we sell (Visual SourceSafe, which sucks), and one we use (SourceDepot). If our customers don't like it, we tell them to see Figure One."

6 Jul 2002 (updated 6 Jul 2002 at 14:26 UTC)
dog food

auspex, the term "eat dog food" is not scatological, or at least not originally. In fact, a recent Economist article discussed the apparent origin of the term:

"SHE got me to buy Uncle Ben's rice," said Colin Powell, America's secretary of state, early last year as he defended his appointment of Charlotte Beers as chief spin-doctor. The 66-year-old Texan steel magnolia dresses her poodle in sweaters, flirts with company bosses and is lauded as the most powerful woman in advertising. "There is nothing wrong with getting somebody who knows how to sell something," Mr Powell added. "We are selling a product. We need someone who can rebrand American foreign policy, rebrand diplomacy."

Charismatic and striking, Ms Beers certainly knows how to sell things. In a 40-year career, including stints running two top advertising agencies, Ogilvy & Mather and J. Walter Thompson, she conquered Madison Avenue with a mix of southern charm and sheer audacity. She ate dog food to woo product men at Mars; she wowed managers at Sears by casually dismantling and reassembling a power drill during her pitch.

...

With a bit of luck and a lot of money and persistence, Ms Beers may, in time, convince some non-Americans (and possibly even some Muslims) of her cause. But she is unlikely to be able to polish up America's image to the extent that both she and her boss would like. The days when Ms Beers's problems could be solved by eating a little dog food must seem increasingly appealing.

The term's grown into the general analogy that it certainly helps sell something if you use it yourself. (Conversely, it looks rather silly for Microsoft to run Unix-bashing web sites on Unix.)

It also grew the additional meaning that if you want to make really good dog food, you ought to eat it yourself and see what it tastes like, or at least feed it to your own dog. I think this is one of the things that open source programmers tend to get right more often than not: the Subversion developers store their code in Subversion; I upload rsync releases using rsync. One of the main print servers in the HP Roseville lab runs our Linux/Samba package, and is called ALPO to stress the point.

I guess it can be harder for commercial projects to do this, because they tend to be building things that the programmers cannot necessarily "use" by themselves -- people don't use ERP systems for fun. But you can still get close to it by doing frequent integrations and builds.

Things open source tends to get right in this area:

  1. It's easy to build: almost always, just ./configure && make; failing that, read INSTALL. I can't remember ever seeing a proprietary project that you could build the first time without assistance or instruction from somebody else.
  2. There is a common "body of knowledge" about how projects ought to behave with respect to build and packaging (have a README, INSTALL, use autoconf, have "make clean", ...) Even within single companies I don't think things are standardized to nearly the same extent.
  3. In most cases, every developer or tester can make a fresh build whenever they want, without having to wait for the "build group" or "CM group" to do it. So people do build all the time.
  4. CVS, bless its black heart, is at least simple enough that people can easily see what changed recently, etc. Developers are empowered to participate in CM.
  5. ...

Things we could do better:

  1. Write systematic test suites, using PyUnit or something similar. Some projects (gcc, ..) are great; some people don't care at all.
  2. Have better traceability from change request/bug report through to code. (Debian listing bug #s in their ChangeLog is good, but it's patchy.)
  3. Keep control over all the dependencies of the source. Moving a large project from RH6.2 to RH7.2 can produce many random failures.
  4. Keep centralized logs of failures and successes. One CM paper I was reading recently pointed out that it would be nice to know about all the *successful* builds that have happened recently, so you can work backwards in the case of a mysterious failure.
  5. ....
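The point about logging successful builds can be sketched roughly like this. It is only an illustration under assumptions: the record layout (revision, host, ok), the host names, and both function names are made up for the example and do not come from any particular CM tool.

```python
def last_good_index(records, failed_index):
    """Index of the most recent successful build before failed_index, or None."""
    for i in range(failed_index - 1, -1, -1):
        if records[i]["ok"]:
            return i
    return None


def suspect_revisions(records, failed_index):
    """Revisions built after the last success, up to and including the failure."""
    good = last_good_index(records, failed_index)
    start = 0 if good is None else good + 1
    return [r["revision"] for r in records[start:failed_index + 1]]


# Hypothetical central log, oldest build first.
BUILD_LOG = [
    {"revision": 101, "host": "rh62", "ok": True},
    {"revision": 102, "host": "rh62", "ok": True},
    {"revision": 103, "host": "rh72", "ok": False},
    {"revision": 104, "host": "rh72", "ok": False},
]

suspects = suspect_revisions(BUILD_LOG, 3)
```

Without the records of the successful builds there would be nothing to work backwards to, which is the paper's point as I understood it.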

Of course, dog food that tastes good to humans may not taste good to dogs; programs that taste good to geeks may not be so nice for other species.

automake

I'm certainly guilty of complaining about auto*; on the other hand they're by far the best solution out there.

It seems to me that the problem domain is doubly hard because the tests are complex, and you need to translate into shell script / Make that will run on all machines. The world would be so much simpler if we could count on GNU Make everywhere.

If I were going to write one of these, I would make a Python tool that produced a shell script. Python is easy to learn and maintain; sufficiently conservative sh will run everywhere. As with auto*, only developers would need to install Python and the code generator.
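A very rough sketch of what I mean, assuming the only kind of test is "can this C header be preprocessed". The function names, the HAVE_ variable naming and the conftest.c file name are just chosen for the example (they imitate autoconf's habits but this is not autoconf), and a real tool would need many more kinds of checks.

```python
def shell_var(header):
    """Map a header name such as 'sys/types.h' to a shell variable name."""
    return "HAVE_" + "".join(c.upper() if c.isalnum() else "_" for c in header)


def check_header(header):
    """Return conservative sh lines that probe for one C header."""
    var = shell_var(header)
    return [
        "echo '#include <%s>' > conftest.c" % header,
        "if ${CC-cc} -E conftest.c >/dev/null 2>&1; then",
        "  %s=yes" % var,
        "else",
        "  %s=no" % var,
        "fi",
        'echo "checking for %s... $%s"' % (header, var),
    ]


def generate(headers):
    """Build the whole probe script as one string."""
    lines = [
        "#! /bin/sh",
        "# Generated file: edit the Python description, not this script.",
    ]
    for header in headers:
        lines.extend(check_header(header))
    lines.append("rm -f conftest.c")
    return "\n".join(lines) + "\n"


script = generate(["stdio.h", "sys/types.h"])
```

The generated script sticks to plain Bourne shell constructs (if/then/else, redirection, ${CC-cc}), so the person building the package needs only sh and a C compiler.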

I feel down in a way perhaps similar to tromey when people ask me about rsync limitations that are historical but hard to fix.

Following a link, I stopped by the ANU library this afternoon to look at the IEE Proceedings - Software special issue on open source. (It's only available on paper, not on the web... how quaint :-)

The Asklund and Bendix paper I was originally looking for was pretty interesting: they have some insightful things to say about the way configuration management is done in open source, as compared to conventional development. For example, your father's CM textbook probably shows proposed changes going to a Change Control Board, and if they are approved they will be implemented, integrated, and QA'd. The open source way is generally to implement first, and then approve or reject. There are a few interesting observations along these lines, based on interviews with people from Mozilla, KDE, and the Linux kernel. Perhaps nothing earth-shattering, but interesting nonetheless.

Some of the other papers were really deeply disappointing though. I'm not talking about incorrect technical details about Linux -- that would be quite forgiveable -- but gaping assumptions that ought to be obvious to anyone with some kind of scientific background. Off the top of my head I could name a handful of counter-theories that would equally well explain some of the results (either pro- or anti-open source.)

I hesitate to go into details because I don't have time to go over all of them carefully, but after all this is just a diary so I'll go ahead: the "Trust and Vulnerability" paper is desperately in need of the thoughtful statistics-based assessment of program reliability that informs, for example, Ross Anderson's recent paper. It's completely missing; they analyze a single variable when there are obviously many more and the result is completely unconvincing. The thing that makes the security-vs-obscurity question essentially hard is that you need a complex model of the various communities; they missed the point as far as I can see. The overall result is so poor as to be not even worth criticizing.

I don't think I've read that journal before, so I'm not sure how this compares to their usual standard. It does really seem like a shame, because academic SE can be very worthwhile, but at least in this instance it seems disconnected from open source. The first derivative is good: people are seeing open source as being serious, as having something to teach the rest of the world. But it still requires more work on both sides to build a good understanding.

virgin control

I spent most of today trying to wrap my head around ClearCase, which is the configuration management system for my work project. I'm going to try to avoid the standard temptation to bitch about expensive, proprietary, bloated software -- I'm enough of a SCM geek that I found it pretty interesting, even if there are some bad features.

A wise person (Joy?) said that in a good software system,

simple things should be simple, and complex things should be possible

As far as I can see, ClearCase fails on the first of them, and CVS fails on the second.

ClearCase is really quite different to most configuration management systems in two ways: firstly, rather than being a version control system per se, it's a general database that can be programmed to do SCM; and secondly, it's normally used through kernel hooks that allow transparent filesystem operations.

So, for example, you can have a directory that always contains the most recent buildable version of the software; you can do hardlink-like tricks to combine different modules, and so on.

The down side is that this complexity generally seems to result in every team sacrificing one full-time programmer just to the care and feeding of ClearCase, and to writing Perl or sh wrappers to protect other developers from the rotating blades.

Putting version control into the kernel is one of those ideas that sounds immensely cool at first, but that anybody who's thought about it ought to realize is much more trouble than it's worth. Amongst other things:

  • Most of the time, you don't *want* files appearing and disappearing underneath you like silly putty!
  • Purely local operations (building, temporary saves, etc.) greatly outnumber those that need to interact with the version control system, so it's silly to pay a performance price for all of them.
  • Putting it in the kernel means there's "no escape" if ClearCase crashes or has a bug, or if somebody's configured it wrongly.

For some reason, the people doing the Linux implementation weren't satisfied with just doing a pluggable filesystem, but instead they hooked the syscalls directly. So, a bug we're experiencing prevents you from unlinking files in /tmp, even though of course ClearCase shouldn't have anything to do with it.

Leaving aside the crack-inspired kernel hooks, I guess the general idea of providing "mechanism not policy" for VC is pretty good, at least if the development organization is big enough to care about inventing and implementing their own rules.

One person thought that teams controlled by QA or methodology people like ClearCase, and teams controlled by programmers abhor it. I don't think this is entirely because programmers are cowboys/girls and dislike SCM, but rather that they're prepared to go along with the SCM tool's idea of process for the sake of getting work done.

The overall experience of reading manuals all day was rather like sitting in a university library trying to concentrate on reading textbooks. Very soporific.

Along the way I (re)discovered a few interesting papers on SCM. They're only short, and I think they're really highly worthwhile:

This evening I updated the distcc web site, partly based on the very pretty GNUpdate web site. HTML is kind of fiddly, but I'm happy with how this turned out.

bytesplit writes

Remember, flamacious comments deserve no apology.

That's bullshit. If you say something ignorant and rude, then you ought to apologize. Didn't your parents teach you that? You should, both because it's the decent thing to, and because it'll make it more likely that people will help you in the future. This isn't slashdot and it isn't AOL; a higher standard of behaviour is expected.

I'm glad you're enjoying advogato and Debian. If you can just play nicely with other people, then I'm sure you'll have a great time. That implies just being a bit polite, and admitting when you screwed up.

On-topic

So, some people seem to find exceptions a bit weird when they first come to Java or Python from C or some other language. I think the first time I saw them was in C++, and I found them a bit weird.

I suspect a good way to explain them is by showing how you can get step-by-step from good C error-handling patterns to exceptions.

So, for example, the kernel is full of code following this pattern:

{
        int error;

        ....
        error = -ENOMEM;
        s = alloc_super();
        if (!s) {
                blkdev_put(bdev, BDEV_FS);
                goto out;
        }
        ...

out:
        path_release(&nd);
        return ERR_PTR(error);
}

In other words, if something goes wrong, you want to skip the rest of the function, do some cleanup, and propagate the error upwards. That maps pretty straightforwardly to

try:
  s = alloc_super()
  if not s:
    raise MemoryError('not enough memory')
  return s
finally:
  path_release(nd)

Similarly for except statements and so on. Hopefully a worthwhile howto or something will come out of it, and people will feel more happy using exceptions for good, not for evil.
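To make the except half concrete, here is a small self-contained sketch. Everything in it is a stand-in invented for illustration: SuperblockError, the fake alloc_super() with its fail switch, and the log list that records when cleanup ran. None of it is kernel code; it only mimics the shape of the C pattern.

```python
class SuperblockError(Exception):
    """Hypothetical error type standing in for the kernel's -ENOMEM."""


def alloc_super(fail):
    # Stand-in for the kernel's alloc_super(): returns None on failure.
    return None if fail else {"name": "sb0"}


def mount(fail, log):
    log.append("path acquired")
    try:
        s = alloc_super(fail)
        if s is None:
            raise SuperblockError("not enough memory")
        return s
    finally:
        # Runs on both the normal return and the raise, like the "out:" label.
        log.append("path released")


def try_mount(fail):
    """The caller decides what to do with the error, as a C caller checks the return."""
    log = []
    try:
        result = mount(fail, log)
    except SuperblockError as exc:
        result = "error: %s" % exc
    return result, log
```

The cleanup lives in one place in mount(), and the error travels upwards without every intermediate function having to check and re-return an error code.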

Off-topic

I know I probably shouldn't write about this, but I'm only human...

I guess bytesplit is probably a fine person in many ways, and he now seems to be using the site more or less as designed, but his arrival and initial posts were pretty annoying to me.

It feels like a lot of people, including me, are giving up their time to try to write good software for the world, and it's annoying when somebody just seems to want to piss all over it. I really would just quit, except that I care much more about the opinions of the genuinely friendly and bright people involved in it.

At some level I know it's a big bad world, and that will always happen, but it's still not nice to see this happen. I don't think that post, or this one is called for; it wouldn't be polite in person and it ought not to be acceptable here. It still seems to me to be counterproductive for bytesplit to want to be part of free software but also to insult people he could learn from.

The guy is obviously, admittedly, new to Linux: doesn't know C, doesn't(?) have a Unix box yet, etc. Nothing wrong with that; certainly not a sign of moral failure or lack of intelligence; everybody was there once. However, I do think it's reasonable for a beginner to show a little respect.

I can admit that my previous post was a bit flamacious. It would be reasonable, and kind of nice, for him to apologize, but if he can just be civil in future I guess that would be good enough.

I've tried to make this entry as moderate as I can. I half expect that he will just flame me in a similarly illiterate and ridiculous vein, but we shall see. Honestly and objectively, if he thinks I'm being unreasonable, then I just can't imagine him ever working constructively on an open source project.

mbp, write out 100 times: "I will not feed the trolls."

commentary

bytesplit calling Raph "incompetent", and Raph's measured response, are amongst other things an interesting example of how non-computer social trust metrics work. Raph is so obviously both extremely technically competent and a gentleman, and bytesplit is obviously quite emotionally conflicted and new to the non-M$ world.

I was going to write to bytesplit and suggest that esr's How to become a hacker document is a good place to start, but I'm not at all sure that his help would be worth having.

I'm kind of puzzled by why bytesplit wants to work on free Unix if he apparently doesn't get on with any (?) of the people doing it, or like the product himself.

To be generous, I guess if he's a novice he might not realize that saying "computers ought to be easier to use" is somewhat easier than actually achieving it. More concrete suggestions about improving particular features or designs would be better. For example, if you don't like a chapter of documentation, send in a replacement!

Perhaps he should start his own alternative OS project, announce it on Freshmeat and Slashdot, and go off and code on it in isolation for a couple of years. I'm sure afterwards everybody will humbly admit that he was right all along.

Amongst several interesting points in that document:

Finally, a few things not to do.

  • Don't use a silly, grandiose user ID or screen name.
  • Don't get in flame wars on Usenet (or anywhere else).
  • [...]

The only reputation you'll make doing any of these things is as a twit. Hackers have long memories -- it could take you years to live your early blunders down enough to be accepted.

The problem with screen names or handles deserves some amplification. Concealing your identity behind a handle is a juvenile and silly behavior characteristic of crackers, warez d00dz, and other lower life forms. Hackers don't do this; they're proud of what they do and want it associated with their real names. So if you have a handle, drop it. In the hacker culture it will only mark you as a loser.

Heh.

python rocks

I have been trying to help some of the guys at work be more comfortable in Python, by using a more natural idiom rather than writing Java-in-Python. I think I helped. I'm very concerned to do it in a way that's respectful of their existing expertise and hard work. I think it's quite possible to help people change without making them defensive, if you put a bit of work into doing it the right way.

Re ISS/Apache kerfuffle: it occurred to me while reading one of the articles that it's quite possible to be a karma whore for the mass media: just say something vapid but quotable that goes along with the journalist's prejudice, and you get some nice PR.

Subversion is looking very cool indeed: nice design and implementation. I think they also did a good job of being ambitious enough to be a big improvement over CVS, but modest enough to finish in a reasonable time and be understandable. I'm looking forward to their getting into beta. setuid, you should like it: it fixes the file and directory renaming limitations that bugged you so much about CVS.

I don't know... I think this banner ad is a little ambiguous. I think I can see the hint of a smile on her. :)

Oh, I went to COPIA last weekend. Nice.
