Older blog entries for joey (starting at number 471)

two random thoughts about bugs

First thought is this: A bug's likelyhood of ever being fixed decays with time, starting when I first read it. If I have to read it a second time, the bug has already become more complex, since something prevented me from just fixing it the first time. If more information has to be added to the bug, that makes it yet more complex. If there is an argument in the bug about whether it is a bug, or how to fix it, just revisiting the bug at a later date can become more expensive than it's worth. Much of what is involved in filing good and effective bug reports are obvious corollaries of this. It also follows that it's best to either fix, or at least plan how to fix a bug immediatly upon reading it.

Second thought is about "wontfix". A bug submitter and the developer responsible for the bug see this state in very different ways, but the name hides what it really means, which is that there is a meta-bug affecting either the bug submitter, the developer, or both. Once you realize this, wontfix bugs, from either side, become a bit personally insulting. They also quickly decay to uselessness (see first thought), and then just lurk there wasting the developer's time in various ways. Bug tracking systems should not provide a "wontfix" state; if they want to track meta-bugs they should provide a way to reassign such a bug to some other party who can actually resolve such a meta-bug.

Syndicated 2011-10-29 18:08:33 from see shy jo


I attended the Git Together earlier this week. I was tenative about this, since I'm not really much of a git developer; all my git work is building stuff on top of it. It turned out great though.

At first it seemed like one of those parties where you don't know anyone. But then I got to reconnect with Avery Pennarun for the first time since DebConf 2, and got to know Jonathan Nieder better, and it was also nice to see Jelmer Vernooij. And the core developers were also very welcoming. Junio Hamano knew of my work (and I am in awe of his), and Jeff King thinks my take on SHA1 security issues has value, and has been expanding on it. Shawn Pearce managed the unconference subtly and well. Lots of very smart people. At one point I found myself accross the table from Android's lead developer.

I was very happy that everything I think needs improvement in git was discussed during the unconference:

  • big files: My postit suggesting this got more checks than most anything else, and I briefly presented git-annex at the start of a session on general scalability -- on its 1-year anniversary. Some ideas for improved hooks that git-annex and other tools could use are developing. Better scalability to lots of files and more efficient index files were also discussed.
  • git as a filesystem: There was a consensus that gone are the days when git was just about managing source code. (I remember being told on #git before I wrote etckeeper, that no, git should not be used for that..)
  • submodules: I was astounded that they're now considering supporting "floating" submodules, which would track the head of a branch, rather that the specific rev committed in the superproject. Many other problems that have kept me from ever trying submodules are also being worked on. This seems unlikely to replace mr, but who knows -- at least getting rid of repo is a goal.
  • SHA1 security was discussed for quite a long while, long enough that I felt a bit guilty for bringing it up, but it was an interesting and fruitful discussion. I went in thinking that the checksum basically has to be parameterized, but they have some good reasons not to do that, and some other good ideas, although what to do and when best to do it is still open for discussion. Signed commits are certianly coming soon. Also this amazing patch was developed.
  • Metadata storage was briefly discussed, but nobody seemed sure how to deal with it. Ideas floated included a metastore like tool that uses mergeable files, or storing metadata in some sort of notes-like separate branch.

Syndicated 2011-10-29 00:56:26 from see shy jo

california postcard

Visiting California this week and having a great time. Experienced my first earthquake; visited the Noisebridge hackspace with Seth and Mako; and yesterday went up to Point Reyes and flew a kite from cliffs over Drake's Bay.

Up there even the cows have a view.

Tomorrow, off to Google for the GitTogether.

Syndicated 2011-10-23 19:57:47 from see shy jo

borrowed dogs

I've never had a dog of my own since I grew up, but there have always been dogs in my life.

Calypso and Percy

Calypso was dropped off at Wortroot soon after I moved in, and was my borrowed dog for years. And she remained out there, spending a good decade with run of the woods, fields and streams, a good doggy life. She got old and feeble, spent winters by the fireplace, and finally it was too much for her. I'll miss her, the best dog I've known.

Recently I've caught glimpses of a dog lurking in the distance here at the Hollow. When I noticed it was sleeping on the roof of the battery box, I realized it was probably one of the dogs that used to live here but were given away last year. Exchanged email with the likely owners, now in Sudan, and they tell me her name is Domino, and she must have run away home.

So I've been putting out food for Domino this week, and yesterday she came close enough to be petted. Medium sized and white, her name is for a black mask extending from eyes to ears. Although currently skittish, she seems basically a good, calm dog.

Syndicated 2011-09-28 22:37:57 from see shy jo

happy haskell hacker

There are certian things haskell is very good at, and I had the pleasure of using it for one such thing yesterday. I wanted to support expressions like find(1) does, in git-annex. Something like:

  git-annex drop --not --exclude '*.mp3' --and \
    -\( --in usbdrive --or --in archive -\) --and \
    --not --copies 3

So, parens and booleans and some kind of domain-specific operations. It's easy to build a data structure in haskell that can contain this sort of expression.

{- A Token can either be a single word, or an Operation of an arbitrary type. -}
data Token op = Token String | Operation op
        deriving (Show, Eq)

data Matcher op = Any
        | And (Matcher op) (Matcher op)
        | Or (Matcher op) (Matcher op)
        | Not (Matcher op)
        | Op op
        deriving (Show, Eq)

(The op could just be a String, but is parameterized for reasons we'll see later.)

As command-line options, the expression is already tokenised, so all I needed to do to parse it is consume a list of the Tokens. The only mildly tricky thing is handling the parens right -- I chose to not make it worry if there were too many, or too few closing parens.

generate :: [Token op] -> Matcher op
generate ts = generate' Any ts
generate' :: Matcher op -> [Token op] -> Matcher op
generate' m [] = m
generate' m ts = uncurry generate' $ consume m ts

consume :: Matcher op -> [Token op] -> (Matcher op, [Token op])
consume m [] = (m, [])
consume m ((Operation o):ts) = (m `And` Op o, ts)
consume m ((Token t):ts)
        | t == "and" = cont $ m `And` next
        | t == "or" = cont $ m `Or` next
        | t == "not" = cont $ m `And` (Not next)
        | t == "(" = let (n, r) = consume next rest in (m `And` n, r)
        | t == ")" = (m, ts)
        | otherwise = error $ "unknown token " ++ t
                (next, rest) = consume Any ts
                cont v = (v, rest)

Once a Matcher is built, it can be used to check if things match the expression the user supplied. This next bit of code almost writes itself.

{- Checks if a Matcher matches, using a supplied function to check
 - the value of Operations. -}
match :: (op -> v -> Bool) -> Matcher op -> v -> Bool
match a m v = go m
                go Any = True
                go (And m1 m2) = go m1 && go m2
                go (Or m1 m2) = go m1 || go m2
                go (Not m1) = not (go m1)
                go (Op o) = a o v

And that's it! This is all nearly completly generic and could be used for a great many things that need support for this sort of expression, as long as they can be checked in pure code.

A trivial example:

  *Utility.Matcher> let m = generate [Operation True, Token "and", Token "(", Operation False, Token "or", Token, "not", Operation False, Token ")"]
*Utility.Matcher> match (const . id) m undefined 

For my case though, I needed to run some IO actions to check if expressions about files were true. This is where I was very pleased to see a monadic version of match could easily be built.

{- Runs a monadic Matcher, where Operations are actions in the monad. -}
matchM :: Monad m => Matcher (v -> m Bool) -> v -> m Bool
matchM m v = go m
                go Any = return True
                go (And m1 m2) = liftM2 (&&) (go m1) (go m2)
                go (Or m1 m2) =  liftM2 (||) (go m1) (go m2)
                go (Not m1) = liftM not (go m1)
                go (Op o) = o v

With this and about 100 lines of code to implement specific tests like --copies and --in, git-annex now supports the example at the top.

Just for comparison, find(1) has thousands of lines of C code to build a similar parse tree from the command line parameters and run it. Although I was surprised to see that it optimises expressions by eg, reordering cheaper tests first.

The last time I wrote this kind of thing was in perl, and there the natural way was to carefully translate the expression into perl code, which was then evaled. Meaning the code was susceptible to security holes.

Anyway, this is nothing that has not been done a hundred times in haskell before, but it's very nice that it makes it so clean, easy, and generic.

Syndicated 2011-09-19 23:44:31 from see shy jo

destructive impulse

I used to know guys who would take old computer hardware out in the desert and shoot it up. I never went, and I've never gratuitously destroyed a computer in 2 decades of working with the things. Until yesterday when I picked one up by the monitor and introduced it to the floor.

My regret isn't that I violently destroyed a computer, but that it was my Mom's computer, and I did it right in front of her, and without even a story-worthy reason, just out of garden variety frustration. And perhaps there's some regret that I was actually unsucessful in destroying anything other than the monitor and some speakers. The computer was salvaged, and my mom has a new monitor and a working system again.

Sorry Mom & Maggie. To make up for it, I'll humiliate myself with this recording of "broke mom's computer blues", which will only be available for a very limited time: click to listen (warning: contains harmonica).

Anyway, don't worry; any remaining frustration will be taken out the usual way: Splitting firewood.

Syndicated 2011-09-17 20:31:39 from see shy jo

watch me program for half an hour

In this screencast, I implement a new feature in git-annex. I spend around 10 minutes writing haskell code, 10 minutes staring at type errors, and 10 minutes writing documentation. A normal coding session for me. I give a play-by-play, and some thoughts of what programming is like for me these days.

Not shown is the hour I spent the next day changing the "optimize" subcommand implemented here into "--auto" options that can be passed to git-annex's get and drop commands.

watched it all, liked it (0%)

watched some, boring (0%)

too long for me (0%)

too haskell for me (0%)

not interested (0%)

Total votes: 0

Syndicated 2011-09-15 18:37:23 from see shy jo


This was the second most rainy day of the year, so far. 28 hours of solid rain and counting. Good time to take stock of the water situation here at the cabin after a year's experience.

The springs were as seasonal as I'd feared. After running strongly all Spring, the main spring ebbed and died in the Summer months. By mid-July there was no remaining source of potable water. If I had not been away so much over the summer it would have been worse. As it was, I needed only twenty gallons of water hauled from neighbor Lee's well.

The large cistern was filled in a single night during the most rainy period this Spring, and provided wash water for two months of the summer, and that only lowered it by a quarter. So I could have been less sparing with it, and not relied so much on rain water catchment for supplimental wash water.

Today I replaced the pump, so I can use the pressure tank and have running water in the house. I could have done this earlier, but what can I say; I enjoy hauling buckets of water, when it doesn't involve breaking ice. In the winter when there's firewood to haul instead, I'll enjoy the indoor plumbing.

Also today, I may have finally fixed the pipe to the large cistern properly, so it won't only fill at the heaviest rain times. Along with the ability to cross-pump water from the small to the large cistern, it should be easier to keep it filled, and I might be able to reduce the turbidity enough that it's potable eventually. Will see.

I don't live in a dry region, but this is a relatively dry place for the area. It's been interesting to get a taste of what it'd be like to live somewhere where water is actually scarce.

Syndicated 2011-09-06 01:41:46 from see shy jo

size of the git sha1 collision attack surface

The kernel.org compromise has people talking about the security of git's use of sha1. Talking about this is a good thing, I think, but there's a lot of smug "we're cryptographically secure" in the air that does not seem warranted coming from non-cryptographers like me.

Two years ago I had a discussion on my blog about git and sha1, that reached similar conclusions to what I'm seeing here: It seems that current known sha1 attacks require somehow getting an ugly colliding binary file accepted into the repository in the first place. Hard to manage for peer reviewed source code. We all hate firmware in the kernel, so perhaps this is another reason it's bad. ;-) Etc.

Well, not so fast. Git's exposure to sha1 collisions is broader than just files. Git also stores data for commits, and directory trees.

Git's tree objects are interesting because they're a bag of bytes that is rarely if ever manually examined. If there was a way to exploit git such that it ignored some trailing garbage at the end of a tree object, then here's an attack injection vector that would be unlikely to be caught by peer review.

If you can change the content of a tree without changing its sha1, you can simply make it link to an older version of a file that had an exploitable problem. Or you can assemble a combination of files that results in an new exploitable problem. (For example, suppose a buffer size was hardcoded in two files in the kernel, and then the size was changed in both -- make a tree that contains one change and not the other.)

Now, git's tree-walk code, until 2008, mishandled malformed data by accessing memory outside the tree buffer. Was this an executable bug in git? I don't know. It is interesting that the fix, in 64cc1c0909949fa2866ad71ad2d1ab7ccaa673d9 relied on the parser stopping at a NULL -- great if you want to put some garbage after the tree's filename. With that said, the particular exploit I describe above probably won't work -- I tried! Here's all the code that stands between us and this exploit:

        if (size < 24 || buf[size - 21])
                die("corrupt tree file");

        path = get_mode(buf, &mode);
        if (!path || !*path)
                die("corrupt tree file");

Any good C programmer would recognise that this magic-constant-laden code needs to be careful about the size of the buffer. It's not as clear though, that it needs to be careful about consuming the entire contents of the buffer. And C programmers involved with git have gotten this code wrong before.

tldr: If git is a castle, it was built just after cannons were invented, and we've had our fingers in our ears for several years as their power improved. Now the outer wall of sha1 is looking increasingly like one of straw, and we're down to a rather thin inner wall of C code.

Syndicated 2011-09-02 05:47:39 from see shy jo

summer trips wrapup

Finally back from a solid month away.

  • Drive from England to Bosnia, and back. Plus two days of air travel, for a week of travel all told.

    The trip with the UK convoy back from Bosnia was enjoyable, Steve found a great route thru the Alps, and I much enjoyed finally seeing them. Then we stopped at a golf course in Luxemburg, where my hotel room was a suite ... swanky. We bogged down in Belgium, missed our ferry, which provided a chance to play some Eurogames in Europe. Then I visited family in London.

    I hope to eventually have some pictures from that trip. If those who had cameras make them available.

  • In between the European tour, there was DebConf. As always, it was excellent. I did not come out with the large todo list of exciting things like happened last year. I did continue nibbling through that list. Had some good conversations about haskell, met Intrigeri, who wrote the ikiwiki po plugin. Had some meetings on things I feel I've sorta moved on from to some extent but still have to be available for. Didn't manage significant technical work, but this was not unexpected. The day trip was fun, enjoyed seeing the waterfalls and little mills, and swimming the cold, cold river. The last few days I was out of energy. I did not give any presentations, and only realized during the lightning talks that I should have given one about git-annex.

  • I had expected to have most of a week at home after getting back from Europe, and technically did. But it was too annoying and unusual to count. Wildlife ate two trees of pears while I was away. My cat was stressed. I was stressed. It was insanely humid, and the house had been closed up for two weeks, and I had to fight mold and damp.

  • So the added trip to the beach that put this month over the top to beyond insane amounts of travel, turned out to be sorta a good thing. Camping in the dunes, kids, good books, sea turtle eggs fenced off a hundred feet away on the beach waiting to hatch. Lots of kite flying, and somehow no sunburn. And no rain until a dawn rainbow followed by lots of wet just as we were breaking camp.

    Full details in this Ocracode. Nobody but me understands or cares, but that's just fine. :)

    OBX1.1 P6 L6 SC5d+++b--c- U0 T3 f++-b2 R1dw Bn-b++m++ F+u+ SC++s++g0 H+++f2i4Vs---m0 E+++r+++ T6f++-b0 R1w Bn-b++m++ F++u++ SC++s++g1 H+++f2i5 V+++ E+++r++

For the past three days I've been coding, which feels good after all that time away.

Syndicated 2011-08-17 19:44:06 from see shy jo

462 older entries...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!