Older blog entries for joey (starting at number 476)

a resolution that stuck

Last year, my new year's resolution was to write in my journal every day. That actually stuck, I wrote 262 journal entries in 2011. While I've been keeping a journal intermittently since 1998, last year I doubled the number of entries in it. And wrote a novel's worth of entries -- 53 thousand words!

Most of it is of course banal and mundane stuff. Not good compared with Lars, who does something with his journal where he goes into some detail about code he's working on, and other work. The excerpts I've seen are quite nice. But after I've written code, written a commit message, documentation, perhaps bug reports etc, I often can't find much to say about it in my journal, beyond the bare bones that I worked on $foo today or faced a particularly hard bug. I also worry that the journal, and my reluctance to repeat myself, often tips the balance away from me blogging, if I write down something in the journal first.

Here's my journal for today:

Compare what jokes are funny now with those in 1982. The 1982 ones from net.jokes on olduse.net seem juvenile. Now compare what Unix joke man pages are funny now with those I'm reading from 1982. They seem basically the same. What would Biella make of this?

Liw noticed ikiwiki OOM on pell. Tracked down to a perl markdown bug with long lines. Had quite enough of perl markdown; ikiwiki will be moving to a different engine. Added discount support to it today, still needs Debian package tho.


Really gorgeous sunset, with a high wind, moon, puffy low, fast moving clouds. Enjoyed it ecstaticly. It's going to get cold soon. Very rainy early, but then got intermittently sunny; power is holding out ok.

Was going to roast a chicken today, but got distracted and had a large lunch besides. Need to find some quick food for supper.

I need to start a new book, should it be the River Cottage book about meat that I stole from Anna, or some SF?

Blogged about journaling, and put this journal entry in it, so also journaled about blogging. Wrote it somewhat self-conciously.

The benefits for me have ranged from being able to go back and work out dates of events, to forwarding the odd excerpts to others. The best thing though is certianly having a regular time of introspection, to look back over my the day.

If you've not got a new year's resolution yet, I recommend this one. (Learning Haskell would be another good one, if you haven't yet.)

Just write something, anything, down in your journal every day.

Syndicated 2012-01-01 22:58:57 from see shy jo

solar year

I've been at the cabin, on solar power, for a year now. I have a year of data!

Everything went pretty well until last month. There was an April rainy spell where power felt slightly tight. Then over the summer, plenty of power, no need to conserve. The last month though had what seemed like weeks of continual grey clouds, where I never saw the sun.

high noon today

Of course, even on a sunny day in winter, it does not get far above the hills, and the peak production window is only a few hours. This bad combination had my battery power dipping below the 10 volts that I consider low, down to 9, and even to 8 volts.

I use kerosine lamps in the winter. (I prefer the light anway.) I've also started unplugging my Thecus server at night to conserve power, meaning no internet late or early. For four or so nights, I had no power to run even my laptop after sunset. On one notable day, there was no power even in the daytime.

Even when it turned sunny again, I found that the batteries would seem to charge to 12 volts during the day, but then precipitously drop to 10 and 9 volts at night. I think the problem was not damaged batteries, but that these Nicads charge most efficiently above 12 volts (14 volts is best), and there was never enough power saved up to get them full enough that they could charge really efficiently.

So, I reluctantly spent three days away this week, to let the batteries soak up sun and recover. It seems to have worked; they've been holding a 12 volt charge overnight again.

Syndicated 2011-12-31 18:15:55 from see shy jo

a Github survey

The great thing about git and other distributed version control systems is that once you clone (or fork) a repository, you have all the data. You don't have to trust that Github will preserve it; everyone who develops the project is a backup.

Github carries this principle quite far amoung the features they provide. But not all the way. Today I have surveyed their features, and where the data for each is stored.

  • source code -- in git, of course!
  • user and project pages and wiki -- in git
  • gists -- in git
  • issues -- in a database accessible by an API
  • notes on commits -- in a database accessible by an API
  • relationships between repos (who forked what, pull requests) -- in a database accessible by an API
  • your account details and activity -- in a database, accessible by you via an API
  • list of all projects and users -- in a closed database (AFAIK)

The two that really stand out are the issues and notes not being stored in git. This means that, if a project uses github, it gets locked into github to a degree. The records of bugs and features, all the planning, and communication, is locked away in a database where it cannot be cloned, where every developer is not a backup.

Github's intent here is not to control this data to lock you in (to the extent they want to lock you in, they do that by providing a proprietary UI that people rave about); it was probably only expedient to use some sort of database, rather than git, when implementing these features.

They should automatically produce git repository branches containing a project's issues, and notes, based on the contents of their database. (For notes, git notes is the obviously right storage location.) Along with ensuring every developer checkout is a backup, this would allow accessing that data while offline, which is one of the reasons we use distributed version control.

The lack of a global list of projects is problimatic in a more global sense. It means that we can't make a backup of all the (public) repositories in Github (assuming that we had the bandwidth and storage to do it). I recently backed up all the repositories on Berlios.de, when it looked to be shutting down; this was only possible because they allowed enumerating them all.

People at The Internet Archive say that their archival coverage of free software is actually quite bad. We trust our version control systems to save our free software data, but while this works individually, it will result in data loss globally over time. I'd encourage Github to help the Internet Archive improve their collections by donating periodic snapshots of their public git repositories to the Archive. You're located in the same city, 5 miles apart; they have lots of hard drives (though less right now during the shortage than usual); this should be pretty easy to do.

Full disclosure: Github has bought me dinner and seemed like stand-up guys to me.

Syndicated 2011-12-27 17:38:45 from see shy jo

roundtrip latency from a cabin with dialup in 2011

alt="imagine an xkcd-style infographic here"

0 seconds

  • peace and quiet
  • full history of all my projects (git repos)
  • my blog
  • email

0.5 seconds

  • chatting on IRC
  • searching through all email received since 1994
  • music
  • cached web pages

5 seconds

  • ssh to a server
  • search the web
  • lwn, hacker news, reddit, metafilter, and other web aggregators

10 seconds

  • resuming laptop from sleep and waiting for network-manager
  • view an unnecessarily pastebinned scrap of text
  • access local Debian mirror
  • looking up a typical bug report

20 seconds

  • click on a typical link from a web aggregator
  • an hour of video pulled from a USB drive with git-annex

2 minutes

  • downloading new email
  • an increasing number of websites that force https (average of 3 reloads needed due to timeouts)

5 minutes

  • viewing a single file, bug report, or merge request on github
  • cloning the full content of a typical not too large git repo
  • retriving data from archival drives via git-annex
  • going offline and making a phone call
  • apt-get update (thanks aj, for the pdiffs)
  • viewing a single a twitter page (megabytes of crud and #! redirections)

10 minutes

  • entering a state of flow while programming
  • boingboing.net (with all the pretty pictures)
  • my mailbox (after a nice walk down a long driveway)

22 minutes

  • milk and eggs
  • a swim in the river

30 minutes

  • broadband internet access
  • someone else who knows what linux is

32 minutes

  • an hour of video pulled from my server with git-annex (includes travel time to broadband access point)

70 minutes

  • a halfway decent but slightly overpriced grocery store
  • a produce stand
  • a coffee shop

180 minutes

  • family
  • a bakery with real bread

300 minutes

  • downloading a typical podcast

Syndicated 2011-11-23 21:44:04 from see shy jo

the Engelbart demo

Just watched the whole Douglas Engelbart demo from 1968. Somehow I'd only heard of this as the first demo of the computer mouse, and only seen a brief clip on youtube. All three 30-minute reels of the film are available online, and well worth a watch in full.

The mouse is the least of it, the demo includes an outlining text editor, model-view-controller, hypertext, wiki, domain specific programming languages, a precurser to email, bug tracking, version control(?), a chorded keyboard. (Ok, that last one didn't really take off.) Probably a dozen other things I've forgotten. All in a single interface, and all before I was born.

Just like any tech demo, there are fumbles and mistakes, which is reassuring to anyone who has tried to give a tech demo.

There's also the awesome crazy hack shown here. They could only afford these tiny, round CRTs, so they pointed a television camera at it, and the camera image was piped to their television console. (So add KVM switch to the list of firsts!) The demo was done in San Fransisco, with the computer system remote in Palo Alto, so in this image you see the text on the CRT overlaid with the video from the camera.

Engelbart points out that the delay this added to the system acts as a short-term memory that filtered out flicker in the original display (and made the mouse have a mouse trail). To me it gives the whole demo a unique quality, as if it were underwater.

Despite the piping around of audio and video signals, and the multiuser system, the glaring thing missing from the demo that we have these days is networking. Although there is this amusing bit at the end where they compile a regular expression and then apply it, in order to search for documents containing certain terms, and end up with a hyperlinked list of 10 results, ordered by relevance. Yes, think Google.

Syndicated 2011-11-03 00:14:19 from see shy jo

two random thoughts about bugs

First thought is this: A bug's likelyhood of ever being fixed decays with time, starting when I first read it. If I have to read it a second time, the bug has already become more complex, since something prevented me from just fixing it the first time. If more information has to be added to the bug, that makes it yet more complex. If there is an argument in the bug about whether it is a bug, or how to fix it, just revisiting the bug at a later date can become more expensive than it's worth. Much of what is involved in filing good and effective bug reports are obvious corollaries of this. It also follows that it's best to either fix, or at least plan how to fix a bug immediatly upon reading it.

Second thought is about "wontfix". A bug submitter and the developer responsible for the bug see this state in very different ways, but the name hides what it really means, which is that there is a meta-bug affecting either the bug submitter, the developer, or both. Once you realize this, wontfix bugs, from either side, become a bit personally insulting. They also quickly decay to uselessness (see first thought), and then just lurk there wasting the developer's time in various ways. Bug tracking systems should not provide a "wontfix" state; if they want to track meta-bugs they should provide a way to reassign such a bug to some other party who can actually resolve such a meta-bug.

Syndicated 2011-10-29 18:08:33 from see shy jo


I attended the Git Together earlier this week. I was tenative about this, since I'm not really much of a git developer; all my git work is building stuff on top of it. It turned out great though.

At first it seemed like one of those parties where you don't know anyone. But then I got to reconnect with Avery Pennarun for the first time since DebConf 2, and got to know Jonathan Nieder better, and it was also nice to see Jelmer Vernooij. And the core developers were also very welcoming. Junio Hamano knew of my work (and I am in awe of his), and Jeff King thinks my take on SHA1 security issues has value, and has been expanding on it. Shawn Pearce managed the unconference subtly and well. Lots of very smart people. At one point I found myself accross the table from Android's lead developer.

I was very happy that everything I think needs improvement in git was discussed during the unconference:

  • big files: My postit suggesting this got more checks than most anything else, and I briefly presented git-annex at the start of a session on general scalability -- on its 1-year anniversary. Some ideas for improved hooks that git-annex and other tools could use are developing. Better scalability to lots of files and more efficient index files were also discussed.
  • git as a filesystem: There was a consensus that gone are the days when git was just about managing source code. (I remember being told on #git before I wrote etckeeper, that no, git should not be used for that..)
  • submodules: I was astounded that they're now considering supporting "floating" submodules, which would track the head of a branch, rather that the specific rev committed in the superproject. Many other problems that have kept me from ever trying submodules are also being worked on. This seems unlikely to replace mr, but who knows -- at least getting rid of repo is a goal.
  • SHA1 security was discussed for quite a long while, long enough that I felt a bit guilty for bringing it up, but it was an interesting and fruitful discussion. I went in thinking that the checksum basically has to be parameterized, but they have some good reasons not to do that, and some other good ideas, although what to do and when best to do it is still open for discussion. Signed commits are certianly coming soon. Also this amazing patch was developed.
  • Metadata storage was briefly discussed, but nobody seemed sure how to deal with it. Ideas floated included a metastore like tool that uses mergeable files, or storing metadata in some sort of notes-like separate branch.

Syndicated 2011-10-29 00:56:26 from see shy jo

california postcard

Visiting California this week and having a great time. Experienced my first earthquake; visited the Noisebridge hackspace with Seth and Mako; and yesterday went up to Point Reyes and flew a kite from cliffs over Drake's Bay.

Up there even the cows have a view.

Tomorrow, off to Google for the GitTogether.

Syndicated 2011-10-23 19:57:47 from see shy jo

borrowed dogs

I've never had a dog of my own since I grew up, but there have always been dogs in my life.

Calypso and Percy

Calypso was dropped off at Wortroot soon after I moved in, and was my borrowed dog for years. And she remained out there, spending a good decade with run of the woods, fields and streams, a good doggy life. She got old and feeble, spent winters by the fireplace, and finally it was too much for her. I'll miss her, the best dog I've known.

Recently I've caught glimpses of a dog lurking in the distance here at the Hollow. When I noticed it was sleeping on the roof of the battery box, I realized it was probably one of the dogs that used to live here but were given away last year. Exchanged email with the likely owners, now in Sudan, and they tell me her name is Domino, and she must have run away home.

So I've been putting out food for Domino this week, and yesterday she came close enough to be petted. Medium sized and white, her name is for a black mask extending from eyes to ears. Although currently skittish, she seems basically a good, calm dog.

Syndicated 2011-09-28 22:37:57 from see shy jo

happy haskell hacker

There are certian things haskell is very good at, and I had the pleasure of using it for one such thing yesterday. I wanted to support expressions like find(1) does, in git-annex. Something like:

  git-annex drop --not --exclude '*.mp3' --and \
    -\( --in usbdrive --or --in archive -\) --and \
    --not --copies 3

So, parens and booleans and some kind of domain-specific operations. It's easy to build a data structure in haskell that can contain this sort of expression.

{- A Token can either be a single word, or an Operation of an arbitrary type. -}
data Token op = Token String | Operation op
        deriving (Show, Eq)

data Matcher op = Any
        | And (Matcher op) (Matcher op)
        | Or (Matcher op) (Matcher op)
        | Not (Matcher op)
        | Op op
        deriving (Show, Eq)

(The op could just be a String, but is parameterized for reasons we'll see later.)

As command-line options, the expression is already tokenised, so all I needed to do to parse it is consume a list of the Tokens. The only mildly tricky thing is handling the parens right -- I chose to not make it worry if there were too many, or too few closing parens.

generate :: [Token op] -> Matcher op
generate ts = generate' Any ts
generate' :: Matcher op -> [Token op] -> Matcher op
generate' m [] = m
generate' m ts = uncurry generate' $ consume m ts

consume :: Matcher op -> [Token op] -> (Matcher op, [Token op])
consume m [] = (m, [])
consume m ((Operation o):ts) = (m `And` Op o, ts)
consume m ((Token t):ts)
        | t == "and" = cont $ m `And` next
        | t == "or" = cont $ m `Or` next
        | t == "not" = cont $ m `And` (Not next)
        | t == "(" = let (n, r) = consume next rest in (m `And` n, r)
        | t == ")" = (m, ts)
        | otherwise = error $ "unknown token " ++ t
                (next, rest) = consume Any ts
                cont v = (v, rest)

Once a Matcher is built, it can be used to check if things match the expression the user supplied. This next bit of code almost writes itself.

{- Checks if a Matcher matches, using a supplied function to check
 - the value of Operations. -}
match :: (op -> v -> Bool) -> Matcher op -> v -> Bool
match a m v = go m
                go Any = True
                go (And m1 m2) = go m1 && go m2
                go (Or m1 m2) = go m1 || go m2
                go (Not m1) = not (go m1)
                go (Op o) = a o v

And that's it! This is all nearly completly generic and could be used for a great many things that need support for this sort of expression, as long as they can be checked in pure code.

A trivial example:

  *Utility.Matcher> let m = generate [Operation True, Token "and", Token "(", Operation False, Token "or", Token, "not", Operation False, Token ")"]
*Utility.Matcher> match (const . id) m undefined 

For my case though, I needed to run some IO actions to check if expressions about files were true. This is where I was very pleased to see a monadic version of match could easily be built.

{- Runs a monadic Matcher, where Operations are actions in the monad. -}
matchM :: Monad m => Matcher (v -> m Bool) -> v -> m Bool
matchM m v = go m
                go Any = return True
                go (And m1 m2) = liftM2 (&&) (go m1) (go m2)
                go (Or m1 m2) =  liftM2 (||) (go m1) (go m2)
                go (Not m1) = liftM not (go m1)
                go (Op o) = o v

With this and about 100 lines of code to implement specific tests like --copies and --in, git-annex now supports the example at the top.

Just for comparison, find(1) has thousands of lines of C code to build a similar parse tree from the command line parameters and run it. Although I was surprised to see that it optimises expressions by eg, reordering cheaper tests first.

The last time I wrote this kind of thing was in perl, and there the natural way was to carefully translate the expression into perl code, which was then evaled. Meaning the code was susceptible to security holes.

Anyway, this is nothing that has not been done a hundred times in haskell before, but it's very nice that it makes it so clean, easy, and generic.

Syndicated 2011-09-19 23:44:31 from see shy jo

467 older entries...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!