Older blog entries for joey (starting at number 527)

DIY crowdfunding and bitcoin

Well, my git-annex crowdfunding campaign is half way to its August 15th conclusion. So far it's raised more than five times what I hoped it would. I wish I could say I'm like some canny NASA engineer who intentionally sets low expectations for their Mars rover, but in both the previous kickstarter and this campaign I've really had no idea how far it'd go. I'm glad that I'll be working on git-annex for another year.

I was particularly unsure if it'd be successful to move off Kickstarter. During the git-annex assistant Kickstarter campaign, I saw many small contributions from people who learned of it due to it being a successfully funded project, a staff pick, etc. Losing that easy network effect is a gamble.

So far I've had only half the number of contributors that I got on Kickstarter. I've basically missed out entirely on the $5 level casual contributors. On the other hand, my backers have generally been more generous (and some have been exceedingly generous). And I've avoided rewards that will cost much money, so I may end up in the same ballpark funding level in the end!

Incidentally, I'm really enjoying getting in touch to let people know when I make their sponsored commits. There's still time to sponsor one of your own ;)


I also was curious to experiment with Bitcoin in this campaign. Partly because Paypal isn't available everywhere internationally, and takes really obnoxious percentages of transactions (though probably not as bad as Kickstarter taking its percentage followed by Amazon payments taking its percentage..) and partly because there seem to be interesting possibilities for supporting free software with Bitcoin. (Especially if any of the microtransactions on top of Bitcoin take off.)

So far 5% of backers have used Bitcoin. It's been quite strange to actually have significant amounts of bitcoins in my wallet. Wordpress has had 94 bitcoin payments over 9 months since it started accepting them. I've had 47 payments in the two weeks my campaign has run so far. Wow!

Most of the bitcoin payments have come in via Coinbase (a few people have found my direct payment address), but of those very few were using bitcoin purchased on Coinbase. Most are probably transfers of bitcoin they already had, or perhaps bitcoin purchased on other sites.

The one technical issue I've had with using bitcoin is that Coinbase has not provided details about who sent most of the donations. Probably some of them are intentionally anonymous, but I suspect Coinbase's interface to claim incoming bitcoin transactions failed for some of them. (If you donated bitcoin and want to actually get a reward, please email me.)

By the way, I'm converting most of the bitcoins back to USD pretty quickly. I'm not interested in speculating on currency exchange rates with money that has been donated so I can accomplish a particular task..


I put up the campaign website without any means in place to handle updating it. This is because I never automate anything until I've done it at least 10 times by hand. ;) After the first trickle of donations became a flood, I quickly realized I needed at least something to handle keeping the numbers straight.

What I whipped up in an hour of coding is a system where I enter incoming payments into a hledger file and a small haskell program parses that and writes out various files that are included into the website. Amusingly the percentage calculation and display code was copied from git-annex, so part of git-annex is helping run its own fundraising campaign. The campaign video is itself hosted in a public git-annex repository, come to think of it.
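As a rough illustration of the kind of calculation such a program might do, here is a minimal sketch. It is not the campaign's actual code: the names percentOfGoal and showProgress, and the choice to represent amounts as whole cents, are assumptions made for this example.

```haskell
-- Hypothetical sketch: amounts are whole cents, which avoids
-- floating point rounding when summing many payments.

-- Percentage of the goal reached so far, rounded down.
-- It can exceed 100 when a campaign overshoots its goal.
percentOfGoal :: Integer -> [Integer] -> Integer
percentOfGoal goal payments
    | goal <= 0 = 0
    | otherwise = (100 * sum payments) `div` goal

-- Text that could be written to a file and included in a web page.
showProgress :: Integer -> [Integer] -> String
showProgress goal payments =
    show (percentOfGoal goal payments) ++ "% of goal"
```

Keeping the calculation a pure function of the list of payments means the generated files can be rebuilt from the ledger at any time.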

The rest of the site is built using ikiwiki. Given that it's hosted at Branchable, this is a high level of dogfooding and DIY. There are certainly better crowdfunding platforms, but all I miss in this one is automated transaction entry. And I have total flexibility, double entry accounting, and a powerful static website generator that handled being on the top of Hacker News without a sweat. Oh, and some money. What's not to like?

Syndicated 2013-08-02 06:22:09 from see shy jo

git-annex as a podcatcher

As a Sunday diversion, I wrote 150 lines of code and turned git-annex into a podcatcher!

I've been using hpodder, a podcatcher written in Haskell. But John Goerzen hasn't had time to maintain it, and it fell out of Debian a while ago. John suggested I maintain it, but I have not found the time, and it'd be another mass of code for me to learn and worry about.

Also, hpodder has some misfeatures common to the "podcatcher" genre:

  • It has some kind of database of feeds and what files have been downloaded from them. And this requires an interface around adding feeds, removing feeds, changing urls, etc.
  • Due to it using a database, there's no particularly good way to run it on the same feeds on multiple computers and sync the results in some way.
  • It doesn't use git annex addurl to register the url where a file came from, so when I check files in with git-annex after the fact they're missing that useful metadata and I can't just git annex get them to re-download them from the podcast.

So, here's a rethink of the podcatcher genre:

  cd annex; git annex importfeed http://url/to/podcast http://another/podcast

There is no database of feeds at all. Although of course you can check a list of them right into the same git repository, next to the files it adds. git-annex already keeps track of urls associated with content, so it reuses that to know which urls it's already downloaded. So when you're done with a podcast file and delete it, it won't download it again.
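The idea of reusing the set of known urls in place of a feed database can be sketched like this. It is only a model of the idea, not git-annex's implementation: the Item type and the functions newItems and recordItems are hypothetical names chosen for the example.

```haskell
import qualified Data.Set as Set

-- Hypothetical types for illustration; not git-annex's own.
type URL = String

data Item = Item
    { itemTitle :: String
    , itemUrl   :: URL
    } deriving (Show, Eq)

-- Keep only the feed items whose url has not been seen before.
newItems :: Set.Set URL -> [Item] -> [Item]
newItems known = filter (\i -> itemUrl i `Set.notMember` known)

-- Record the urls of items that were added. The record is kept
-- even if the downloaded file is deleted later.
recordItems :: Set.Set URL -> [Item] -> Set.Set URL
recordItems known items =
    Set.union known (Set.fromList (map itemUrl items))
```

Because the record of urls outlives the files, running the import again over the same feed yields nothing new, which matches the behaviour described above for deleted podcast files.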

This is a podcatcher that doesn't need to actually download podcast files! With --fast, it only records the existence of files in git, so git annex get will download them from the web (or perhaps from a nearer location that git-annex knows about).

Took just 3 hours to write, and that's including full control over the filenames it uses (--template='${feedtitle}/${itemtitle}${extension}'), and automatic resuming of interrupted downloads. Most of what I needed was already available in git-annex's utility libraries or Hackage.

Technically, the only part of this that was hard at all was efficiently querying the git repository for a list of all known urls. I found a pretty fast way to do it, but might add a local cache file later on.

Syndicated 2013-07-28 21:03:02 from see shy jo

guesting on GitMinutes

Last Friday, I spent an hour and a half clamping a landline phone to the side of my head, while also wearing a headset. I was recording an interview on the GitMinutes podcast about git-annex.

I've been listening to GitMinutes for a while, ever since I heard git-annex was mentioned on it. Actually, I think it's come up in 4 or 5 interviews on the podcast. Most notably with core git dev Peff King, who had some interesting things to say, for sure. I responded to that in my interview, and we covered quite a wide amount of stuff in reasonable depth in just over an hour. Thomas is quite a good host and great at drawing stuff out, and it's nice to not need to worry about going into too much technical depth. (Although I didn't get a chance to explain the automatic union merging used to maintain the git-annex branch.)

This is the first podcast I've been in, and I've always worried about audio quality if I was in one. That is, I'd want it to be really good, and probably end up annoying the host. ;) For this one, we settled on using my land line call, which went through some Skype thing to get to Europe, and mixing in a local recording I made with a not too great headset. I think the result is pretty good, considering.

You can listen to the whole thing here, if you dare! (1 hour 8 minutes) http://episodes.gitminutes.com/2013/07/gitminutes-16-joey-hess-on-git-annex.html

(Special bonus guest: The songbird that lives on my porch.)

git-annex fundraising campaign update: Initial goal reached in a mere seven hours. I will be developing git-annex fulltime for at least the next three months! Gone to stretch goal.

Also it made the top of Hacker News: thread

Syndicated 2013-07-15 07:26:08 from see shy jo

New git-annex crowdfunding campaign

Having reached the end of my Kickstarter funded year working on the git-annex assistant, I've decided to try one more crowdfunding campaign, to see if I can get funded to work on it a little while longer. I went back and forth on this for a while. The Kickstarter funded development was extremely successful (one of the most productive years of my life). I certainly want to work on git-annex more, and have lots more stuff to do, particularly around security and encryption. On the other hand, it's hard to frame ongoing development as a normal Kickstarter campaign to start something new.

Anyway, I've decided to go ahead and try it, and not do it through Kickstarter this time. So I have my own website set up and accepting donations, and hopefully I'll make enough to spend a few more months working on git-annex.


... And if not, I'll probably spend a few months working on git-annex part time, while looking for paying work with the rest of my time.

By the way, I'm taking payments in both US Dollars (via Paypal) and Bitcoin (via Coinbase). Can't wait to see how this works out!

Syndicated 2013-07-14 23:50:43 from see shy jo

git annex and my mom

[I'm encouraging git-annex users to post their success stories, and this one is my own.]

I set up git-annex on my mom's and sisters' computers a couple of months ago. I noticed this was the first software I've written that I didn't have to really explain how to use. All I told them was, put files in this folder, and the rest of us will be able to see them. Don't put anything too big in there, or anything you don't want others to see.

I paired the computers using XMPP, and set up an encrypted transfer repository using a free account rsync.net gave me for beta testing. I also added a repository on my server, which made things more robust. (XMPP has since improved, but it's still a good idea to have a git repository to supplement XMPP.) I also have two removable drives that are used to back up our files.

This was all set up using the webapp. And adding a computer takes just a couple of minutes that way. I set it up at my sister's in a spare moment during a visit, and it all just worked.

Our shared git annex contains a couple of hundred files, and is a couple of gigabytes in size. And growing pretty fast as we find things we want to share. Mostly photos and videos so far but I won't be surprised to find poems and books pop up in there from the family's poets and authors. And it'll grow further as I add people who've so far been left out.

Coming home from a week at the beach with my grand nephew and niece was the first time I really used git-annex without thinking about it. Collapsed on a hotel bed, I plugged in my camera and loaded in the trip's photos. Only to see the hotel wifi cost extra. Urk, no! Later, in the lobby, I found an open wifi network, and watched it automatically sync up.


By the time I was home, the video of cute kids playing weathermen and reporting on our near miss by a tropical storm had been enjoyed by the folks who didn't make that family gathering.

Syndicated 2013-06-25 21:33:29 from see shy jo

little disasters

Interesting times.. While the big disasters are ongoing, little ones have been spicing up my life lately.

A pleasant week by the beach ended with a tropical storm passing over the beach house. I've never experienced this before, and though Andrea was diminished by passing over land, it was still more wind than I've ever seen. I love wind, and this was thrilling, right on the edge of danger but not quite there. At least, if you have sense to stay out of the water. Leaving the beach, I heard of someone who tried to go surfing that day, and drowned.

The night before last, I was startled to find nearly an inch of water seeping up from underneath the tile floor of the kitchen. Probably it has something to do with the pressure tank pumping system, which was repaired while I was away, and means I actually have indoor running water here. (Overrated.) This saw me scrambling to close every water valve, and out with a flashlight at 2 am closing the cutoff at the 1000 gallon water reservoir before it all drained into the house. While sopping up dozens of gallons of water from the floor at 3 am probably doesn't sound like fun, I found myself going through the motions elatedly.. Because this means I finally am coming to understand the source of the damp that infests the most earth-sheltered corner of this house. It's not condensation. It's bad plumbing!

Then yesterday, I went out to try a dip in the river, stopped by the neighborhood eatery and bait shop, and ended up sitting out on the back deck eating ribs and listening to a band with "possum playboys" in their name (which makes the full name fairly irrelevant), while looking out over the river and the old-timey green metal bridge. Which was unexpected fun, and the kind of thing you have to take in when it happens, but getting stuck in a newly installed hole in my driveway was not. My car was spinning, and I gave up and called it a night.

Here's the thing. I could feel my brain working on this stupid "underpowered car is stuck in a small rut" issue all night long. Same mental pathways activating that chew over bugs and design issues. Got up this morning with a set of plans and contingency plans all ready to go. The first one, of jacking it up and putting something under the tire was stymied; it seems I am missing a jack. But the second, of digging out all around the tire, and then filling in with gravel and cat litter (a tip from some offroading website I blearily surfed last night), and then riding the gas while releasing the brake, worked great.

All of which is to say, bring em on! But I still prefer my disasters in the form of software bugs.

Syndicated 2013-06-16 16:25:56 from see shy jo

faster dh

With wheezy released, the floodgates are opened on a lot of debhelper changes that have been piling up. Most of these should be pretty minor, but I released one yesterday that will affect all users of dh. Hopefully in a good way.

I made dh smarter about selecting which debhelper commands it runs. It can tell when a package does not use the stuff done by a particular command, and skips running the command entirely.

So the debian/rules binary of a package using dh will now often look like this:

dh binary

Which is pretty close to the optimal hand-crafted debian/rules file (and just about as fast, too). But with the benefit that if you later add, say, cron job files, dh_installcron will automatically start being run too.

Hopefully this will not result in any behavior changes, other than packages building faster and with less noise. If there is a bug it'll probably be something missing in the specification of when a command needs to be run.

Beyond speed, I hope that this will help to lower the bar to adding new commands to debhelper, and to the default dh sequences. Before, every such new command slowed things down and was annoying. Now more special-purpose commands won't get in the way of packages that don't need them.

The way this works is that debhelper commands can include a "PROMISE" directive. An example from dh_installexamples


Mostly this specifies the files in debian/ that are used by the command, and whose presence triggers the command to run. There is also a syntax to specify items that can be present in the package build directory to trigger the command to run.

(Unfortunately, dh_perl can't use this. There's no good way to specify when dh_perl needs to run, short of doing nearly as much work as dh_perl would do when run. Oh well.)

Note that third-party dh_ commands can include these directives too, if that makes sense.
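The selection logic can be modelled roughly as follows. This is a sketch under stated assumptions, not debhelper's code (which is Perl): the Command type and the functions shouldRun and selectCommands are hypothetical, a promise is reduced to a list of config file names, and the syntax for items in the package build directory is left out.

```haskell
import Data.List (isSuffixOf)

-- Hypothetical model of a debhelper command and its promise.
data Command = Command
    { cmdName    :: String
    , cmdPromise :: Maybe [String]
      -- ^ Config file names that trigger the command.
      --   Nothing means no promise was made, so always run it.
    }

-- 'existing' lists the files present in debian/, without the
-- directory part. A config file counts as present if it exists
-- under its bare name ("examples") or with a package prefix
-- ("foo.examples").
shouldRun :: [FilePath] -> Command -> Bool
shouldRun existing cmd = case cmdPromise cmd of
    Nothing    -> True
    Just names -> any present names
  where
    present name =
        any (\f -> f == name || ("." ++ name) `isSuffixOf` f) existing

-- Names of the commands that still need to run.
selectCommands :: [FilePath] -> [Command] -> [String]
selectCommands existing = map cmdName . filter (shouldRun existing)
```

In this model a command like dh_perl would carry Nothing and so always be selected, while a command whose promised files are all absent is skipped.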

I'm happy how this turned out, but I could be happier about the implementation. The PROMISE directives need to be maintained along with the code of the command. If another config file is added, they obviously must be updated. Other changes to a command can invalidate the PROMISE directive, and cause unexpected bugs.

What would be ideal is to not repeat the inputs of the command in these directives, but instead write the command such that its inputs can be automatically extracted. I played around with some code like this:

$behavior = main_behavior("docs tmp(usr/share/doc/)", sub {
       my $package=shift;
       my $docs=shift;
       my $docdir=shift;

       install($docs, $docdir);
});

But refactoring all debhelper commands to be written in this style would be a big job. And I was not happy enough with the flexibility and expressiveness of this to continue with it.

I can however, dream about what this would look like if debhelper were written in Haskell. Then I would have a Debhelper a monad, within which each command executes.

main = runDebhelperIO installDocs

installDocs :: Monad a => Debhelper a
installDocs = do
    docs <- configFile "docs"
    docdir <- tmpDir "usr/share/doc"
    lift $ install docs docdir

To run the command, runDebhelperIO would loop over all the packages and run the action, in the Debhelper IO monad.

But, this also allows making an examineDebhelper that takes an action like installDocs, and runs it in a Debhelper Writer monad. That would accumulate a list of all the inputs used by the action, and return it, without performing any side effecting IO actions.
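To make the thought experiment a little more concrete, here is a small self-contained sketch of the examining side. It makes several assumptions: it uses a type class named Debhelper rather than the Debhelper a type sketched above, the Examine type is a hand-rolled stand-in for a Writer monad, and the "config:" and "tmp:" labels are invented for the example.

```haskell
-- The operations a command may use (hypothetical interface).
class Monad m => Debhelper m where
    configFile :: String -> m [FilePath]
    tmpDir     :: FilePath -> m FilePath
    install    :: [FilePath] -> FilePath -> m ()

-- A command written once, against the interface only.
installDocs :: Debhelper m => m ()
installDocs = do
    docs   <- configFile "docs"
    docdir <- tmpDir "usr/share/doc"
    install docs docdir

-- An interpreter that only accumulates the inputs a command asks
-- for. It behaves like a Writer over a list of strings.
newtype Examine a = Examine { runExamine :: ([String], a) }

instance Functor Examine where
    fmap f (Examine (w, a)) = Examine (w, f a)

instance Applicative Examine where
    pure a = Examine ([], a)
    Examine (w1, f) <*> Examine (w2, a) = Examine (w1 ++ w2, f a)

instance Monad Examine where
    Examine (w1, a) >>= k =
        let Examine (w2, b) = k a
        in Examine (w1 ++ w2, b)

instance Debhelper Examine where
    configFile name = Examine (["config:" ++ name], [])
    tmpDir dir      = Examine (["tmp:" ++ dir], dir)
    install _ _     = Examine ([], ())  -- no side effects here

-- List the inputs of a command without performing any IO.
examineDebhelper :: Examine a -> [String]
examineDebhelper = fst . runExamine
```

Here examineDebhelper installDocs evaluates to ["config:docs", "tmp:usr/share/doc"]. A real run would be a second instance of the same class that does the IO. One limitation of this sketch: since configFile returns an empty list when examining, a command whose control flow depends on the contents of a config file would not have all its inputs discovered.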

It's been 15 years since I last changed the language debhelper was written in. I did that for less gains than this, really. (The issue back then was that shell getopt sucked.) IIRC it was not very hard, and only took a few days. Still, I don't really anticipate reimplementing debhelper in Haskell any time soon.

For one thing, individual Haskell binaries are quite large, statically linking all Haskell libraries they use, and so the installed size of debhelper would go up quite a bit. I hope that forthcoming changes will move things toward dynamically linked haskell libraries, and make it more appealing for projects that involve a lot of small commands.

So, just a thought experiment for now..

Syndicated 2013-05-08 19:18:06 from see shy jo

the #newinwheezy game: STM

Debian wheezy includes a bunch of excellent new Haskell libraries. I'm going to highlight one that should be interesting to non-Haskell developers, who may have struggled with writing non-buggy threaded programs in other languages: libghc-stm-dev

I had given up on most threaded programs before learning about Software Transactional Memory. Writing a correct threaded program, when multiple threads needed to modify the same state, needed careful uses of locking. In my experience, locking is almost never gotten right the first time.

A real life example I encountered is an app that displays a queue of files to be downloaded, and a list of files currently downloading. Starting a new download would go something like this:

startDownload = do
    file <- getQueuedFile
    push file currentDownLoads
    startDownloadThread file

But there's a point in time in which another thread, that refreshes the display, could then see an inconsistent state, where the file is in neither place. To fix this, you'd need to add lock checking around all accesses to the download queue and current downloads list, and lock them both here. (And be sure to always take the locks in the same order!)

But, it's worse than that, because how is getQueuedFile implemented? If the queue is empty, it needs to wait on a file being added. But how can a file be added to the queue if we've locked it in order to perform this larger startDownload operation? What should be really simple code has become really complex juggling of locks.

STM deals with this in a much nicer way:

startDownload = atomically $ do
    file <- getQueuedFile
    push file currentDownLoads
    startDownloadThread file

Now the two operations are performed as one atomic transaction. It's not possible for any other thread to see an inconsistent state. No explicit locking is needed.

And, getQueuedFile can do whatever waiting it needs to, also using STM. This becomes part of the same larger transaction, in a way that cannot deadlock. It might be implemented like this:

getQueuedFile =
    if empty downloadQueue
        then retry
        else pop downloadQueue

When the queue is empty and this calls "retry", STM automatically waits for the queue to change before restarting the transaction. So this blocks until a file becomes available. It does it without any locking, and without you needing to explicitly tell STM what you're waiting on.

I find this beautiful, and am happier with it the more I use it in my code. Functions like getQueuedFile that run entirely in STM are building blocks that can be snapped together without worries to build more and more complex things.

For non-Haskell developers, STM is also available in Clojure, and work is underway to add it to gcc. There is also Hardware Transactional Memory coming, to speed it up. Although in my experience it's quite acceptably fast already.

However, as far as I know, all these other implementations of STM leave developers with a problem nearly as thorny as the original problem with locking. STM inherently works by detecting when a change is made that conflicts with another transaction, throwing away the change, and retrying. This means that code inside a STM transaction may run more than once.

Wait a second.. Doesn't that mean this code has a problem?

startDownload = atomically $ do
    file <- getQueuedFile
    push file currentDownLoads
    startDownloadThread file

Yes, this code is buggy! If the download thread is started, but then STM restarts the transaction, the same file will be downloaded repeatedly.

The C, Clojure, etc, STM implementations all let you write this buggy code.

Haskell, however, does not. The buggy code I showed won't even compile. The way it prevents this involves, well, monads. But essentially, it is able to use type checking to automatically determine that startDownloadThread is not safe to put in the middle of a STM transaction. You're left with no choice but to change things so the thread is only spawned once the transaction succeeds:

startDownload = do
    file <- atomically $ do
        f <- getQueuedFile
        push f currentDownLoads
        return f
    startDownloadThread file
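For readers who want something that compiles, here is a minimal self-contained sketch of the same structure using the stm library. The Downloads record, the use of plain lists inside TVars, and passing the download action in as a parameter are assumptions made to keep the example short.

```haskell
import Control.Concurrent (ThreadId, forkIO)
import Control.Concurrent.STM

-- Hypothetical shared state: two lists, each in its own TVar.
data Downloads = Downloads
    { downloadQueue    :: TVar [FilePath]
    , currentDownloads :: TVar [FilePath]
    }

-- Runs entirely in STM, so it can be composed into larger
-- transactions. When the queue is empty, retry blocks until
-- downloadQueue changes.
getQueuedFile :: Downloads -> STM FilePath
getQueuedFile d = do
    queue <- readTVar (downloadQueue d)
    case queue of
        []       -> retry
        (f:rest) -> do
            writeTVar (downloadQueue d) rest
            return f

-- The transaction moves the file between the two lists atomically.
-- forkIO is an IO action, so it cannot appear inside the
-- atomically block; it runs once, after the transaction commits.
startDownload :: Downloads -> (FilePath -> IO ()) -> IO ThreadId
startDownload d download = do
    file <- atomically $ do
        f <- getQueuedFile d
        modifyTVar' (currentDownloads d) (f :)
        return f
    forkIO (download file)
```

Moving the forkIO call inside the atomically block makes this fail to type check, because the block must have type STM a and forkIO returns an IO action.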

If you appreciate that, you may want to check out some other #newinwheezy stuff like libghc-yesod-dev, a web framework that uses type checking to avoid broken urls, and also makes heavy use of threading, so is a great fit for using with STM. And libghc-quickcheck2-dev, which leverages the type system to automatically test properties about your program.

Syndicated 2013-05-02 16:54:04 from see shy jo
