Older blog entries for joey (starting at number 515)

Goodreads vs LibraryThing vs Free software

Four years ago I started using Goodreads to maintain the list of books I've read (which had lived in a flat text file for a decade+ before that).

Now it's been acquired by Amazon. I doubt it will survive in its current form for more than 2 years. Anyway, while Goodreads has been a quite good way to find what my friends are reading, I've been increasingly annoyed by the quality of its recommendations, and its paucity of other features I need. It really doesn't seem to help me keep up with new and interesting fiction at all, unless my friends happen to read it.

So I looked at LibraryThing. Actually, I seem to have looked at it several times before, since it had accounts named "joey", "joeyh", and "joeyhess" that were all mine. Which is what happens to me on sites that lack Openid or Browserid.

Digging a little deeper this time, I am finding its recommendations much better than Goodreads' -- although it seems to sometimes recommend books I've already read. And it has some nice features, like tracking series, so you can easily tell whether you've read all the books in a series. The analytics overall seem quite impressive. The UI is cluttered and it seems to take 5 clicks to add and rate a single book. It supports half stars.

Overall I get the feeling this was designed for a set of needs that doesn't quite match mine. For example, it seems it doesn't have a single database entry per book; instead, each time I add a book, it seems to pull in data from primary sources (Library of Congress, Amazon cough) and treat this as a separate (but related) entry somehow. Weird. Perhaps this makes sense to, say, librarians. I'm willing to adjust how I think about things if there's an underlying reason that can be grasped.

There's a quite interesting thread on LibraryThing where the founder says:

Don't say we should open-source the code. That would be a nightmare! And I have limited confidence in APIs. LibraryThing has the book geeks, but not so much the computer geeks.

I assume that the nightmare is that there would be dozens of clones of the site, all balkanized, with no data transfer, no federation between them.

Except, that's the current situation, as every Goodreads user who is now trying to use LibraryThing is discovering.

Before I ever started using Goodreads, I made sure it met my minimum criteria for putting my data into a proprietary silo: That I could get the data back out. I can, and have. LibraryThing can import it. But the import process loses data! And it's majorly clunky. If I want to continue using Goodreads due to its better UI, and get the data into LibraryThing, for its better analytics, I have to do periodic dumps and loads of CSV files with manual fixups.

This is why we have standards. This is why we're building federated social networks like status.net and the upcoming pump.io that can pass structured data between nodes transparently. It doesn't have to be a nightmare. It doesn't have to rely on proprietary APIs. We have the computer geeks.

Thing is, sites like GoodReads and LibraryThing need domain-specific knowledge, and communities to curate data, and stuff like that. Things that work well in a smallish company. (LibraryThing even has a business model that makes sense, yearly payments to store more books in it.)

With free software, it's much more appealing to sink the time we have into the most general-purpose solution we can. Why build a LibraryThing when we could build something that tracks not only books but movies and music? Why build that when we could build a generic federated network for structured social data? And that's great, as infrastructure, but if that infrastructure is only used to build a succession of proprietary data silos, what was the point?

So, could some computer & book geeks please build a free software alternative to these things, focused on books, that federates using any of the fine APIs we have available? Bear in mind that there is already a nice start at a comprehensive collection of book data in the Open Library. I'd happily contribute to a crowd funded project doing this.

Syndicated 2013-03-30 16:20:18 from see shy jo

difficulties in backing up live git repositories

But you can’t just tar.gz up the bare repositories on the server and hope for the best. Maybe a given repository will be in a valid state; maybe it won’t.

-- Jeff Mitchell in a followup to the recent KDE near git disaster

This was a surprising statement to me. I seem to remember that one of the (many) selling points for git talked about back in the day was that it avoided the problem that making a simple cp (or backup) of a repository could lead to an inconsistent result. A problem that subversion repositories had, and required annoying commands to work around. (svnadmin $something -- iirc the backend FSFS fixed or avoided most of this issue.)

This prompted me to check how I handle it in ikiwiki-hosting. I must have anticipated a problem at some point, since ikisite backup takes care to lock the git repository in a way that prevents eg, incoming pushes while a backup is running. Probably, like the KDE developers, I was simply exercising reasonable caution.

The following analysis has probably been written up before (train; limited network availability; can't check), but here are some scenarios to consider:

  • A non-bare repository has two parts that can clearly get out of sync during a backup: The work tree and the .git directory.

    • The .git directory will likely be backed up first, since getdirent will typically return it first (it gets created first). If a change is made to the work tree during that backup, and committed while the work tree is being backed up, the backup won't include that commit -- which is no particular problem and would not be surprising upon restore. Make the commit again and get on with life.

    • However, if (part of) the work tree is backed up before .git, then any changes that are committed to git during the backup would not be reflected in the restored work tree, and git diff would show a reversion of those changes. After restore, care would need to be taken to reset the work tree (without losing any legitimate uncommitted changes).

  • A non-bare repository can also become broken in other ways if just the wrong state is snapshotted. For example, if a commit is in progress during a backup, .git/index.lock may exist, and prevent future commits from happening until it's deleted. These problems can also occur if the machine dies at just the right time during a commit. Git tells you how to recover. (git could go further to avoid these problems than it does; for example it could check whether .git/index.lock is actually locked using fcntl, something I do in git-annex to make the .git/annex/index.lock file crash safe. There's a sketch of that check after this list.)

  • A bare repository could be receiving a push (or a non-bare repository a pull) while the backup occurs. These are fairly similar cases, with the main difference being that a non-bare repository has the reflog, which can be used to recover from some inconsistent states that could be backed up. Let's concentrate on pushes to bare repositories.

    • A pack could be in the process of being uploaded during a backup. The KDE developers apparently worried that this could result in a corrupt or inconsistent repository, but TTBOMK it cannot; git transfers the pack to a temp file and atomically renames it into place once the transfer is complete. A backup may include an excess temp file, but this can also happen if the system goes down while a push is in progress. Git cleans these things up.

    • A push first transfers the .git/objects, and then updates .git/refs. A backup might first back up the refs, and then the objects. In this case, it would lose the record that refs were pushed. After being restored, any push from another repository would update the refs, even using the objects that did get backed up. So git recovers from this, and it's not really a concern.

    • Perhaps a backup chooses to first back up the objects, and then the refs. In this case, it could back up a newly changed ref, without having backed up the referenced objects (because they arrived after the backup had finished with the objects). When this happens, your bare repository is inconsistent; you have to somehow hunt down the correct ref for the objects you do have.

      This is a bad failure mode. git could improve this, perhaps, by maintaining a reflog for bare repositories, which, in my limited testing, it does not do.

  • A "backup" of a git repository can consist of other clones of it. Which do not include .git/hooks/ scripts, .git/config settings, and potentially other valuable information, that strangely, we do not check into revision control despite having this nice revision control system available. This is the most likely failure mode with "git backups". :P

I think that it's important git support naive backups of git repositories as well as possible, because that's probably how most backups of git repositories are made. We don't all have time to carefully tune our backup systems to do something special around our git repositories to ensure we get them in a consistent state like the KDE project did, and as their experience shows, even if we do it, we can easily introduce other, unanticipated problems.

Can anyone else think of any other failure modes like these, or find holes in my slightly rushed analysis?


PS: git-annex is itself entirely crash-safe, to the best of my abilities, and also safe for naive backups. But inherits any problems with naive backups of git repositories.

Syndicated 2013-03-25 01:39:49 from see shy jo

Kickstarter rewards wrap-up

I finished delivering all my Kickstarter rewards at the end of the year. This is an overview of how that went, both financially and in general.

my Kickstarter pie (not including taxes)

While the Kickstarter was under way, several friends warned me that I might end up spending a lot of the money for rewards, or even shipping, and come out a loser. It's happened to some people, but I avoided it. Most of the pie went to its intended purpose.

USB key arrives in Amsterdam

I kept shipping cost low by shipping everything by US postal service, including Air Mail for international shipping. This was particularly important for the stickers (which cost $1.05 to ship internationally). But I also shipped USB keys in regular mail envelopes, protected by bubble wrap, which worked very well and avoided the bother of shipping packages. The USPS will be annoyed at you for a rigid letter and add a non-machinable surcharge, but it's still a nice savings.

expenses

I spent more on rewards than on transaction fees, but the fees are still pretty large. Being dinged a second time by Amazon is the worst part. I have not been able to work out exactly what formula Kickstarter uses to determine its fee per pledge. It does not seem to be a simple percentage of the pledge. For example, they seem to have charged $0.25 on a $10 pledge (2.5%), but $25 on a $500 pledge (5%). I wanted to solve this, but I'd have to match up all the pledges and fees manually to do it.

gross income by reward type

This chart is slightly inaccurate, because it puts any money pledged, beyond the amount needed to get a reward, into the "intangibles" category, despite the reward being probably responsible for that money being pledged.

(The intangibles category also includes people who did not ask for a reward, and several categories of rewards that didn't involve shipping matter around.)

But, the surprise for me is how large a piece the T-shirts are responsible for. It was my least favorite reward, and a low volume one, but I made out pretty well on it. However, I'd still try to avoid putting T-shirts on Kickstarter again. It's hard to do a good design (I didn't, really); they're expensive, and were by far the most annoying thing to ship. Also, I was not happy with the countries Cafe Press sourced their shirts from; I've been to Honduras and talked with people who have relatives in las machinas.

gross and net income by reward type (excluding shipping)

In contrast, the stickers had an amazing margin; they're so inexpensive to print that I printed up two kinds and included multiple with every other reward I mailed. I still have hundreds left over, too.. All the online print shops I tried have very annoying interfaces to upload artwork though. I had to do quite a bit of math to render TIFF files with appropriate DPI and margins.

The USB keys were my favorite reward. I got them from USB Memory Direct, who gave me quite a nice deal. I was very happy that I was able to send them an SVG file of my artwork, so I didn't need to worry about lacking resolution for the laser engraving. And it came out looking great.

The best part was when their sales guy Mike actually did a minor alteration of the artwork, to better fit on the key, when I, being overloaded with Kickstarter stuff, asked him to. A bit above and beyond.

There was an issue with their Chinese manufacturer's quality control of the 16 GB drives, but they were willing to send me replacements for all the ones I found problems with.


All told, I spent probably 3 full days stuffing and shipping envelopes, and around a week in total working on Kickstarter reward fulfillment. As work-related overhead goes, that's not bad. Maybe someone considering a Kickstarter will find this information useful somehow. Oh well, back to work. :)

Syndicated 2013-02-21 01:00:30 from see shy jo

Sydney nostalgia

Sydney opera house viewed from a ferry

Last Saturday, when the bus from Canberra pulled into Sydney's central station, I found myself feeling curiously nostalgic for this city, and particularly this bustling and somewhat seedy[1] neighborhood of Haymarket and Redfern.

glowing angels in Kimber lane

I only spent 5 days in Sydney, but living in a shared house there, walking up to Central every day, and returning to the outskirts of the Haymarket every evening to slurp noodles or other Asian food, I got into a routine. And got a sense of the place that goes perhaps a bit beyond the tourist sights.

Manly beach panorama

Perhaps if I'd had more time I would have found a decent coffee shop that had both free wifi and abundant seating. They seem scarce in Sydney. I instead often got on the ferry to Manly when I wanted some sit down and code time.

Cliffside Protect Our Water Dragons sign
One time when I was exploring the headlands above Manly beach, I noticed this sign.
Then I ran into this guy. Click him for an amusing video.
lizard

Anyway, Sydney is on my very short list of cities I'd actually enjoy spending some more time in some day, along with San Francisco, Vancouver, Oslo, and London.


[1] Depending on what's inside all the "VIP lounges" and "Thai massage parlours" on every corner that I did not explore, perhaps thoroughly seedy?

Syndicated 2013-02-07 16:35:49 from see shy jo

LCA2013 wrapup

After 27 hours of travel, I'm finally back from my first Linux.conf.au.

This was a great conference! Some of my favorite highlights:

  • Out for dinner the first night, my whole table started spontaneously talking about Haskell, including details of IO management frameworks like conduit and pipes, and a new one that's waiting in the wings. That just doesn't happen in the real world. A lot of us continued to do Haskell stuff in the hallway track, although for some reason there were no Haskell talks on the official schedule. Maybe it's time for a Haskell miniconf at LCA?
  • Meeting Josh, Jamie, Sarah, Brendan, Craig, and others I've worked with online but never encountered IRL. Also reconnecting with old friends (some I'd not seen in 13 years) and finding new ones.
  • The speakers' dinner in a revolving restaurant overlooking Canberra. Leave it to a restaurant full of geeks to invent an asynchronous communications medium in such a setting. (Notes attached to windows to be read and answered by tables as they rotated by.)
  • Meeting quite a lot of git-annex users, Kickstarter backers, and people interested in using git-annex. Thanks to everyone who came up to me for a chat.
  • The evening pickup board game sessions. Especially my wiping out three other Tsuro players at once by forcing them all to end on the same square. ;) These were where I felt the most at home in Australia.
  • Robert Llewellyn dropping in and mingling with fans of Red Dwarf and Scrapheap Challenge. One of the very few actors I could possibly fanboy on, and LCA not only somehow got him, but constructed an atmosphere that allowed for this photo of us.


My git-annex talk went ok, despite technical problems with the video output, which the video team did a great job of working around. The first demo went well, and the Q&A was excellent.

The demo of the git-annex assistant webapp was marred by, apparently, a bad build of the webapp -- which I ironically used rather than my usual development build because I assumed it'd be less likely to fail. At least I may have a way to reproduce that hang somewhat reliably now, so I can get on with debugging it. I will be redoing a demo of the webapp as a screencast.


Here are some of the best talks I attended. (I have quite a lot more queued up to watch still, and had to cut several due to space.) Click titles for videos, or browse all the videos.

  • Git for Ages 4 and Up
    Schwern has found a new, excellent way to explain git. I felt rather bad for using up a seat, especially once people were kicked out when the room was filled over capacity. But I enjoyed every minute of it. (Also has the best speaker intro ever. "Schwern once typed git pull --hard and pulled GitHub's server room across the street.")

    BTW, I must confess: I left the red apple on teacher's desk.
  • Radia Perlman's keynote
    Network protocol design and poetry from one of the quiet heroes of our field. I knew Spanning Tree Protocol was used in Ethernet, but it just works, so like many I never paid much attention to it. This talk felt like the best lectures, where you're learning from a master on multiple levels at once. Well done work often becomes an unremarked part of the landscape, which I sometimes find unrewarding, so it was great to have Radia give some perspective on what that's like over the course of decades.

  • Lightning Talks
    A 90 second time limit really helps. Too many conferences have 5 minute talks, which is less exciting. If you still find them boring, skip forward to 13:50, where pjf does two talks in 90 seconds! (If a 20 second talk on depression is too .. manic, there's an encore at the end.)

  • The IPocalypse 20 months later
    A reality check, with real data. Very important stuff here. We need to work to avoid this worst case scenario, and we also need to design around it.

  • REPENT!!! FOR THE END OF THE UNIX EPOCH IS NIGH!!!
    Wildest talk beginning I've seen since RMS put the hard drive halo on his head. And it makes an important point: any programs that deal with dates 25 years in the future already need to be fixed today to deal with the epoch rollover. This got me digging around in Haskell date libraries, to make sure they're ok. (There's a quick check of the rollover date after this list.)

  • Building Persona: Federated and Privacy Sensitive Identity for the Web
    This talk and some previous conversation with Francois have convinced me that Persona (AKA Browserid) has a design that can succeed. I will be adding Persona login support to ikiwiki.

  • Beyond Alt Text: What Every Project Should Know About Accessibility
    I missed the first half due to giving my talk, but the second half was full of rather a lot of excellent information, some of which I'd only guessed at before.

  • Git: Not Just for Source Code Anymore
    Good overview of the new ways to use git. Also kept giving examples from my body of work, which is some nice ego stroking, thanks Josh. ;-)
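
About that rollover date: it's easy to check with the time package. A throwaway calculation, nothing from the talk itself; it just shows when a signed 32-bit time_t runs out.

import Data.Int (Int32)
import Data.Time.Clock.POSIX (posixSecondsToUTCTime)

-- The last moment representable in a signed 32-bit time_t.
main :: IO ()
main = print (posixSecondsToUTCTime (fromIntegral (maxBound :: Int32)))
-- prints 2038-01-19 03:14:07 UTC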

Syndicated 2013-02-05 03:44:45 from see shy jo

in Sydney

After arriving yesterday, and doing the obligatory jetlagged wander around Circular Quay to the Opera House, I crashed for 15 hours sleep and seem reasonably well over the jetlag now. I'm staying near Central Station, which is great because it's very convenient to transport and there's lots of cheap ethnic food.

view down my street

Today I took the ferry over to Manly Beach. I had to get in some swimming of course, although today was cloudy and only in the 80's (F). That was ok, but the really nice bit was walking over to Shelly Beach and up along the sandstone cliffs near North Head.

Highlight was finding a small grotto high on the cliff, with some beautiful sandstone filigree overhead, that seemed carved by water (but probably really by wind), and contained a shady bench overlooking the Pacific. Great place to enjoy a lunch of crusty bread and salami if you're ever in the area.

Topped off a great day with a random trip to Neutral Bay (the D&Der in me liked the name, and why not, ferries are free) and a wander through the Royal Botanic Gardens.

I'm hoping to spend either Thursday or Friday at a co-working or hackerspace in Sydney, but I don't know where yet. Any locals have ideas?

Syndicated 2013-01-23 06:56:49 from see shy jo

overcast

My least favorite forecast, and we've been having a lot of it lately. Since winter began (Dec 21st), my PV array has produced only 102 amp-hours. There have been only 3 days that could be considered sunny at all (even for a few hours). To put this in some perspective for you, Wolfram Alpha tells me that 102 amp-hours is half the capacity of a typical car battery. So I've really had to scrimp & save. I disconnected my router to save power, and I plumbed the depths of the battery bank (further than it's really safe to go). And I was completely out of power for several days.

So, I've bought a generator. I had to order it online, because it seems every store in the entire eastern US is completely out of stock, due to hurricane Sandy. This house has a generator shed (complete with a massive old generator that doesn't work), but I put the new, small generator in the furthest outbuilding, from where it can only barely be heard in the house. Later I'll be able to run a cable from its 12v terminal to charge the battery bank (if the sun doesn't get there first); for now it only powers my laptop. Nasty noisy smelly dirty thing, it pains me to run it.

Back to the "overcast" forecast. I've heard meterologists say that one unexpected consequence of climate change seems to be that weather patterns are persisting longer, giving us this summer's weeks of neverending heat, and large, slow-moving storms. I suspect this could also explain the blankets of clouds that have been settling in for weeks at a time, on and off since mid-November. But I only have 2.5 years of not very good data. Before, I paid about as much attention to whether it was sunny as anyone else, which is to say, not really very much. Anyone know of a source of historical cloud cover data?

Syndicated 2013-01-03 17:25:08 from see shy jo

no longer a perl programmer

This year, I've gradually realized that I no longer identify as a perl programmer. For a decade and a half, perl was the language I reached for to solve any problem that didn't have a good reason to be solved in some other language. Now I only reach for it in the occasional one-liner -- and even then I'm more likely to find myself in ghci and end up with a small haskell program.

I still maintain plenty of perl code, but even when I do, I'm not thinking in perl, but translating from some internal lambda calculus. There are quite a few new-ish perl features that I have not bothered to learn, and I can feel some of the trivia that perl encourages you to keep in mind gradually slipping away. Although the evil gotchas remain fresh in my mind!

More importantly, my brain's own evaluation of code has changed; it doesn't evaluate it imperatively (unless forced to by an appropriate monad), but sees the gestalt, sees the data flow, and operates lazily and sometimes, I think, in parallel. The closest I can come to explaining the feeling is how you might feel when thinking about a shell pipeline, rather than a for loop.

Revisiting some of my older haskell code, I could see the perl thinking that led to it. And rewriting it into pure, type-driven code that took advantage of laziness for automatic memoization, I saw, conclusively, that the way I think about code has changed. (See the difference for yourself: before, after.)
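
If "laziness for automatic memoization" sounds mysterious, the stock illustration is below. This is not the code from those links, just the smallest example I know of a lazily built structure that memoizes itself.

-- The list is an ordinary lazy data structure; each entry is computed at
-- most once, the first time something demands it.
fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

main :: IO ()
main = print (fibs !! 100)  -- instant, despite the naive-looking recursion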

I hear of many people who enjoy learning lots of programming languages, one after the other. A new one every month, or year. I suspect this is a fairly shallow learning. I like to dive deep. It took me probably 6 years to fully explore every depth of perl. And I never saw a reason to do the same with python or ruby or their ilk; they're too similar to perl for it to seem worth the bother. Though they have less arcana in their learning curves and are probably better, there's not enough value to redo that process. I'm glad haskell came along as a language that is significantly different enough that it was worth learning. The deep dive for haskell goes deep indeed. I'm already 5 years in, and have more to learn now than I ever did before.

I'm glad I didn't get stuck on perl. But I may be stuck on haskell now instead, for the foreseeable future. I'd sort of like to get really fluent in javascript, but only as a means to an end -- and haskell to javascript compilers are getting sufficiently good that I may avoid it. Other than that, I sense Agda and Coq beckoning with their promises of proof. Perhaps one of these years.

Of course if Bradley Kuhn is right and perl is the new cobol, I know what I'll be doing come the unix rollover in 2038. ;)

Syndicated 2012-12-31 17:08:24 from see shy jo

I haven't used my yurt much at all lately. It was an excellent getaway when I needed to get away; these days I don't. Or, perhaps, some would say, I've gotten away...

blue circle of sky seen
through the center of the yurt

So yesterday, since I happened to be over there, and after we'd played board games to our hearts' content, we decided to take it down. This had been on the todo list since last fall, so it was not entirely spur of the moment. Still, the days are short this near to solstice and this one only had a few hours of light left.

skinning the yurt

It'd been up for well over 4 years, in a wet environment. The roof is pretty well shot and will need to be replaced, but the canvas walls are still in good shape. There was a surprise; it seems a mouse or something started nibbling on all the wood of the walls recently, and even the center ring is pitted and worn by rodent teeth. It all still seems structurally sound, though. And looking back at my setup pictures, I'm not bothered by the yurt having aged, it just makes it seem more real.

We left the bones of the yurt standing on its hill, seeming very appropriate under the bare winter trees. Will return soon and finish taking it down. And perhaps in some years I'll find a need for it again.

(Pictures by my sister, who will be annoyed at me linking to this not-yet-formally published post about the yurt from her POV: Decline and fall of the yurt)

Syndicated 2012-12-16 20:00:51 from see shy jo

hledger

Apologies in advance for writing a long blog post about the dull and specialised subject of double-entry accounting from the Unix tools perspective, that ends up talking about Monads to boot. I can't believe I'm about to write such a thing, but I'm finding it an oddly interesting subject.

double-entry accounting

I've known of, though probably not really understood, double-entry accounting for a while, thanks to GnuCash. I think GnuCash did something rather special in making such a subject approachable to the layman, and I've been happy to recommend GnuCash to friends. I was stoked to find a chapter in my sister Anna's new book that happily and plausibly suggests readers use GnuCash.

But for my personal use, GnuCash is clunky. I mean, I'm a programmer, but I can't bring any of my abilities to bear on it, without a deep dive into the code. The file format is opaque (and a real pain to keep checked into git with all the log files); the user interface is often confusing, but there's no benefit to its learning curve, it never seems to get better than manually entering data into its ledger, or perhaps importing data from a bank. I've never found the reports very useful.

I've got perhaps a year of data in GnuCash, but it's fragmented and incomplete and not something I've been able to motivate myself to keep up with. So I have less financial data than I'd like. I'm hoping ledger will change that.

ledger

I've known about ledger for a while, at least since this Linux Weekly News article. It's a quintessential Unix tool, that simply processes text files. The genius of it is the simplicity of the file format, which gets the essence and full power of double-entry bookkeeping down to something that approaches a meme. Once you get the file format stuck in your head, you're done for.

  2004/05/27 Book Store
      Expenses:Books                 $20.00
      Liabilities:Visa

starting to use hledger

Now as a Haskell guy, I was immediately drawn to the Haskell clone, hledger. It's nice that there are two (mostly) compatible implementations of ledger too. So from here on, I'll be talking about hledger.

I sat down and built a hledger setup for myself the other day. I started by converting the GnuCash books I have been keeping up-to-date, for a small side business (a rental property). It quickly turned into something like programming, in the best way, as I used features like:

  • Include directives, so I can keep my business data in its own file, while pulling it into my main one.
  • Simple refactorings, like putting "Y 2012" at the top, so I don't have to write the year in each transaction.
  • Account aliases, so I can just type "rent", rather than "income:rental" and "repairs:contractor" rather than "expenses:home repair:contractor"
  • All the power of my favorite editor.

a modern unix program

While I've been diving into hledger, I've been drawing all kinds of parallels between it and other modern Unix-friendly programs I use lately. I think we've gone over a bit of a watershed recently. Unix tools used to be either very simple and crude (though often quite useful), or really baroque and complex with far too many options (often due to 10-30 years of evolution). Or they were a graphical user interface, like GnuCash, and completely divorced from Unix traditions.

The new unix programs have some commonalities...

  • They're a single command, with subcommands. This keeps the complexity of doing any one thing down, and provides many opportunities for traditional unix tools philosophy, without locking the entire program into being a single-purpose unix tool.

    hledger's subcommands range from querying and reports, to pipeable print, to an interactive interface.

  • They have a simple but powerful idea at their core, that can be expressed with a phrase like "double entry accounting is a simple file format" (ledger), or "files, trees, commits" (git).

    Building a tool on a central idea is something I strive to do myself. So the way ledger manages it is particularly interesting to me.

  • They are not afraid to target users who have a rather special kind of literacy (probably the literacy you needed to get this far in this post). And reward them with a lot of power.

    Ledger avoids a lot of the often confusing terminology around accounting, and assumes a mind primed with the Unix tools philosophy.

  • If there's a GUI, it's probably web based. There's little trust in traditional toolkits having staying power, and the web is where the momentum is. The GUI is not the main focus, but does offer special features of its own.

    hledger's web UI completely parallels what I've been doing with the git-annex webapp, right down to being implemented using Yesod -- which really needs to be improved to use some methods I've developed to make it easier to build webapps that integrate with the desktop and are more secure, if there are going to be a lot more programs like this using it.

importing data

After manually converting my GnuCash data, I imported all my PayPal history into hledger. And happily it calculates the same balance PayPal does. It also tells me I've paid PayPal $180 in transaction fees over the years, which is something PayPal certainly doesn't tell you on their website. (However, my current import from PayPal's CSV files is hackish, and only handles USD currency, so I miss some currency conversion fees.)

I also imported my Amazon Payments history, which includes all the Kickstarter transactions. I almost got this to balance, but hledger and Amazon disagree about how many hundredths of a cent remain in my account. Still, pretty good, and I know how much I paid Amazon in fees for my kickstarter, and how much was kicked back to Kickstarter as well. (Look for a cost breakdown in some future blog post.)

At this point, hledger stats says I have 3700 transactions on file, which is not bad for what was just a few hours work yesterday.

One problem is hledger's gotten a little slow with this many transactions. It takes 5 seconds to get a balance. The ledger program, written in C, is reportedly much faster. hledger recently had an O(n^2) slowdown fixed, which makes me think it's probably only starting to be optimised. With Haskell code, you can get lucky and get near C (language, not speed of light) performance without doing much, or less lucky and get not much better than python performance until you dive into optimising. So there's hope.

harnessing haskell

If there's one place hledger misses out on being a state of the art modern Unix program, it's in the rules files that are used to drive CSV imports. I found these really hard to use; the manual notes that "rules file parse errors are not the greatest"; and it's just really inflexible. I think the state of the art would be to use a Domain Specific Language here.

For both my Amazon and PayPal imports I had CSV data something like:

  date, description, amount, fees, gross

I want to take the fees into account, and make a split transaction, like this:

  date description
    assets:accounts:PayPal             $9.90
    expenses:transaction fees:PayPal   $0.10
    income:misc:PayPal                 -$10.00

This does not seem possible with the rules file. I also wanted to combine multiple CSV lines (to do with currency conversions) into a single transaction, and couldn't.

The problem is that the rules file is an ad-hoc format, not a fully programmable one. If instead, hledger's rules files were compiled into standalone haskell programs that performed the import, arbitrarily complicated conversions like the above could be done.

So, I'm thinking about something like this for a DSL.. I doubt I'll get much further than this, since I have a hacked together multi-pass importer that meets my basic needs. Still, this would be nice, and being able to think about adding this kind of thing to a program cleanly is one of the reasons I reach for the Haskell version when possible these days.

First, here's part of one of my two paypal import rules files (the other one extracts the transaction fees):

  amount-field 7
  date-field 0
  description-field %(3) - %(4) - %(11)
  base-account assets:accounts:PayPal

  Bank Account
  assets:accounts:checking

  .*
  expenses:misc:PayPal

That fills out the basic fields, and makes things with "Bank Account" in their description be categorised as bank transfers.

Here's how it'd look as Haskell, carefully written to avoid the $ operator that's more than a little confusing in this context. :)

main :: IO ()
main = convert paypalConverter

paypalConverter :: [CSVLine] -> [Maybe Transaction]
paypalConverter = map convert
  where
    convert = do
        setAmount =<< field 7
        setDate =<< field 0
        setDescription =<< combine " - " [field 3, field 4, field 11]
        defaultAccounts
            "assets:accounts:PayPal" ==> "expenses:misc:PayPal"
        inDescription "Bank Account" ?
            "assets:accounts:PayPal" ==> "assets:accounts:checking"

That seems like something non-haskell people could get their heads around, especially if they didn't need to write the boilerplate function definitions and types at the top. But I may be biased. :)

Internally, this seems to be using a combination Reader and Writer monad that can get at fields from a CSV line and build up a Transaction. But I really just made up a simple DSL as I went along and threw in enough syntax to make it seem practical to implement. :)
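
To pin the idea down a bit, here's one way such a stack could be typed. This is purely a sketch with made-up names (CSVLine, Piece, Convert, runConvert), not hledger's internals or my actual importer, and it only covers the field-reading and field-setting parts of the DSL above.

import Control.Monad.Reader
import Control.Monad.Writer
import Data.List (intercalate)

-- A converter can read fields of the current CSV line (Reader) and
-- accumulate pieces of the transaction being built (Writer).
type CSVLine = [String]

data Piece = Amount String | Date String | Description String
    deriving Show

type Convert = ReaderT CSVLine (Writer [Piece])

field :: Int -> Convert String
field n = asks (!! n)

setAmount, setDate, setDescription :: String -> Convert ()
setAmount a      = tell [Amount a]
setDate d        = tell [Date d]
setDescription d = tell [Description d]

combine :: String -> [Convert String] -> Convert String
combine sep parts = fmap (intercalate sep) (sequence parts)

runConvert :: CSVLine -> Convert () -> [Piece]
runConvert line c = execWriter (runReaderT c line)

The account-defaulting combinators (defaultAccounts, ==>, inDescription) would just write further pieces in the same way, and a driver would run one Convert per CSV line to assemble each Transaction.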

Of course, a Haskell programmer can skip the monads entirely, or use others they prefer. And could do arbitrarily complicated stuff during imports, including building split transactions, and combining together multiple related CSV lines into a single transaction.

Syndicated 2012-12-03 18:04:03 from see shy jo

