Older blog entries for joey (starting at number 506)

hledger

Apologies in advance for writing a long blog post about the dull and specialised subject of double-entry accounting from the Unix tools perspective, that ends up talking about Monads to boot. I can't believe I'm about to write such a thing, but I'm finding it an oddly interesting subject.

double-entry accounting

I've known of, though probably not really understood, double-entry accounting for a while, thanks to GnuCash. I think GnuCash did something rather special in making such a subject approachable to the layman, and I've been happy to recommend GnuCash to friends. I was stoked to find a chapter in my sister Anna's new book that happily and plausibly suggests readers use GnuCash.

But for my personal use, GnuCash is clunky. I mean, I'm a programmer, but I can't bring any of my abilities to bear on it without a deep dive into the code. The file format is opaque (and a real pain to keep checked into git with all the log files); the user interface is often confusing, but there's no benefit to its learning curve; it never seems to get better than manually entering data into its ledger, or perhaps importing data from a bank. I've never found the reports very useful.

I've got perhaps a year of data in GnuCash, but it's fragmented and incomplete and not something I've been able to motivate myself to keep up with. So I have less financial data than I'd like. I'm hoping ledger will change that.

ledger

I've known about ledger for a while, at least since this Linux Weekly News article. It's a quintessential Unix tool that simply processes text files. The genius of it is the simplicity of the file format, which gets the essence and full power of double-entry bookkeeping down to something that approaches a meme. Once you get the file format stuck in your head, you're done for.

  2004/05/27 Book Store
      Expenses:Books                 $20.00
      Liabilities:Visa

starting to use hledger

Now, as a Haskell guy, I was immediately drawn to the Haskell clone, hledger. It's nice that there are two (mostly) compatible implementations of ledger, too. So from here on, I'll be talking about hledger.

I sat down and built a hledger setup for myself the other day. I started by converting the GnuCash books I have been keeping up to date for a small side business (a rental property). It quickly turned into something like programming, in the best way, as I used features like:

  • Include directives, so I can keep my business data in its own file, while pulling it into my main one.
  • Simple refactorings, like putting "Y 2012" at the top, so I don't have to write the year in each transaction.
  • Account aliases, so I can just type "rent" rather than "income:rental", and "repairs:contractor" rather than "expenses:home repair:contractor" (see the journal sketch after this list).
  • All the power of my favorite editor.
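
To give a feel for how those directives look together, here's a small made-up journal snippet. The file name, date, accounts, and amount are all invented; the directive syntax is what hledger documents:

  ; business data kept in its own file
  include rental.journal

  ; default year, so transactions below can omit it
  Y 2012

  ; lets me type "rent" and have it recorded as "income:rental"
  alias rent = income:rental

  12/27 Tenant
      assets:accounts:checking        $500.00
      rent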

a modern unix program

While I've been diving into hledger, I've been drawing all kinds of parallels between it and other modern Unix-friendly programs I use lately. I think we've gone over a bit of a watershed recently. Unix tools used to be either very simple and crude (though often quite useful), or really baroque and complex with far too many options (often due to 10-30 years of evolution). Or they were a graphical user interface, like GnuCash, and completely divorced from Unix traditions.

The new unix programs have some commonalities...

  • They're a single command, with subcommands. This keeps the complexity of doing any one thing down, and provides many opportunities for traditional unix tools philosophy, without locking the entire program into being a single-purpose unix tool.

    hledger's subcommands range from queries and reports, to a pipeable print command, to an interactive interface.

  • They have a simple but powerful idea at their core, that can be expressed with a phrase like "double entry accounting is a simple file format" (ledger), or "files, trees, commits" (git).

    Building a tool on a central idea is something I strive to do myself. So the way ledger manages it is particularly interesting to me.

  • They are not afraid to target users who have a rather special kind of literacy (probably the literacy you need to have read this far into this post). And reward them with a lot of power.

    Ledger avoids a lot of the often confusing terminology around accounting, and assumes a mind primed with the Unix tools philosophy.

  • If there's a GUI, it's probably web based. There's little trust in traditional toolkits having staying power, and the web is where the momentum is. The GUI is not the main focus, but does offer special features of its own.

    hledger's web UI completely parallels what I've been doing with the git-annex webapp, right down to being implemented using Yesod -- which, if there are going to be a lot more programs like this, really needs to be improved to use some methods I've developed for making webapps that integrate with the desktop and are more secure.

importing data

After manually converting my GnuCash data, I imported all my PayPal history into hledger. And happily, it calculates the same balance PayPal does. It also tells me I've paid PayPal $180 in transaction fees over the years, which is something PayPal certainly doesn't tell you on their website. (However, my current import from PayPal's CSV files is hackish, and only handles USD, so I miss some currency conversion fees.)

I also imported my Amazon Payments history, which includes all the Kickstarter transactions. I almost got this to balance, but hledger and Amazon disagree about how many hundredths of a cent remain in my account. Still, pretty good, and I know how much I paid Amazon in fees for my Kickstarter, and how much was kicked back to Kickstarter as well. (Look for a cost breakdown in some future blog post.)

At this point, hledger stats says I have 3700 transactions on file, which is not bad for what was just a few hours work yesterday.

One problem is that hledger's gotten a little slow with this many transactions. It takes 5 seconds to get a balance. The ledger program, written in C, is reportedly much faster. hledger recently had an O(n^2) slowdown fixed, which makes me think it's probably only starting to be optimised. With Haskell code, you can get lucky and get near C (language, not speed of light) performance without doing much, or less lucky and get not much better than Python performance until you dive into optimising. So there's hope.

harnessing haskell

If there's one place hledger misses out on being a state of the art modern Unix program, it's in the rules files that are used to drive CSV imports. I found these really hard to use; the manual notes that "rules file parse errors are not the greatest"; and it's just really inflexible. I think the state of the art would be to use a Domain Specific Language here.

For both my Amazon and PayPal imports I had CSV data something like:

  date, description, amount, fees, gross

I want to take the fees into account, and make a split transaction, like this:

  date description
    assets:accounts:PayPal             $9.90
    expenses:transaction fees:PayPal   $0.10
    income:misc:PayPal                 -$10.00

This does not seem possible with the rules file. I also wanted to combine multiple CSV lines, to do with currency conversions, into a single transaction, and couldn't.

The problem is that the rules file is an ad-hoc format, not a fully programmable one. If instead, hledger's rules files were compiled into standalone haskell programs that performed the import, arbitrarily complicated conversions like the above could be done.

So, I'm thinking about something like this for a DSL.. I doubt I'll get much further than this, since I have a hacked together multi-pass importer that meets my basic needs. Still, this would be nice, and being able to think about adding this kind of thing to a program cleanly is one of the reasons I reach for the Haskell version when possible these days.

First, here's part of one of my two paypal import rules files (the other one extracts the transaction fees):

  amount-field 7
  date-field 0
  description-field %(3) - %(4) - %(11)
  base-account assets:accounts:PayPal

  Bank Account
  assets:accounts:checking

  .*
  expenses:misc:PayPal
That fills out the basic fields, and causes anything with "Bank Account" in its description to be categorised as a bank transfer.

Here's how it'd look as Haskell, carefully written to avoid the $ operator that's more than a little confusing in this context. :)

main :: IO ()
main = convert paypalConverter

paypalConverter :: [CSVLine] -> [Maybe Transaction]
paypalConverter = map convert
  where
    convert = do
        setAmount =<< field 7
        setDate =<< field 0
        setDescription =<< combine " - " [field 3, field 4, field 11]
        defaultAccounts
            "assets:accounts:PayPal" ==> "expenses:misc:PayPal"
        inDescription "Bank Account" ?
            "assets:accounts:PayPal" ==> "assets:accounts:checking"

That seems like something non-haskell people could get their heads around, especially if they didn't need to write the boilerplate function definitions and types at the top. But I may be biased. :)

Internally, this seems to be using a combination of Reader and Writer monads that can get at fields from a CSV line and build up a Transaction. But I really just made up a simple DSL as I went along and threw in enough syntax to make it seem practical to implement. :)
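
To make that concrete, here's a minimal sketch of what such a converter monad could look like. None of this is hledger code: every type and helper (CSVLine, Transaction, field, combine, runConvert) is invented for illustration, and I'm using State rather than Writer to accumulate the transaction, just to keep it short.

import Control.Monad.Reader
import Control.Monad.State
import Data.List (intercalate)

-- Hypothetical types, purely for illustration.
type CSVLine = [String]

data Transaction = Transaction
    { txDate        :: String
    , txDescription :: String
    , txAmount      :: String
    } deriving Show

emptyTransaction :: Transaction
emptyTransaction = Transaction "" "" ""

-- Reader supplies the fields of one CSV line; State accumulates the
-- Transaction being built up from them.
type Convert = ReaderT CSVLine (State Transaction)

field :: Int -> Convert String
field n = asks (!! n)

setDate, setDescription, setAmount :: String -> Convert ()
setDate v        = modify $ \t -> t { txDate = v }
setDescription v = modify $ \t -> t { txDescription = v }
setAmount v      = modify $ \t -> t { txAmount = v }

combine :: String -> [Convert String] -> Convert String
combine sep parts = intercalate sep <$> sequence parts

-- Run one line's conversion, starting from an empty transaction.
runConvert :: Convert () -> CSVLine -> Transaction
runConvert c line = execState (runReaderT c line) emptyTransaction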

Of course, a Haskell programmer can skip the monads entirely, or use others they prefer. And could do arbitrarily complicated stuff during imports, including building split transactions, and combining together multiple related CSV lines into a single transaction.

Syndicated 2012-12-03 18:04:03 from see shy jo

git push over XMPP

Imagine if you could send git pushes to any of your friends on Google Talk or other Jabber (XMPP) servers. Even though you're in different places and your computers probably cannot talk to one another directly, you can share a git repository without relying on a git hosting provider such as GitHub.

  To xmpp::joey@kitenet.net
  * [new branch]      master -> xmpp/master

That's what I've built over the past week. So far, it's only supported by the git-annex assistant, but it would not be very hard to make this into a separate git-remote-xmpp program, and a push-listening daemon, that could be used with non-git-annex repositories too.

You can read about the details in my git-annex assistant dev blog:

  1. day 124 git push over xmpp groundwork
  2. day 125 xmpp push continues
  3. day 126 mr watson come here

And yes, I update that blog every single work day. If you're not getting enough blog updates from me, and are not scared by rather technical posts, you might want to subscribe to it. :)

Syndicated 2012-11-09 22:42:19 from see shy jo

UPS farce

This is not the kind of place I live, though I don't mind a visit.

[photo: Errol looking out over Sausalito and the Bay]

I'm not bothered by a more rural life; I thrive on peace and quiet, and am not bothered by needing to haul my water, split firewood, or spend an hour just mowing my driveway.

I bought a GPS the other day, and it doesn't know where my driveway is..
or where the road leading to my driveway is..
or even about the highway leading to that road!


[photo: not many roads on this map..]

The only thing that really bothers me about living here off the map is that I'm in the middle of a food desert too. The nearest semi-decent grocery store and produce market are 40 minutes away. There's one restaurant worth eating at, and while a very congenial place to eat rustic food while watching the river flow by, it serves almost no vegetables. Adding to the problem, my fridge is a rather expensive propane model, which is about half a normal size, and has no freezer. Managing a healthy meal plan is a constant struggle, though I do have time to cook, at least, thanks to not having the kind of commute that is typical around here.

Well, the food desert was my only real problem. Now I can add to that the UPS emergency. A week ago, a package left Shenzhen, China for here. It quickly passed through Anchorage, Alaska, and within a few days had arrived an hour and a half from here. Getting it here has taken the whole rest of the week.


I only knew UPS attempted delivery because a neighbor told me. Seems they took one look at my driveway and turned around. They never tried again, and they didn't bother to leave a note on the mailbox, or anything. Now, the US postal service is happy to drive up this driveway; they drive a sensible 4x4 around here. I think DHL and FedEx have been here too. (Contrary to John Goerzen's experiences.) My compact car is not bothered by my driveway either...

But UPS's brand involves large brown trucks, even where inappropriate, and damage to them from such things as low-hanging branches apparently comes out of the driver's paycheck. And while normally UPS leaves small things in my mailbox, or with a neighbor, this was an important package from China, and it needed a signature. But really, a post card? "emergency"?!

So, I bought the GPS to find UPS, because my package was in an industrial park in some town I've never been to. And on Tuesday I was within an hour of there and swung by. And it was closed. UPS's website had not mentioned this; other businesses are open the day after Memorial day. That turned out to be a really annoying day all around. When I got home, after a night away, I found my fridge's fire had blown out, and nearly all my food was only fit for the dog.

Today I went out to try again (and buy groceries). When I reached the UPS depot.. it was closed! Until 3 pm. At this point I felt this story was reaching full-on farce.

After cooling my heels in a library for three hours, I was finally able to open the most over-packed package I've probably ever seen.

My kickstarter reward sample looks good.. I'll be ordering a lot more, but not to this address.

As a coda to the UPS farce, arriving home today I checked my mailbox, and found the UPS postcard had finally arrived. It included mention of the 3 pm opening time, and noted that I had until tomorrow before they'd return my package to China.

The coda to the coda is that on my way home from this second annoying day delivered courtesy of UPS, I stumbled on a restaurant overlooking a waterfall, just outside the town where I get my groceries. It serves food that satisfies both my Cajun and healthy cravings. Now I just need a larger fridge..

Syndicated 2012-09-06 23:56:11 from see shy jo

California postcard 2012

[photos: San Francisco skyline; Errol & Dani; Dani on cliffs at Mendocino; Sango on Hendy Woods redwood]

Syndicated 2012-08-16 16:25:25 from see shy jo

build a dynamic webapp in Yesod in just three days

This screencast shows what I built. Scroll down for my Yesod braindump.

I've been astonished how quickly this went together. This is my first time using any sort of web framework, and I used a still unusual one, Yesod. It's my first time using Bootstrap too. It's also the first time I've done any AJAX programming!

Bootstrap was something I'd heard of and seen some suspiciously similar looking sites built with, but it was really a pleasant surprise. Being able to make things that work and look good on the web without struggling with CSS is such a nice change. For the first time, it makes the web feel like a UI toolkit to me.

As I've learned Yesod, I've kept seeing problems the web framework solves that I either solved on an ad-hoc basis in Ikiwiki, or didn't realize were problems. In the former group are things like having a consistent way to store a URL to redirect to after an action (such as logging in) finishes, and generating URLs programmatically in ways that avoid broken links. In the latter group are things like exploiting types to guarantee mistakes can't be made. And also the abstraction of widgets, which combine html, css, and javascript in a way that can be composed with other widgets.
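
As a rough illustration of the widget idea, here's a hello-world-sized sketch. It's not code from my webapp -- the route, widget, and page contents are all invented, and it's written against current Yesod, so details differ a bit from 2012's API -- but it shows one widget carrying its own html, css, and javascript and being dropped into a page:

{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE QuasiQuotes       #-}
{-# LANGUAGE TemplateHaskell   #-}
{-# LANGUAGE TypeFamilies      #-}
import Yesod

data App = App

mkYesod "App" [parseRoutes|
/ HomeR GET
|]

instance Yesod App

-- One widget bundling html, css, and javascript; it composes with
-- whatever else the page's handler adds.
greetingWidget :: Widget
greetingWidget = do
    [whamlet|<p .greeting>Hello from a widget|]
    toWidget [lucius|.greeting { color: green; }|]
    toWidget [julius|console.log("greeting widget loaded");|]

getHomeR :: Handler Html
getHomeR = defaultLayout $ do
    setTitle "widget demo"
    greetingWidget

main :: IO ()
main = warp 3000 App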

Overall, I'm really enjoying Yesod, and it's making me productive in new ways.

I also see a lot of potential in Yesod to improve from where it is.

  • Probably the biggest improvement would be generating javascript from haskell code, so the type safety extends to the javascript side.

    There have been many tries at doing this; the first one I'm really tempted to use is Fay. Rather than build a heavy haskell runtime on top of javascript, it essentially uses javascript as its runtime, and offloads type checking to GHC. This lets it generate javascript code that is short and often comprehensible enough to give insights into what the original haskell code does.

    I'm betting this will be integrated into Yesod eventually. They have an active wiki page about it.

  • There's a WAI library for building local webapps with Yesod, but it was not suitable for my needs (for one thing, it lacks security; for another, it kills the haskell program when the web page is closed), so I built my own webapp library. A problem with my current pace of development is that I'm building lots of reusable libraries, but I don't have the time to stabilize them and make them generally available. That one goes in the pile of 2k+ lines of such code.

  • Yesod needs a version of the Hamlet markup that can be edited by people who only understand html. That means it should allow closing tags, and tabs, and not have nasty whitespace gotchas. I think this would be easy to build, starting from Hamlet. It could be called "Hecate".. I don't have time right now.

  • The compile-time error messages are often beyond atrocious. Seriously, I'm tempted to write a filter to strip out common patterns, where there's one line about a syntax error in a Hamlet file sandwiched between 150 lines of type-error gobbledygook and generated code.

  • Some really nice things could be done integrating Yesod with Bootstrap. Like the ability to build entire sites by just composing together Bootstrap components, with no need to write a single line of html or css. I'm very tempted to try to build this library.

    webpage = bootstrap Dynamic $ do
        setTitle "FooCorp"
        login <- getLogin
        navbar [FixedTop] $ do
            brand "FooCorp"
            link AboutR
            link BlogR
            nav [PullRight] $ link . maybe LoginR ProfileR login
        div [ContainerFluid] $ content login
      where
        content Nothing = heroUnit $ do
            para $ "We're the FooCorp for you."
            button "Register Today" [Primary, Large] SignUpR
            carousel
                [ amazingFeatures
                , aboutFooCorp
                , pricing
                ]
        content (Just user) = do
            para $ "Welcome back " ++ name user ++ "!"
            showProfile user

Syndicated 2012-07-29 20:00:27 from see shy jo

ghc threaded runtime gotchas

How do you convert a Haskell program to use GHC's threaded runtime? That's a seemingly simple question I started asking various people two weeks ago. I didn't get many useful answers, but now I have experience doing it myself, and so here's a blog post brain dump.

I have been trying to convert git-annex to use GHC's threaded runtime, for a variety of reasons. Naively adding the -threaded option resulted in a git-annex command that seemed to randomly freeze, but only sometimes (and, infuriatingly, never when I straced it), and a test suite that froze at a random point almost every time I ran it. Not a good result, and lacking any knowledge about gotchas with using the threaded runtime, I was at a loss for a long time (most of DebConf) about how to fix it.

I now know of at least three classes of problems that enabling the threaded runtime can turn up in programs that have been developed using the non-threaded runtime.

accessing an MVar after forkProcess can hang

MissingH has some code similar to this, which works ok with the non-threaded runtime:

  forkProcess $ do
    debugM "about to run some command"
    executeFile ...

In the above example, debugM accesses an MVar. Doing that after forkProcess can result in an MVar deadlock, as it tries to access an MVar value that is, apparently, not accessible to the forked process. (Bug report with test case)

So, using System.Cmd.Utils from MissingH is asking for trouble. I switched all my code to the newer and, apparently, threaded runtime safe System.Process.
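
For example, a minimal use of System.Process looks something like this (just a sketch of the replacement API, not code from git-annex):

  import System.Process (readProcess)

  -- System.Process does the whole fork/exec dance in C code, so there
  -- is no Haskell-side forkProcess to go wrong under -threaded.
  main :: IO ()
  main = do
      out <- readProcess "git" ["--version"] ""
      putStrLn out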

forkProcess is a massively bad idea

Even when not accessing an MVar after forkProcess, it's very unsafe to use. It's easy to miss the warning attached to forkProcess when the code seems to work. But with the threaded runtime, I've found that almost any call to forkProcess will sometimes succeed, and sometimes freeze the whole program. This might only happen around 1 time in 1000. Then you'll find this warning and do a lot of head-scratching about what it really means:

forkProcess comes with a giant warning: since any other running threads are not copied into the child process, it's easy to go wrong: e.g. by accessing some shared resource that was held by another thread in the parent.

The hangs I saw could be due to laziness issues deferring code to run after the forkProcess that you'd expect to have run before it ... or who knows what else.

It's not clear to me that it's possible to use forkProcess safely in Haskell code. I think it's notable that System.Process runs the whole fork/exec in C code instead.

unsafe FFI calls block

According to most of the documentation you'll find in, e.g., the Haskell wiki, Real World Haskell, etc., the only difference between safe and unsafe imports in the FFI is that unsafe is faster, and shouldn't be used for C code that calls back into Haskell code.

But the documentation is out of date. Actually, if you're using the FFI, and the foreign function can block, you need to use safe. When using unsafe, a blocking foreign function can block all threads of the program.

In my case, I was using kqueue to wait for changes to files, and this indeed blocked my whole program when linked with -threaded. Marking it safe fixed this.
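
For illustration, here's a toy safe import of sleep(3); it's not the actual kqueue binding, but it shows the one keyword that matters:

  {-# LANGUAGE ForeignFunctionInterface #-}

  import Foreign.C.Types

  -- Marked "safe", this blocking call only parks the Haskell thread
  -- that makes it. Imported "unsafe" and linked with -threaded, it
  -- could block the program's other threads for the whole 3 seconds.
  foreign import ccall safe "unistd.h sleep"
      c_sleep :: CUInt -> IO CUInt

  main :: IO ()
  main = do
      _ <- c_sleep 3
      putStrLn "done sleeping"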

The details are well described in this paper: http://community.haskell.org/~simonmar/papers/conc-ffi.pdf

Somewhat surprisingly, this blocking only happens when using the threaded runtime. If you're using the non-threaded runtime with unsafe blocking FFI functions, your other pseudo-threads won't be blocked. This is because the non-threaded runtime has a SIGALRM timer that interrupts (most) blocking system calls. This leads to other troubles of its own (like needing to restart interrupted FFI functions, or blocking the other pseudo-threads from running if the C code ignores SIGALRM), but that's off-topic for this post.

summary

Converting a large Haskell code base from the default, non-threaded runtime to the threaded runtime can be quite tricky. None of the problems are the sort of thing that Haskell helps to manage either. I will be developing new programs using the threaded runtime from the beginning from now on.

By the way, don't take this post to say that threading in Haskell sucks. I've really been enjoying writing threaded Haskell code. The control Haskell gives over isolation of state to threads, and the excellent and wide-ranging suite of thread communications data types (MVar, Chan, QSemN, TMVar, SampleVar, etc) have made developing a complex threaded program something I feel comfortable doing for the first time, in any language.
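
As a tiny, made-up illustration of two of those types (nothing from git-annex), here a Chan feeds work items to a background thread, and an MVar signals when it's done:

  import Control.Concurrent
  import Control.Monad (forM_, replicateM_)

  -- The Chan carries work to the forked thread; the MVar is a simple
  -- "done" signal back to the main thread. Compile with -threaded.
  main :: IO ()
  main = do
      chan <- newChan
      done <- newEmptyMVar
      _ <- forkIO $ do
          replicateM_ 3 (readChan chan >>= putStrLn)
          putMVar done ()
      forM_ ["one", "two", "three"] (writeChan chan)
      takeMVar done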

Syndicated 2012-07-20 19:31:37 from see shy jo

I am become Joey, destroyer of drives

drive #1

To add to the fun of being in Nicaragua, my laptop's solid state drive died the second day here. Seems this is the failure mode for an SSD: it gets a little slow, and then a switch flips and it stops accepting any writes at all, becoming read-only media. Never was there an indication of a problem in SMART etc.

Did you know that Nicaragua has neither names for most roads, nor addresses? Rather than trying to shoehorn the directions to my hotel into the address fields of Amazon.com, I shipped the replacement SSD to home, and thought I'd limp along.

drive #2

I destroyed my second drive one hour after getting it working. A borrowed USB flash drive, which seems to have melted under the unusual load while I was away at lunch. Perhaps putting an ext3 filesystem on it was a very bad idea, although I have successfully run other USB flash drives that way for years. Perhaps it was a cheap drive that only pretended to hold 32 gb.

drive #3

For several days I happily used the third drive, a USB hard drive, as my temporary root filesystem. Until I destroyed it by knocking it off my bed.

drive #4, #5

Now my netbook is running from a Debian Live USB key, with a second USB key for my customisations. So far I have not managed to destroy these, but there's still a day left in my trip..

Syndicated 2012-07-13 22:43:36 from see shy jo

debian-cd work at DebCamp

I've been working the past two days on debian-cd, which was the main thing (besides git-annex assistant) I planned to work on at DebCamp.

Yesterday, Steve McIntyre and I cleaned up some cruft in debian-cd's package lists. This freed somewhere in the area of 30 mb. I also took a grep through d-i and made sure the CDs include packages that d-i can optionally install in some situations.

Today, I investigated how Recommends are ordered on the CD, and concluded it's as close to optimal as can be easily achieved. So I was not able to save space there, but I did find a way to reorganize the desktop task that avoids needing to include a lot of printing stuff and some other stuff on the first CD. While that helped some, it still didn't get either GNOME or KDE to entirely fit, so getting there will probably involve rebuilding 100 or so packages with xz, if someone decides to do that.

So it's still TBD whether GNOME or KDE will fit on a single CD in Debian wheezy. At this point I think most of us are getting tired of fighting this increasingly losing battle every release, and so other options like only having a desktop DVD are looking more appealing, as they solve the problem long-term. This would also free up the first CD for other interesting use cases, perhaps xfce, or perhaps a CD targeted at server users, and/or containing high-popularity non-desktop packages.

Syndicated 2012-07-09 02:52:58 from see shy jo

git-annex-assistant milestone

I reached a nice milestone on my git-annex assistant in my first day's work at DebCamp in Nicaragua. Here's a screencast demoing it.

git-annex-assistant.ogg (12 MB)

By the way, the weather, food, and people here are all excellent.

Syndicated 2012-07-05 22:40:10 from see shy jo
