20 Sep 2011 joey   » (Master)

happy haskell hacker

There are certian things haskell is very good at, and I had the pleasure of using it for one such thing yesterday. I wanted to support expressions like find(1) does, in git-annex. Something like:

  git-annex drop --not --exclude '*.mp3' --and \
    -\( --in usbdrive --or --in archive -\) --and \
    --not --copies 3

So, parens and booleans and some kind of domain-specific operations. It's easy to build a data structure in haskell that can contain this sort of expression.

{- A Token can either be a single word, or an Operation of an arbitrary type. -}
data Token op = Token String | Operation op
        deriving (Show, Eq)

data Matcher op = Any
        | And (Matcher op) (Matcher op)
        | Or (Matcher op) (Matcher op)
        | Not (Matcher op)
        | Op op
        deriving (Show, Eq)

(The op could just be a String, but is parameterized for reasons we'll see later.)

As command-line options, the expression is already tokenised, so all I needed to do to parse it is consume a list of the Tokens. The only mildly tricky thing is handling the parens right -- I chose to not make it worry if there were too many, or too few closing parens.

generate :: [Token op] -> Matcher op
generate ts = generate' Any ts
generate' :: Matcher op -> [Token op] -> Matcher op
generate' m [] = m
generate' m ts = uncurry generate' $ consume m ts

consume :: Matcher op -> [Token op] -> (Matcher op, [Token op])
consume m [] = (m, [])
consume m ((Operation o):ts) = (m `And` Op o, ts)
consume m ((Token t):ts)
        | t == "and" = cont $ m `And` next
        | t == "or" = cont $ m `Or` next
        | t == "not" = cont $ m `And` (Not next)
        | t == "(" = let (n, r) = consume next rest in (m `And` n, r)
        | t == ")" = (m, ts)
        | otherwise = error $ "unknown token " ++ t
        where
                (next, rest) = consume Any ts
                cont v = (v, rest)

Once a Matcher is built, it can be used to check if things match the expression the user supplied. This next bit of code almost writes itself.

{- Checks if a Matcher matches, using a supplied function to check
 - the value of Operations. -}
match :: (op -> v -> Bool) -> Matcher op -> v -> Bool
match a m v = go m
        where
                go Any = True
                go (And m1 m2) = go m1 && go m2
                go (Or m1 m2) = go m1 || go m2
                go (Not m1) = not (go m1)
                go (Op o) = a o v

And that's it! This is all nearly completly generic and could be used for a great many things that need support for this sort of expression, as long as they can be checked in pure code.

A trivial example:

  *Utility.Matcher> let m = generate [Operation True, Token "and", Token "(", Operation False, Token "or", Token, "not", Operation False, Token ")"]
*Utility.Matcher> match (const . id) m undefined 
True

For my case though, I needed to run some IO actions to check if expressions about files were true. This is where I was very pleased to see a monadic version of match could easily be built.

{- Runs a monadic Matcher, where Operations are actions in the monad. -}
matchM :: Monad m => Matcher (v -> m Bool) -> v -> m Bool
matchM m v = go m
        where
                go Any = return True
                go (And m1 m2) = liftM2 (&&) (go m1) (go m2)
                go (Or m1 m2) =  liftM2 (||) (go m1) (go m2)
                go (Not m1) = liftM not (go m1)
                go (Op o) = o v

With this and about 100 lines of code to implement specific tests like --copies and --in, git-annex now supports the example at the top.

Just for comparison, find(1) has thousands of lines of C code to build a similar parse tree from the command line parameters and run it. Although I was surprised to see that it optimises expressions by eg, reordering cheaper tests first.

The last time I wrote this kind of thing was in perl, and there the natural way was to carefully translate the expression into perl code, which was then evaled. Meaning the code was susceptible to security holes.

Anyway, this is nothing that has not been done a hundred times in haskell before, but it's very nice that it makes it so clean, easy, and generic.

Syndicated 2011-09-19 23:44:31 from see shy jo

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!