Recent blog entries for johnw

Putting lenses to work

I gave a talk a couple of weeks ago at BayHac 2017 on “Putting lenses to work”, to show in a practical context how we use lenses at my workplace. I specifically avoided any theory about lenses, the complex types, and the many operators, to show that at its core, lens is a truly invaluable library.

The videos are now available on YouTube, and the slides for this talk are on GitHub.

The code in the slides is taken directly (using Emacs) from a test file in that same repository, Lenses.hs, to serve as a way of preserving helpful examples, and to make it easy to cargo cult specific patterns into your code.

Syndicated 2017-04-22 00:00:00 from Lost in Technopolis

Submitting Haskell functions to Z3

Conal Elliott has been working for several years now on using categories, specifically cartesian closed categories, as a way to abstract Haskell functions at compile-time, so you can render the resulting “categorical term” into other categories.

Here’s an example Haskell function:

\x -> f x (g x)

And here’s its categorical rendering, just to give the flavor of the idea:

eval ∘ (f' △ g')

Where eval means uncurry ($), and f' and g' are the renderings of those two functions; and the △ operator is (&&&). I’m not using the typical Haskell names for these, by the way, in order to convince myself not to “think in Haskell” when working with these terms, but rather I’m choosing whatever symbols I find most often used in the literature on category theory.
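To make the rendering concrete, the same term can be written with ordinary Haskell functions standing in for the categorical primitives. This is only a sketch in the category of Haskell functions: the names eval, fork, original and rendered are chosen here for illustration, and f and g are arbitrary example functions rather than anything taken from concat.

```haskell
import Control.Arrow ((&&&))

-- eval is uncurry ($): apply the first component of a pair to the second.
eval :: (a -> b, a) -> b
eval = uncurry ($)

-- The fork operator (written with a triangle in the text) is (&&&) on plain functions.
fork :: (a -> b) -> (a -> c) -> a -> (b, c)
fork = (&&&)

-- Two arbitrary example functions (assumptions made for this sketch).
f :: Int -> Int -> Int
f x y = x * y

g :: Int -> Int
g x = x + 1

-- The original lambda, written with a named argument.
original :: Int -> Int
original = \x -> f x (g x)

-- Its point-free rendering: fork the input into (f x, g x), then apply.
rendered :: Int -> Int
rendered = eval . fork f g
```

Both definitions compute the same function; the second simply never names its argument.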

There are a few things to notice about these categorical terms:

  1. They must be point-free. There is no such thing as naming a term, only morphisms that use or produce objects. Hence Awodey calls category theory “the algebra of functions”.

  2. They quickly become very large and unreadable. All but the simplest terms are nearly impossible to understand just by looking at them. Think of it as the binary code for categories.

  3. Because they are just, in effect, chains of composition, without any name binding or scoping issues to consider, the nature of the computation is laid out in a very direct (albeit verbose) way, making rewrite rules available throughout the abstract term.

Although it seems a bit technical at first, the idea is quite simple: Discern the abstract, categorical meaning of a Haskell function, then realize that term in any other category that is cartesian (has products) and closed (has functions as objects, i.e., higher-order constructions). Nothing else needs to be known about the target category for the abstract term to have meaning there. That’s the beauty of using category theory as a universal language for expressing ideas: the meaning transports everywhere.

Here’s an equation meant for the solver, written in plain Haskell:

equation :: (Num a, Ord a) => a -> a -> Bool
equation x y =
    x < y &&
    y < 100 &&
    0 <= x - 3 + 7 * y &&
    (x == y || y + 20 == x + 30)

Here’s how I run the solver, using z3cat, which is built on top of Conal’s concat library:

mres <- liftIO $ runZ3 (ccc (uncurry (equation @Int))) $ do
    x <- mkFreshIntVar "x"
    y <- mkFreshIntVar "y"
    return $ PairE (PrimE x) (PrimE y)
case mres of
    Nothing  -> error "No solution found."
    Just sol -> putStrLn $ "Solution: " ++ show sol

And the result, also showing the equation submitted to Z3:

(let ((a!1 (ite (<= 0 (+ (- x!0 3) (* 7 y!1)))
                (ite (= x!0 y!1) true (= (+ y!1 20) (+ x!0 30)))
                false)))
  (ite (< x!0 y!1) (ite (< y!1 100) a!1 false) false))
Solution: [-8,2]

Now with one function, I have either a predicate function I can use in Haskell, or an input for Z3 to find arguments for which it is true!
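As a quick check of the first half of that claim, the solution reported above can be fed back into the same function as an ordinary Haskell predicate. The sketch below repeats equation so that it stands alone; solutionHolds and nonSolution are names introduced here for illustration.

```haskell
-- The same predicate that was submitted to Z3.
equation :: (Num a, Ord a) => a -> a -> Bool
equation x y =
    x < y &&
    y < 100 &&
    0 <= x - 3 + 7 * y &&
    (x == y || y + 20 == x + 30)

-- The solver's answer, x = -8 and y = 2, satisfies every conjunct.
solutionHolds :: Bool
solutionHolds = equation (-8 :: Int) 2

-- A pair with x == y fails the first conjunct (x < y).
nonSolution :: Bool
nonSolution = equation (5 :: Int) 5
```

Here solutionHolds evaluates to True and nonSolution to False, using nothing but plain evaluation.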

In addition to using Conal’s work in Haskell, I’m also working on a Coq rendering of his idea, which I hope will give me a more principled way to extract Coq programs into Haskell, by way of their categorical representation.

Syndicated 2017-04-18 00:00:00 from Lost in Technopolis

A case of reflection

A while back, Edward Kmett wrote a library called reflection, based on a 2004 paper by Oleg Kiselyov and Chung-chieh Shan that describes a neat trick for reifying data into types (here the word “reify” can be understood as turning a value into something that can be referenced at the type level). There was also an article written by Austin Seipp on how to use the library, and some great answers on reddit and stackoverflow that go into detail about how it works.

And yet, in all these years, though I’ve been on the lookout for a way to make use of this library, I wasn’t able to fit it into my workflow – until today! So let’s look at my real world use for reflection, which solves a problem that maybe others have encountered as well.

As you may know, the QuickCheck library provides a facility for generating arbitrary data sets. The property testing features of QuickCheck make use of this generation to search for test data that might violate a set of properties.

However, the generation facility can also be used on its own, separate from the testing components, to randomly generate data for any purpose. The library for producing this random data offers lots of combinators, and is based around instances for a type class called Arbitrary. Here’s a basic example:

module Main where

import Test.QuickCheck.Arbitrary
import Test.QuickCheck.Gen

data Foo = Foo [Int] [String]
    deriving Show

instance Arbitrary Foo where
    arbitrary = do
        xs  <- listOf chooseAny
        len <- choose (1, 100)
        ys  <- vectorOf len (shuffle "Hello, world")
        return $ Foo xs ys

main :: IO ()
main = print =<< generate (arbitrary :: Gen Foo)

This creates a specifically shaped set of random data, where the list of integers may be of any length, and any value, but the list of strings will always be from 1 to 100 elements long, and the strings will only consist of random arrangements of the characters found in "Hello, world".

Now, what if you wanted to guide the generation process for Foo using external information? Such as picking the length of the list of strings from a value provided by the user? Since Arbitrary does not allow the use of Reader, how do we get that user-supplied value into the arbitrary function above? And without using global IORefs or unsafePerformIO?

The reflection library allows us to reify a runtime value into a type (whose name we’ll never know, requiring us to reference it through a type variable), and then communicate that type via a constraint, such that we can reflect the value back out as needed. If this sounds a bit confusing, maybe an example can make it clearer:

{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE UndecidableInstances #-}

module Main where

import Data.Proxy
import Data.Reflection
import Test.QuickCheck.Arbitrary
import Test.QuickCheck.Gen
import System.Environment

data Foo s = Foo [Int] [String]
    deriving Show

instance Reifies s Int => Arbitrary (Foo s) where
    arbitrary = do
        xs  <- listOf chooseAny
        len <- choose (1, reflect (Proxy :: Proxy s))
        ys  <- vectorOf len (shuffle "Hello, world")
        return $ Foo xs ys

main :: IO ()
main = do
    [len] <- getArgs
    reify (read len :: Int) $ \(Proxy :: Proxy s) ->
        print =<< generate (arbitrary :: Gen (Foo s))

There are a few additional things to note here:

  1. A phantom type variable has been added to Foo. This type variable associates the reified data to our type, so it can be reflected back out in the instance for this type.

  2. The Arbitrary instance for Foo s has incurred a new constraint, stating that the type represented by s somehow reifies an Int. How this happens is the magic of the reflection library, and uses a clever GHC trick representing Edward’s unique twist on Oleg and Chung-chieh’s work. This instance requires the UndecidableInstances extension.

  3. We now call reify with the data we want to pass along. This function takes a lambda whose first argument is a Proxy s, giving us a way to know which type variable to use in the type of the call to arbitrary. This requires the ScopedTypeVariables extension.

That’s it: reflection gives us a way to plumb extra data into instances at runtime, at the cost of adding a single phantom type.

If the phantom type seems excessive for one use case, or if adding the phantom would affect a large family of types, then an alternative is to enable the FlexibleInstances extension, and use Edward’s tagged library to carry the phantom instead:

{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE UndecidableInstances #-}

module Main where

import Data.Proxy
import Data.Tagged
import Data.Reflection
import Test.QuickCheck.Arbitrary
import Test.QuickCheck.Gen
import System.Environment

data Foo = Foo [Int] [String]
    deriving Show

instance Reifies s Int => Arbitrary (Tagged s Foo) where
    arbitrary = fmap Tagged $ do
        xs  <- listOf chooseAny
        len <- choose (1, reflect (Proxy :: Proxy s))
        ys  <- vectorOf len (shuffle "Hello, world")
        return $ Foo xs ys

main :: IO ()
main = do
    [len] <- getArgs
    reify (read len :: Int) $ \(Proxy :: Proxy s) ->
        print . unTagged =<< generate (arbitrary :: Gen (Tagged s Foo))

This way we leave the original type alone – which may be the only option if you’re generating arbitrary data for types from libraries. You’ll just have to wrap and unwrap the Tagged newtype wrapper as necessary.

Another benefit of using Tagged is that, because it can be wrapped and unwrapped as necessary, it becomes possible to change the reified information in cases where nested types are involved. In this last example, the user is allowed to specify the value that should be supplied to the Bar constructor during data generation.

{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE UndecidableInstances #-}

module Main where

import Data.Proxy
import Data.Tagged
import Data.Reflection
import Test.QuickCheck.Arbitrary
import Test.QuickCheck.Gen
import System.Environment

newtype Bar = Bar Int
    deriving Show

data Foo = Foo [Bar] [String]
    deriving Show

instance Reifies s Int => Arbitrary (Tagged s Bar) where
    arbitrary = return $ Tagged $ Bar $ reflect (Proxy :: Proxy s)

instance Reifies s (Int, Int) => Arbitrary (Tagged s Foo) where
    arbitrary = fmap Tagged $ do
        let (len, bar) = reflect (Proxy :: Proxy s)
        xs <- listOf (reify bar $ \(Proxy :: Proxy r) ->
                          unTagged <$> (arbitrary :: Gen (Tagged r Bar)))
        l  <- choose (1, len)
        ys <- vectorOf l (shuffle "Hello, world")
        return $ Foo xs ys

main :: IO ()
main = do
    [len, barValue] <- getArgs
    reify (read len :: Int, read barValue :: Int) $ \(Proxy :: Proxy s) ->
        print . unTagged =<< generate (arbitrary :: Gen (Tagged s Foo))

Syndicated 2017-02-23 00:00:00 from Lost in Technopolis

Using the FMap library in Coq

One of the most common structures used in programming is the key/value map, also called a hash map, dictionary, association list, or simply a function. These maps generally provide a way to add new values, look up keys, iterate over the collection, etc. Yet in Coq, even though this facility exists in the standard library under the module FMap, it can be quite difficult to get started with. This post intends to clarify the typical patterns in a way that is easy to copy into your own project, based on the four different ways this library is typically used.

Using a known key type and map structure

Very often, one maps from a known, ordered type, like nat, to some other type, using one of the concrete structures offered by the FMap library. In that case, the code you want to start with looks like this:

Require Import
  Coq.FSets.FMapList
  Coq.Structures.OrderedTypeEx.

Module Import M := FMapList.Make(Nat_as_OT).

You can now create a map using M.t A, where A is your value type. You can prefix the map-related functions with M., or just call them directly. Some common functions to use on maps are as follows:

  • empty
  • add
  • remove
  • find
  • mem
  • is_empty
  • map
  • mapi
  • map2
  • fold

There are also several relations you can use to phrase theorems about maps and map membership:

  • In
  • MapsTo
  • Equal
  • Equiv
  • Equivb
  • Empty

Additional functions and lemmas

In order to complete most proofs concerning maps, there are additional lemmas and functions you’ll want to include:

Require Import
  Coq.FSets.FMapFacts.

Module P := WProperties_fun Nat_as_OT M.
Module F := P.F.

This provides two new prefixes, P. and F., which bring into scope many more helper functions and lemmas:

  • P.of_list
  • P.to_list
  • P.filter
  • P.for_all
  • P.exists_
  • P.partition
  • P.update
  • P.restrict
  • P.diff

Helper lemmas in the F module are generally best found using SearchAbout for the specific lemma you need. There are too many to list here, and they’re often quite specific in their use, such as F.find_mapsto_iff to reflect between the fact of a successful find operation, and its equivalent MapsTo relation.

Proofs involving maps

There are several induction principles you will need for completing inductive proofs over maps:

  • P.map_induction
  • P.map_induction_bis
  • P.fold_rec
  • P.fold_rec_bis
  • P.fold_rec_nodep
  • P.fold_rec_weak

The P.map_induction induction principle treats each intermediate map as an Add relation over a previous map, until it reaches the base Empty map. The _bis variant expresses the same information as successive calls to add down to an empty map.

P.fold_rec should be applied if the goal has the form of a call to M.fold over a map. If you use this, be sure to revert into the goal any hypotheses referring to the same map, since you’ll likely want to use those facts as part of the induction.

Note that these two sets of principles are used somewhat differently from each other:

(* Applies to any evidence in the context involving [m]. *)
induction m using P.map_induction_bis.

(* Applies only to evidence in the goal, thus sometimes
   requiring use of [revert]. *)
apply P.fold_rec.

Rewriting with maps

Since the internal structure of maps is not exposed by the FMap interface, rewriting can sometimes be a little confusing. Equality between maps is expressed by the equivalence Equal, which states that anything found in the first map is found at the same key in the second map. In other words:

forall k v, M.MapsTo k v m1 <-> M.MapsTo k v m2

This isn’t a problem if the terms you’re rewriting involve functions from the FMap modules, but if you create a new function that operates on maps, you’ll need to accompany it with a proof relating it to Equal. For example:

Definition map_operation `(m : M.t A) : M.t A := ...

Lemma map_operation_Proper :
  Proper (Equal ==> Equal) map_operation.

Now you can rewrite the arguments in a map_operation, provided you know they are Equal.

Also, if you find yourself facing difficulties using rewrite with folds, note that in addition to establishing a proof that the fold function is Proper for its arguments and result, you must also show that the final result is independent of the order of evaluation, since it’s not known from the FMap interface whether the contents of a map are reordered during insertion or not.

Abstracting the map implementation

Often when using maps, it’s not necessary to pick an implementation; you just need the map interface over a known key type. To do this, place your code in a module that itself requires and passes along the implementation module:

Require Import
  Coq.FSets.FMapFacts
  Coq.Structures.OrderedTypeEx.

Module MyModule (M : WSfun Nat_as_OT).

Module P := WProperties_fun Nat_as_OT M.
Module F := P.F.
...
End MyModule.

To later instantiate such a module functor using a map implementation, you’d write:

Require Import
  Coq.FSets.FMapList
  Coq.Structures.OrderedTypeEx
  MyModule.

Module Import M := FMapList.Make(Nat_as_OT).
Module Import MyMod := MyModule M.

Abstracting over both map and key

When implementing generic algorithms that are applicable to any map, you’ll also need to abstract over the key type. In this case, you have two choices: Do you need to know that the key type is ordered, or do you only need to know that it’s decidable? Often the latter suffices, making the algorithm even more general.

In both cases, you may refer to the key type as either E.key or M.key (since the M module re-exports key), and you can check for key equality using E.eq:

Require Import
  Coq.FSets.FMapFacts
  Coq.Structures.DecidableTypeEx.

Module MoreFacts (E : DecidableType) (M : WSfun E).

Module P := WProperties_fun E M.

Global Program Instance filter_Proper {elt} : forall P,
  Proper (E.eq ==> eq ==> eq) P
    -> Proper (M.Equal (elt:=elt) ==> M.Equal) (@P.filter elt P).
...

End MoreFacts.

To require an ordered type, which makes E.lt available, use:

Require Import
  Coq.FSets.FMapFacts
  Coq.Structures.OrderedTypeEx.

Module MoreFacts (E : OrderedType) (M : WSfun E).
...
End MoreFacts.

Putting it all together

Since you probably came here just wondering how to construct a map, add stuff to it, and then search for what you added, here is a complete example you can cut and paste to start off with:

Require Import
  Coq.FSets.FMapAVL
  Coq.FSets.FMapFacts
  Coq.Structures.OrderedTypeEx
  PeanoNat.

Module Import M := FMapAVL.Make(Nat_as_OT).

Module P := WProperties_fun Nat_as_OT M.
Module F := P.F.

Compute M.find 1 (M.add 1 10 (M.empty _)).
Compute P.for_all (fun k _ => k <? 10) (M.add 1 10 (M.empty _)).

Also note that there is N_as_OT, which is much faster to compute with if you are using large constants, but it requires familiarity with the NArith library.

Syndicated 2016-10-27 00:00:00 from Lost in Technopolis

Emacs: Pattern Matching with pcase

This is a tutorial on how to use the pcase macro in modern flavors of GNU Emacs.

Exact matches

All data fits into some kind of pattern. The most explicit pattern is a description of the data itself. Let’s consider the following value as a running example:

'(1 2 (4 . 5) "Hello")

Explicitly stated, this is a list of four elements, where the first two elements are the integers 1 and 2, the third is a cons consisting of a car of 4 and a cdr of 5, and the fourth is the string "Hello". This states an explicit pattern we can match against using an equality test:

(equal value '(1 2 (4 . 5) "Hello"))

Pattern matches

Where patterns become useful is when we want to generalize a bit. Let’s say we want to do a similar equality test, but we don’t care what the final string’s contents are, only that it’s a string. Even though it’s simply stated, this becomes quite difficult using an equality test:

(and (equal (subseq value 0 3) '(1 2 (4 . 5)))
     (stringp (nth 3 value)))

What we would prefer is a more direct language for encoding our description of the family of values we’d like to match against. The way we said it in English was: the first three elements exactly so, and the last element, any string. This is how we’d phrase that using pcase:

(pcase value
  (`(1 2 (4 . 5) ,(pred stringp))
    (message "It matched!")))

Think of pcase as a form of cond, where instead of evaluating each test for non-nil, it compares a series of patterns against the value under consideration (often called the “scrutinee” in the literature). There can be many patterns, and the first one wins, as with cond.

Capturing matches

But pcase can go one step further: Not only can we compare a candidate value against a family of possible values described by their pattern, we can also “capture” sub-values from that pattern for later use. Continuing from the last example, let’s say we want to print the string that matched, even though we didn’t care about the contents of the string for the sake of the match:

(pcase value
  (`(1 2 (4 . 5) ,(and (pred stringp) foo))
    (message "It matched, and the string was %s" foo)))

Whenever a naked symbol like foo occurs as a logical pattern (see next section), the part of the value being matched at that position is bound to a local variable of the same name.

Logical and literal patterns

To master pcase, there are two types of patterns you must know: Logical patterns, and literal, or quoted, patterns. Logical patterns describe the kind of data we’d like to match against, and other special actions to take when it matches; and quoted patterns are the “literal” aspect, stating the exact form of a particular match.

Literal patterns are by far the easiest to think about. To match against any atom, string, or list of the same, the corresponding literal pattern is that exact value. So the literal pattern "foo" matches the string "foo", 1 matches the atom 1, etc.

pcase matches against a list of logical patterns, so to use a literal pattern, we must quote it, unless it consists entirely of self-quoting atoms:

(pcase value
  ('sym (message "Matched the symbol `sym'"))
  ('(1 2) (message "Matched the list (1 2)")))

Literal patterns may also be introduced using a backquote, in which case commas may be used to place logical patterns within them, in exactly the same way that quoting and anti-quoting works for macros. For example:

(pcase value
  (`(1 2 ,(or 3 4))
   (message "Matched either the list (1 2 3) or (1 2 4)")))

More on logical patterns

There are many special logical patterns. Let’s consider them one by one.

Underscore _

To match against anything whatsoever, no matter its type or value, use underscore. Thus to match against a list containing anything at all at its head, we’d use:

(pcase value
  (`(,_ 1 2)
   (message "Matched a list of anything followed by (1 2)")))

Symbol

When performing a match, if a symbol occurs within a logical pattern, it binds whatever was found at that position to a local symbol of the same name. Some examples will help to make this clearer:

(pcase value
  (`(1 2 ,foo 3)
   (message "Matched 1, 2, something now bound to foo, and 3"))
  (foo
   (message "Match anything at all, and bind it to foo!"))
  (`(,the-car . ,the-cdr)
   (message "Match any cons cell, binding the car and cdr locally")))

The reason for doing this is two-fold: Either to refer to a previous match later in the pattern (where it is compared using eq), or to make use of a matched value within the related code block:

(pcase value
  (`(1 2 ,foo ,foo 3)
   (message "Matched (1 2 %s %s 3)" foo foo)))

(or PAT ...) and (and PAT ...)

We can express boolean logic within a pattern match using the or and and patterns:

(pcase value
  (`(1 2 ,(or 3 4)
     ,(and (pred stringp)
           (pred (string< "aaa"))   ; calls (string< "aaa" x)
           (pred (lambda (x) (> (length x) 10)))))
   (message (concat "Matched 1, 2, 3 or 4, and a long string "
                    "that is lexically greater than 'aaa'"))))

pred predicates

Arbitrary predicates can be applied to matched elements, where the predicate will be passed the object that matched. As in the previous example, lambdas can be used to form arbitrarily complex predicates, with their own logic. See above for examples.

guard expressions

At any point within a match, you may assert that something is true by inserting a guard. This might consult some other variable to confirm the validity of a pattern at a given time, or it might reference a local symbol that was earlier bound by the match itself, as described above:

(pcase value
  (`(1 2 ,foo ,(guard (not (and (numberp foo) (= foo 10)))))
   (message (concat "Matched 1, 2, anything, and then anything again, "
                    "but only if the first anything wasn't the number 10"))))

Note that in this example, the guard occurs at a match position, so even though the guard doesn’t refer to what is being matched, if it passes, then whatever occurs at that position (the fourth element of the list) would be an unnamed successful match. This is rather bad form, so we can be more explicit about the logic here:

(pcase value
  (`(1 2 ,(and foo (guard (not (and (numberp foo) (= foo 10))))) ,_)
   (message (concat "Matched 1, 2, anything, and then anything again, "
                    "but only if the first anything wasn't the number 10"))))

This means the same, but associates the guard with the value it tests, and makes it clear that we don’t care what the fourth element is, only that it exists.

Pattern let bindings

Within a pattern we can match sub-patterns, using a special form of let that has a meaning specific to pcase:

(pcase value
  (`(1 2 ,(and foo (let 3 foo)))
   (message "A weird way of matching (1 2 3)")))

This example is a bit contrived, but it allows us to build up complex guard patterns that might match against values captured elsewhere in the surrounding code:

(pcase value1
  (`(1 2 ,foo)
   (pcase value2
     (`(1 2 ,(and (let (or 3 4) foo) bar))
      (message "A nested pcase depends on the results of the first")))))

Here the third value of value2 – which must be a list of exactly three elements, starting with 1 and 2 – is being bound to the local variable bar, but only if foo was a 3 or 4. There are many other ways this logic could be expressed, but this gives you a taste of how flexibly you can introduce arbitrary pattern matching of other values within any logical pattern.

pcase-let and pcase-let*

That’s all there is to know about pcase! The other two utilities you might like to use are pcase-let and pcase-let*, which do similar things to their logical pattern counterpart let, but as regular Lisp forms:

(pcase-let ((`(1 2 ,foo) value1)
            (`(3 4 ,bar) value2))
  (message "value1 is a list of (1 2 %s); value2 ends with %s"
           foo bar))

Note that pcase-let does not fail, and always executes the corresponding forms unless there is a type error. That is, value1 above is not required to fit the form of the match exactly. Rather, every binding that can be paired is bound to its corresponding element, but every binding that cannot is bound to nil:

(pcase-let ((`(1 2 ,foo) '(10)))
  (message "foo = %s" foo))   => prints "foo = nil"

(pcase-let ((`(1 2 ,foo) 10))
  (message "foo = %s" foo))   => Lisp error, 10 is not a list

(pcase-let ((`(1 2 ,foo) '(3 4 10)))
  (message "foo = %s" foo))   => prints "foo = 10"

Thus, pcase-let can be thought of as a more expressive form of destructuring-bind.

The pcase-let* variant, like let*, allows you to reference bound local symbols from prior matches.

(pcase-let* ((`(1 2 ,foo) '(1 2 3))
             (`(3 4 ,bar) (list 3 4 foo)))
  (message "foo = %s, bar = %s" foo bar))  => foo = 3, bar = 3

However, if you name a symbol with the same name in a later logical pattern, it is not used as an eq test, but rather shadows that symbol:

(pcase-let* ((`(1 2 ,foo) '(1 2 3))
             (`(3 4 ,foo) '(3 4 5)))
  (message "1 2 %s" foo))

This prints out "1 2 5": the second foo shadows the first, rather than being compared against the earlier match.

Syndicated 2016-01-21 00:00:00 from Lost in Technopolis

Simpler conduit library based on monadic folds

Recently I was playing around with the core types in the conduit library (attempting to change leftovers so you could only unget values you had read), when I stumbled across a formulation of those types that led to some interesting simplifications.

Before I jump in, let’s review what any effectful streaming library should aim to accomplish. The basics are:

  1. Iterate over values within a structure, or produced by a computation.
  2. Clean up resources involved in that computation once they are no longer needed.
  3. Allow processing to be composed nicely, forming a “pipeline” from the initial source to a final sink.
  4. It would be nice if any part of the pipeline could decide when to terminate.

What I discovered during my exploration is that all four of these requirements can be captured using simple, monadic folds, like foldM. Here is the type of foldM:

foldM :: Monad m => (a -> b -> m a) -> a -> [b] -> m a

We can obtain a slightly easier function type for our needs by reversing the arguments:

sourceList :: Monad m => [b] -> a -> (a -> b -> m a) -> m a

This says that given a list of elements of type b, sourceList returns a function that knows how to generate a result type a from a starting value by folding over every element of that list. We might trivially sum lists of integers as follows:

sourceList [1..10] 0 $ \acc x -> return $ acc + x

We can abstract our summing function into a sink that works on any source of integers:

sumC :: (Num a, Monad m)
     => (a -> (a -> a -> m a) -> m a) -> m a
sumC await = await 0 $ \acc x -> return $ acc + x

sumC is a higher-order function that takes a fold closure obtained from sourceList [1..10] above. (I call the closure await, although its behavior is a lot closer to a folding variant of the awaitForever function from conduit). await wants a starting state, and a function to fold that state over the incoming elements.

Both of these are regular, higher-order functions, so we can build a pipeline using nothing more than function application:

sumC (sourceList [1..10])

Notice how close this is to the non-streaming version sum (id [1..10]); and if we execute the pipeline using runIdentity, the two are identical.
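The pieces so far fit in a few lines of standalone code. The sketch below assumes that sourceList is nothing more than foldM with its arguments reordered, which matches the signature given above but is not necessarily how simple-conduit defines it; streamed and direct are names introduced here for the comparison.

```haskell
import Control.Monad (foldM)
import Data.Functor.Identity (runIdentity)

-- foldM with its arguments reordered (an assumed definition for this sketch).
sourceList :: Monad m => [b] -> a -> (a -> b -> m a) -> m a
sourceList xs z f = foldM f z xs

-- A sink: start from 0 and add each incoming element to the accumulator.
sumC :: (Num a, Monad m)
     => (a -> (a -> a -> m a) -> m a) -> m a
sumC await = await 0 $ \acc x -> return $ acc + x

-- The streaming pipeline, executed in the Identity monad.
streamed :: Int
streamed = runIdentity (sumC (sourceList [1..10]))

-- The non-streaming version it is being compared with.
direct :: Int
direct = sum (id [1..10])
```

Both streamed and direct evaluate to 55.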

Adding type synonyms

Since the “fold closure” argument is cumbersome to restate, let’s express it as a type synonym:

type Source m a r = r -> (r -> a -> m r) -> m r

With this synonym, the example source and sink become:

sourceList :: Monad m => [a] -> Source m a r
sumC :: (Num a, Monad m) => Source m a a -> m a

Another pattern we’ll start noticing pretty shortly is that every “sink” is a fold from a Source down to its result type. We can capture this using another type synonym:

type Sink a m r = Source m a r -> m r

It’s not really necessary, but it advertises to the reader that we’re defining a sink. Likewise, a “conduit” is always a mapping from one source to another where the result type is common:

type Conduit a m b r = Source m a r -> Source m b r

In cases where the result types must differ (for example, the dropC function in simple-conduit), we cannot use these type synonyms, but they are handy in the majority of cases.

With these synonyms, the types of our sources and sinks should start looking familiar to users of the regular conduit library (mapC here is based on conduit-combinators):

sourceList :: Monad m => [a] -> Source m a r
mapC :: Monad m => (a -> b) -> Conduit a m b r
sumC :: (Num a, Monad m) => Sink a m a

Conduit has special operators for connecting sources with sinks, and for mapping sources to sources. We don’t need them, since we’re just applying functions to functions, but we can define them as synonyms easily enough:

infixl 1 $=
($=) :: a -> (a -> b) -> b
($=) = flip ($)

infixr 2 =$
(=$) :: (a -> b) -> (b -> c) -> a -> c
(=$) = flip (.)

infixr 0 $$
($$) :: a -> (a -> b) -> b
($$) = flip ($)

We can now express the pipeline in three different ways:

sumC (mapC (+1) (sourceList [1..10]))

sumC $ mapC (+1) $ sourceList [1..10]

sourceList [1..10] $= mapC (+1) $$ sumC
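For readers who want to try this, here is a self-contained sketch of the pipeline using the synonyms and operators above. The bodies of sourceList and mapC are assumptions written to fit the stated types, not the definitions from simple-conduit, and the operator form is written with the source on the left, which is the order in which the types of $= and $$ line up.

```haskell
import Control.Monad (foldM)
import Data.Functor.Identity (runIdentity)

type Source m a r    = r -> (r -> a -> m r) -> m r
type Sink a m r      = Source m a r -> m r
type Conduit a m b r = Source m a r -> Source m b r

-- Assumed definition: foldM with its arguments reordered.
sourceList :: Monad m => [a] -> Source m a r
sourceList xs z f = foldM f z xs

-- Assumed definition: apply g to each element before the downstream fold sees it.
mapC :: Monad m => (a -> b) -> Conduit a m b r
mapC g await z f = await z $ \acc x -> f acc (g x)

sumC :: (Num a, Monad m) => Sink a m a
sumC await = await 0 $ \acc x -> return $ acc + x

infixl 1 $=
($=) :: a -> (a -> b) -> b
($=) = flip ($)

infixr 0 $$
($$) :: a -> (a -> b) -> b
($$) = flip ($)

-- Plain function application.
viaApplication :: Int
viaApplication = runIdentity $ sumC (mapC (+1) (sourceList [1..10]))

-- The same pipeline written with the conduit-style operators.
viaOperators :: Int
viaOperators = runIdentity (sourceList [1..10] $= mapC (+1) $$ sumC)
```

Both forms sum the numbers 2 through 11, giving 65.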

This will perhaps seem more compelling if we use a file:

mapM_C putStrLn (sourceFile "hello.hs")

This action prints the contents of the given file, doing so in constant space and without employing lazy I/O. It handles opening and closing of the file for us, and deals properly with cleanup in the case of exceptions.

Early termination

There is just one detail we haven’t implemented yet, and that is the ability for segments in the pipeline to abort processing early. To encode this, we need some short-circuiting behavior, which sounds like a job for Either:

type Source m a r =
    r -> (r -> a -> m (Either r r)) -> m (Either r r)

Once we start implementing sources and sinks, it will be much more convenient to use EitherT instead of returning an Either value:

type Source m a r =
    r -> (r -> a -> EitherT r m r) -> EitherT r m r

This way the monadic action of EitherT provides the short-circuiting behavior, rather than having to encode that explicitly in various places.
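To see the short-circuiting at work without extra dependencies, the sketch below uses the plain Either form of Source rather than EitherT. The definitions of sourceList and sumUntil are hypothetical, written for this illustration only: Left carries a final result and stops the fold, Right carries an accumulator and continues.

```haskell
import Data.Functor.Identity (runIdentity)

type Source m a r =
    r -> (r -> a -> m (Either r r)) -> m (Either r r)

-- Feed elements to the fold until it answers Left or the list runs out.
sourceList :: Monad m => [a] -> Source m a r
sourceList []     z _ = return (Right z)
sourceList (x:xs) z f = do
    res <- f z x
    case res of
        Left done -> return (Left done)      -- downstream asked to stop
        Right acc -> sourceList xs acc f     -- keep folding

-- A sink that stops as soon as the running total exceeds the limit.
sumUntil :: (Num a, Ord a, Monad m) => a -> Source m a a -> m a
sumUntil limit await = do
    res <- await 0 $ \acc x ->
        let acc' = acc + x
        in  return (if acc' > limit then Left acc' else Right acc')
    return (either id id res)

-- The source is infinite, so this only finishes because the sink aborts early.
firstOverTen :: Int
firstOverTen = runIdentity (sumUntil 10 (sourceList [1..]))
```

The running totals are 1, 3, 6, 10 and 15, so the sink stops at 15 and no further elements are consumed.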

And that’s it! As simple as it is, this set of types is expressive enough to implement many of the combinators from the original conduit library. Of course, it’s not nearly as capable, but it’s leaner, its core types are easier to understand, and it’s significantly faster in some situations (simple pipelines over Identity ran about 45% faster on my machine).

Consumers and producers

One thing that conduit makes very easy is abstracting Sinks and Conduits as Consumers, and Sources and Conduits as Producers. Based on our presentation above, such an abstraction is not possible. However, we can regain some of the generality with a helper function: you can turn sinks into conduits using a new combinator, returnC:

sinkList $ returnC $ sumC $ mapC (+1) $ sourceList [1..10]

Syndicated 2014-06-06 00:00:00 from Lost in Technopolis

Notes on Free monads

The following article is just a few notes on the nature of the Free monad.

> {-# LANGUAGE DeriveFunctor #-}
> {-# LANGUAGE GeneralizedNewtypeDeriving #-}
> {-# LANGUAGE UndecidableInstances #-}
>
> module FreeMaybe where
>
> import Control.Monad (join)
> import Control.Monad.Writer.Class

There can be just two values of type Maybe a: Nothing and Just a. Now let’s look at the free monad of Maybe a, Free Maybe a:

> data Free f a = Pure a | Free (f (Free f a))
>
> instance Functor f => Functor (Free f) where
>     fmap f (Pure a)   = Pure (f a)
>     fmap f (Free ffa) = Free $ fmap (fmap f) ffa
>
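> instance Functor f => Applicative (Free f) where
>     pure = Pure
>     Pure f   <*> x = fmap f x
>     Free ffa <*> x = Free $ fmap (<*> x) ffa
>
> instance Functor f => Monad (Free f) where
>     return = pure
>     Pure a >>= f = f a
>     Free ffa >>= f = Free $ fmap (>>= f) ffa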
>
> instance (Show a, Show (f (Free f a))) => Show (Free f a) where
>     showsPrec d (Pure a) = showParen (d > 10) $
>         showString "Pure " . showsPrec 11 a
>     showsPrec d (Free m) = showParen (d > 10) $
>         showString "Free " . showsPrec 11 m

There are four “shapes” that values of Free Maybe a can take:

Pure a
Free Nothing
Free (Just (Free (Just (... (Free Nothing)))))
Free (Just (Free (Just (... (Free (Pure a))))))

In terms of whether a Free Maybe a represents an a or not, Free Maybe a is equivalent to Maybe a. However, Maybe a is right adjoint to Free Maybe a, meaning that it forgets the structure of Free Maybe a – namely, which of the four shapes above the value was, and how many occurrences of Free (Just there were.

Why would you ever use Free Maybe a? Precisely if you cared about the number of Justs. Now, say we had a functor that carried other information:

> data Info a = Info { infoExtra :: String, infoData :: a }
>     deriving (Show, Functor)

Then Free Info a is isomorphic to what Info a would be if infoExtra had type [String]:

> main :: IO ()
> main = do
>     print $ Free (Info "Hello" (Free (Info "World" (Pure "!"))))

Which results in:

>>> main
Free (Info {infoExtra = "Hello",
            infoData = Free (Info {infoExtra = "World", infoData = Pure "!"})})
it :: ()
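The isomorphism can be spelled out with a pair of conversion functions. This is only a sketch: collectExtras and fromExtras are hypothetical names, and the pair ([String], a) stands in for an Info whose infoExtra field is a list.

```haskell
data Free f a = Pure a | Free (f (Free f a))

data Info a = Info { infoExtra :: String, infoData :: a }
    deriving Show

-- Peel off each layer, collecting its infoExtra, until we reach the Pure.
collectExtras :: Free Info a -> ([String], a)
collectExtras (Pure a)             = ([], a)
collectExtras (Free (Info s rest)) =
    let (ss, a) = collectExtras rest
    in (s : ss, a)

-- Rebuild the nested structure: one Free (Info ...) layer per string.
fromExtras :: [String] -> a -> Free Info a
fromExtras ss a = foldr (\s rest -> Free (Info s rest)) (Pure a) ss

-- The value printed by main above.
sample :: Free Info String
sample = Free (Info "Hello" (Free (Info "World" (Pure "!"))))
```

Applying collectExtras to sample gives (["Hello","World"],"!"), and uncurry fromExtras takes that pair back to the original nesting.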

But now it’s also a Monad, even though we never defined a Monad instance for Info:

> foo :: IO ()
> foo = do
>     print $ do
>         x <- Free (Info "Hello" (Pure "!"))
>         y <- Free (Info "World" (Pure "!"))
>         return $ x ++ y

This outputs:

>>> foo
Free (Info {infoExtra = "Hello",
            infoData = Free (Info {infoExtra = "World", infoData = Pure "!!"})})
it :: ()

This works because the Free monad simply accumulates the states of the various functor values, without “combining” them as a real monadic join would have done. Free Info a has left it up to us to do that joining later.
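As a sketch of what doing that joining later might look like, the function below collapses a Free Info a back into a single Info a. The name collapse and the choice to join the strings with a space are my own assumptions for this illustration; any other way of combining the accumulated strings would work just as well.

```haskell
data Free f a = Pure a | Free (f (Free f a))

instance Functor f => Functor (Free f) where
    fmap f (Pure a)   = Pure (f a)
    fmap f (Free ffa) = Free (fmap (fmap f) ffa)

instance Functor f => Applicative (Free f) where
    pure = Pure
    Pure f   <*> x = fmap f x
    Free ffa <*> x = Free (fmap (<*> x) ffa)

instance Functor f => Monad (Free f) where
    Pure a   >>= f = f a
    Free ffa >>= f = Free (fmap (>>= f) ffa)

data Info a = Info { infoExtra :: String, infoData :: a }
    deriving (Show, Eq)

instance Functor Info where
    fmap f (Info s a) = Info s (f a)

-- The join that Free postponed: merge every layer's string into one.
collapse :: Free Info a -> Info a
collapse (Pure a)             = Info "" a
collapse (Free (Info s rest)) =
    let Info s' a = collapse rest
    in Info (if null s' then s else s ++ " " ++ s') a

-- The do block from the example above.
greeting :: Free Info String
greeting = do
    x <- Free (Info "Hello" (Pure "!"))
    y <- Free (Info "World" (Pure "!"))
    return (x ++ y)
```

With these definitions, collapse greeting is Info "Hello World" "!!": the two layers that bind merely stacked up are only now combined into one.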

Syndicated 2013-09-23 00:00:00 from Lost in Technopolis

Using monad-control with monad transformers

This article assumes familiarity with monads and monad transformers. If you’ve never had an occasion to use lift yet, you may want to come back to it later.

The Problem

What is the problem that monad-control aims to solve? To answer that, let’s back up a bit. We know that a monad represents some kind of “computational context”. The question is, can we separate this context from the monad, and reconstitute it later? If we know the monadic types involved, then for some monads we can. Consider the State monad: it’s essentially a function from an existing state, to a pair of some new state and a value. It’s fairly easy then to extract its state and later use it to “resume” that monad:

import Control.Applicative
import Control.Monad.Trans.State

main = do
    let f = do { modify (+1); show <$> get } :: StateT Int IO String
    
    (x,y) <- runStateT f 0
    print $ "x = " ++ show x   -- x = "1"
    
    (x',y') <- runStateT f y
    print $ "x = " ++ show x'  -- x = "2"

In this way, we interleave between StateT Int IO and IO, by completing the StateT invocation, obtaining its state as a value, and starting a new StateT block from the prior state. We’ve effectively resumed the earlier StateT block.

Nesting calls to the base monad

But what if we didn’t, or couldn’t, exit the StateT block to run our IO computation? In that case we’d need to use liftIO to enter IO and make a nested call to runStateT inside that IO block. Further, we’d want to restore any changes made to the inner StateT within the outer StateT, after returning from the IO action:

import Control.Applicative
import Control.Monad.Trans.State
import Control.Monad.IO.Class

main = do
    let f = do { modify (+1); show <$> get } :: StateT Int IO String

    flip runStateT 0 $ do
        x <- f
        y <- get
        y' <- liftIO $ do
            print $ "x = " ++ show x   -- x = "1"

            (x',y') <- runStateT f y
            print $ "x = " ++ show x'  -- x = "2"
            return y'
        put y'

A generic solution

This works fine for StateT, but how can we write it so that it works for any monad transformer over IO? We’d need a function that might look like this:

foo :: MonadIO m => m String -> m String
foo f = do
    x <- f
    y <- getTheState
    y' <- liftIO $ do
        print $ "x = " ++ show x

        (x',y') <- runTheMonad f y
        print $ "x = " ++ show x'
        return y'
    putTheState y'

But this is impossible, since we only know that m is a MonadIO. Even with a MonadState constraint, we would not know about a function like runTheMonad. This indicates we need a type class with at least three capabilities: getting the current monad transformer’s state, executing a new transformer within the base monad, and restoring the enclosing transformer’s state upon returning from the base monad. This is exactly what MonadBaseControl provides, from monad-control:

class MonadBase b m => MonadBaseControl b m | m -> b where
    data StM m :: * -> *
    liftBaseWith :: (RunInBase m b -> b a) -> m a
    restoreM :: StM m a -> m a

Taking this definition apart piece by piece:

  1. The MonadBase constraint exists so that MonadBaseControl can be used over multiple base monads: IO, ST, STM, etc.

  2. liftBaseWith combines three things from our last example into one: it gets the current state from the monad transformer, wraps it in an StM type, lifts the given action into the base monad, and provides that action with a function which can be used to resume the enclosing monad within the base monad. When such a function exits, it returns a new StM value.

  3. restoreM takes the encapsulated transformer state as an StM value, and applies it to the parent monad transformer so that any changes which may have occurred within the “inner” transformer are propagated out. (This also has the effect that later, repeated calls to restoreM can “reset” the transformer state back to what it was previously.)
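To make those pieces concrete, here is a sketch of what they amount to for one specific transformer. Everything in it is a hand-rolled stand-in that depends only on base: this StateT is a minimal local copy, Saved plays the role of StM, and liftBaseWithS and restoreS are hypothetical names that mimic liftBaseWith and restoreM for StateT over IO. It is not how monad-control itself is implemented.

```haskell
{-# LANGUAGE RankNTypes #-}

-- A minimal local StateT, so the sketch needs nothing beyond base.
newtype StateT s m a = StateT { runStateT :: s -> m (a, s) }

instance Functor m => Functor (StateT s m) where
    fmap f (StateT g) = StateT $ \s -> fmap (\(a, s') -> (f a, s')) (g s)

instance Monad m => Applicative (StateT s m) where
    pure a = StateT $ \s -> return (a, s)
    StateT mf <*> StateT mx = StateT $ \s -> do
        (f, s')  <- mf s
        (x, s'') <- mx s'
        return (f x, s'')

instance Monad m => Monad (StateT s m) where
    StateT g >>= f = StateT $ \s -> do
        (a, s') <- g s
        runStateT (f a) s'

get :: Monad m => StateT s m s
get = StateT $ \s -> return (s, s)

modify :: Monad m => (s -> s) -> StateT s m ()
modify f = StateT $ \s -> return ((), f s)

-- Stand-in for StM: a result bundled with the state it finished in.
type Saved s a = (a, s)

-- Stand-in for liftBaseWith: capture the current state and hand the IO
-- action a function that resumes a StateT block from that state.
liftBaseWithS :: ((forall a. StateT s IO a -> IO (Saved s a)) -> IO b)
              -> StateT s IO b
liftBaseWithS inner = StateT $ \s -> do
    b <- inner (\action -> runStateT action s)
    return (b, s)

-- Stand-in for restoreM: adopt the saved state in the enclosing block.
restoreS :: Saved s a -> StateT s IO a
restoreS (a, s) = StateT $ \_ -> return (a, s)

-- Run f, then run it again from inside IO, and keep the inner result.
foo :: StateT Int IO String -> StateT Int IO String
foo f = do
    _ <- f
    saved <- liftBaseWithS (\runInIO -> runInIO f)
    restoreS saved

counter :: StateT Int IO String
counter = do { modify (+1); show <$> get }
```

Running runStateT (foo counter) 0 yields ("2", 2): the inner call started from the state left by the first call, and restoreS carried its final state back out.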

Using monad-control and liftBaseWith

With that said, here’s the same example from above, but now generic for any transformer supporting MonadBaseControl IO:

{-# LANGUAGE FlexibleContexts #-}

import Control.Applicative
import Control.Monad.Trans.State
import Control.Monad.Trans.Control

foo :: MonadBaseControl IO m => m String -> m String
foo f = do
    x <- f
    y' <- liftBaseWith $ \runInIO -> do
        print $ "x = " ++ show x   -- x = "1"

        x' <- runInIO f
        -- print $ "x = " ++ show x'

        return x'
    restoreM y'

main = do
    let f = do { modify (+1); show <$> get } :: StateT Int IO String

    (x',y') <- flip runStateT 0 $ foo f
    print $ "x = " ++ show x'   -- x = "2"

One notable difference in this example is that the second print statement in foo becomes impossible, since the “monadic value” returned from the inner call to f must be restored and executed within the outer monad. That is, runInIO f is executed in IO, but its type is IO (StM m String) rather than IO String, since the computation carries monadic context from the inner transformer. Converting this to a plain IO computation would require calling a function like runStateT, which we cannot do without knowing which transformer is being used.

As a convenience, since calling restoreM after exiting liftBaseWith is so common, you can use control instead of restoreM =<< liftBaseWith:

y' <- restoreM =<< liftBaseWith (\runInIO -> runInIO f)

-- becomes...
y' <- control $ \runInIO -> runInIO f

Another common pattern is when you don’t need to restore the inner transformer’s state to the outer transformer, you just want to pass it down as an argument to some function in the base monad:

foo :: MonadBaseControl IO m => m String -> m String
foo f = do
    x <- f
    -- forkIO expects an IO (), so discard the String produced by f
    _ <- liftBaseDiscard forkIO $ f >> return ()
    return x

In this example, the first call to f affects the state of m, while the inner call to f, though inheriting the state of m in the new thread, does not restore its effects to the parent monad transformer when it returns.

Now that we have this machinery, we can use it to make any function in IO directly usable from any supporting transformer. Take catch for example:

catch :: Exception e => IO a -> (e -> IO a) -> IO a

What we’d like is a function that works for any MonadBaseControl IO m, rather than just IO. With the control function this is easy:

-- E refers to Control.Exception, imported qualified to avoid the name clash
catch :: (MonadBaseControl IO m, Exception e) => m a -> (e -> m a) -> m a
catch f h = control $ \runInIO -> E.catch (runInIO f) (runInIO . h)

You can find many functions that are generalized like this in the packages lifted-base and lifted-async.

Syndicated 2013-09-21 00:00:00 from Lost in Technopolis

A whirlwind tour of conduits

While talking with people on IRC, I’ve encountered enough confusion around conduits to realize that people may not know just how simple they are. For example, if you know how to use generators in a language like Python, then you know pretty much everything you need to know about conduits.

The basics

Let’s take a look at them step-by-step, and I hope you’ll see just how easy they are to use. We’re also going to look at them without type signatures first, so that you get an idea of the usage patterns, and then we’ll investigate the types and see what they mean.

Everything in conduit begins with the Source, which yields data as it is demanded. The dumbest possible form of source is an empty source:

empty = return ()

The next dumbest is a source that yields only a single value:

single = yield 1

In order to use any Source, I must ultimately connect it to a Sink. Sinks are nothing more than code which awaits values from a Source. Let’s look at an example in Python, where these concepts are features of the language itself:

def my_generator():
    for i in range(1, 10):
        yield i

for j in my_generator():
    print(j)

Here we have a generator (aka Source): a function which simply yields values. This generator is being passed to a for statement that consumes the values from it and binds them one by one to a variable j. It then prints each value after it is consumed.

The equivalent code using conduit employs a different syntax, but the general “shape” of the code is the same:

import Control.Monad
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Loops (whileJust_)
import Data.Conduit

myGenerator = forM_ [1..9] yield

main = myGenerator $$
           whileJust_ await $ \j -> 
               liftIO $ print j

I can make the code a little bit closer to Python’s example (making the call to await implicit) if I use Data.Conduit.List:

import Control.Monad
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Loops (whileJust_)
import Data.Conduit
import qualified Data.Conduit.List as CL

myGenerator = forM_ [1..9] yield

main = myGenerator $$ 
           CL.mapM_ $ \j -> 
               liftIO $ print j

Just regular code

Neither Sources nor Sinks have to be special functions, however. They are just regular code written in the ConduitM monad transformer:

import Data.Conduit
import Control.Monad.IO.Class (liftIO)

main = do
    (do yield 10
        yield 20
        yield 30)
        $$
        (do liftIO . print =<< await
            liftIO . print =<< await
            liftIO . print =<< await
            liftIO . print =<< await)

Each time await is called, it returns a value that was yielded by the source wrapped in Just, or it returns Nothing to indicate the source has no more values to offer.

There, now you know the basics of the conduit library.

Conduits

Between sources and sinks, there is a third kind of conduit, which is actually called just Conduit. A Conduit sits between sources and sinks, and is able to call both yield and await, applying some kind of transformation or filter to the data coming from the source, before it reaches the sink. In order to use a Conduit, you must fuse it to either a source or a sink, creating a new source/sink which has the action of the Conduit bound to it. For example:

import Data.Conduit
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Loops (whileJust_)

main = do
    (do yield 10
        yield 20
        yield 30)
        $=
        (do whileJust_ await $ \x ->
                yield (x * 2))
        $$
        (do liftIO . print =<< await
            liftIO . print =<< await
            liftIO . print =<< await
            liftIO . print =<< await)

This example fuses a conduit, which doubles the incoming values, to the source on its left. We could equivalently have fused it to the sink on its right. In most cases it doesn’t matter whether you fuse to sources or to sinks; it mainly comes into play when you are using such fusion to create building blocks that will be used later.

Use the types, Luke

Now that we have the functionality of conduits down, let’s take a look at their types so that any errors you may encounter are less confusing.

A source has the type Source m Foo, where m is the base monad and Foo is the type of what you want to pass to yield.

A sink has the corresponding type Sink Foo m a, to indicate that await returns values of type Maybe Foo, while the monadic operation of the sink returns a value of type a.

A conduit between these two would have type Conduit Foo m Foo.

You’re probably going to see the type ConduitM in your type errors too, since the above three are all synonyms for it. It’s a more general type than these three specialized types. The correspondences are:

type Source m o    = ConduitM () o m ()
type Sink i m r    = ConduitM i Void m r
type Conduit i m o = ConduitM i o m ()

The Void you see in there is just enforcing the fact that sinks cannot call yield.
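To see why one general type can cover all three roles, here is a toy model written against base only. It is a sketch, not conduit’s actual implementation: Pipe is a made-up pure stand-in for ConduitM with the base monad left out, and fuse and run are hypothetical helpers playing the parts of fusion and of running a connected pipeline.

```haskell
import Data.Void (Void, absurd)

-- Toy stand-in for ConduitM i o m r, without the base monad m.
data Pipe i o r
    = Done r                          -- finished with a result
    | Yield o (Pipe i o r)            -- emit a value, then continue
    | Await (Maybe i -> Pipe i o r)   -- ask upstream for a value

instance Functor (Pipe i o) where
    fmap f (Done r)    = Done (f r)
    fmap f (Yield o k) = Yield o (fmap f k)
    fmap f (Await k)   = Await (fmap f . k)

instance Applicative (Pipe i o) where
    pure = Done
    pf <*> px = pf >>= \f -> fmap f px

instance Monad (Pipe i o) where
    Done r    >>= f = f r
    Yield o k >>= f = Yield o (k >>= f)
    Await k   >>= f = Await (\mi -> k mi >>= f)

yield :: o -> Pipe i o ()
yield o = Yield o (Done ())

await :: Pipe i o (Maybe i)
await = Await Done

-- The three synonyms, mirroring the correspondences above.
type Source o    = Pipe () o ()
type Sink i r    = Pipe i Void r
type Conduit i o = Pipe i o ()

-- Connect upstream to downstream; downstream drives the computation.
fuse :: Pipe a b () -> Pipe b c r -> Pipe a c r
fuse _  (Done r)    = Done r
fuse up (Yield c k) = Yield c (fuse up k)
fuse up (Await k)   = case up of
    Done ()     -> fuse (Done ()) (k Nothing)   -- upstream is exhausted
    Yield b up' -> fuse up' (k (Just b))
    Await g     -> Await (\ma -> fuse (g ma) (Await k))

-- A closed pipeline cannot yield (its output is Void) and has no upstream.
run :: Pipe () Void r -> r
run (Done r)    = r
run (Yield v _) = absurd v
run (Await k)   = run (k Nothing)

source :: Source Int
source = mapM_ yield [10, 20, 30]

doubler :: Conduit Int Int
doubler = await >>= maybe (return ()) (\x -> yield (x * 2) >> doubler)

sink :: Sink Int [Maybe Int]
sink = sequence (replicate 4 await)

result :: [Maybe Int]
result = run (fuse (fuse source doubler) sink)
```

Here result is [Just 20, Just 40, Just 60, Nothing], the pure counterpart of the doubling example printed earlier. The Yield case of run is where Void earns its keep: a sink has no value of type Void to yield, so that branch can never actually be taken.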

What’s next?

Beyond this, most of the conduit library is a bunch of combinators to make them more convenient to use. In a lot of cases, you can reduce conduit code down to something which is just as brief and succinct as what you might write in languages with native support for such operations. It’s a testament to Haskell, rather, that it doesn’t need to be a syntactic feature to be both useful and concise.

And what about pipes, and the other competing libraries in this space? In many ways they are each equivalent to what I’ve described above. If you want to use pipes, just write respond and request instead of yield and await, and you’re pretty much good to go! The operators for binding and fusing are different too, but what they accomplish is likewise the same.

If you’re interested in learning more about conduit and how to use it, check out the author’s own tutorial.

Syndicated 2013-07-16 00:00:00 from Lost in Technopolis

Update of gitlib libraries on Hackage, plus git-monitor

I’ve decided after many months of active development to release version 1.0.1 of gitlib and its related libraries to Hackage. There is still more code review to be done, and much documentation to be written, but this gets the code out there, which has been working very nicely at FP Complete for about six months now.

The more exciting tool for users may be the git-monitor utility, which passively and efficiently makes one-minute snapshots of a single Git working tree while you work. I use it continually for the repositories I work on during the day. Just run git-monitor -v in a terminal window, and start making changes. After about a minute you should see commit notifications appearing in the terminal window.

Syndicated 2013-06-30 00:00:00 from Lost in Technopolis
