Older blog entries for johnw (starting at number 68)

15 Jul 2010 (updated 15 Jul 2010 at 17:07 UTC) »

A word on Haskell Monads and C++

After spending a good while trying to understand monads in Haskell, and why the Haskell world is so fascinated by them, I finally understand why they aren’t as exciting to other languages, or why they are completely missing from languages like C++: because they’re mostly already there.

At its simplest, a monad is an abstraction of a value which knows how to apply functions to that value, returning a new monad. In other words, it’s a way to turn values into little packages that wrap additional functionality around that value. Sounds a lot like what an object does…

But this doesn’t tell you what’s exciting about them, from Haskell’s point of view. Another way of looking at them, without going into the wheres and whys, is this: In a lazily-evaluated, expression-based language, monads let you express sequenced, interdependent computation.

Consider the following two code examples. First, in C++:

  #include <iostream>

  int main() {
    std::cout << "Hello, world!"
              << "  This is a sample"
              << " of using a monad in C++!"
              << std::endl;
    return 0;
  }

And the same code in Haskell:

  module Main where

  main :: IO ()
  main = do putStr "Hello, world!"
            putStr "  This is a sample"
            putStr " of using a monad in C++!"
            putStr "\n"

What the IO monad in the second example is doing is making the sequenced evaluation of the print statements possible using a nice, normal looking syntax. The C++ code doesn’t need monads to do this, because it already embodies the concept of abstracted values (here, the iostream passed between insertion operators) and sequenced computation (because it’s not lazy).

To compare Monads with C++:

  1. Monads are abstractions of values. So are most C++ objects.

  2. Monads permit functions to be applied to the “contained” value, returning a new version of the monad. C++ objects provide methods, where the mutated object is the new version.

  3. Monads provide a way to encapsulate values in new monads. C++ objects have constructors.
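
These three points line up with Haskell’s Monad operations. As a minimal sketch using Maybe (Haskell’s optional-value type), with example values of my own choosing:

```haskell
-- 1. A monad abstracts a value: Just packages the number 5.
wrapped :: Maybe Int
wrapped = Just 5

-- 2. (>>=) applies a function to the contained value, returning a new
--    monad -- the analogue of a method returning the mutated object.
incremented :: Maybe Int
incremented = wrapped >>= \x -> Just (x + 1)

-- 3. return encapsulates a plain value in a new monad -- the analogue
--    of a constructor.
constructed :: Maybe Int
constructed = return 5

main :: IO ()
main = print (wrapped, incremented, constructed)
```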

As another example, consider the case where you have to call five functions on an integer, each using the return value of the last:

  j(i(h(g(f(10)))))

This is an identical operation in both Haskell and C++. But what if the return value of each function wasn’t an integer, but an “object” that could either be an integer, or an uninitialized value? In most languages, there’s either a type, or syntax, for this concept:

  C++      boost::optional<int>
  C#       int?
  Java     Integer
  Haskell  Maybe Int

If each function returns one of these, but takes a real integer, it means we have to check the “null” status of each return value before calling the next function. In C++ this leads to a fairly common idiom:

  if (boost::optional<int> x1 = f(10))
    if (boost::optional<int> x2 = g(*x1))
      if (boost::optional<int> x3 = h(*x2))
        if (boost::optional<int> x4 = i(*x3))
          j(*x4);

Note that not only are these calls sequential, but due to the meaning of optionality, they are also inherently short-circuiting. If f returns none, none of the other functions get called.

Haskell can do this type of thing natively as well, and it looks similar:

  case f 10 of
  Nothing -> Nothing
  Just x1 -> 
    case g x1 of
      Nothing -> Nothing
      Just x2 -> 
        case h x2 of
          Nothing -> Nothing
          Just x3 -> 
            case i x3 of
              Nothing -> Nothing
              Just x4 -> j x4

But it’s ugly as sin. In C++, we can be evil and flatten things out using basic features of the language, assuming we pre-declare the variables:

  (   (x1 = f(10))
   && (x2 = g(*x1))
   && (x3 = h(*x2))
   && (x4 = i(*x3))
   && (x5 = j(*x4)), x5)

Or you can eliminate the use of temporaries altogether by creating a wrapper class:

  template <typename T> struct Maybe {
    boost::optional<T> value;

    Maybe() {}
    Maybe(const T& t) : value(t) {}
    Maybe(const Maybe& m) : value(m.value) {}

    Maybe operator>>(boost::function<Maybe<T>(const T&)> f) const {
      return value ? f(*value) : *this;
    }
  };

If we change our functions to return Maybe<int> instead of just boost::optional<int>, it allows us to write this:

  f(10) >> g >> h >> i >> j

Which in Haskell is written almost the same way:

  f 10 >>= g >>= h >>= i >>= j
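
To make the comparison concrete, here is a runnable version of the Haskell chain. The functions f through j are hypothetical stand-ins (the article never defines them): each simply transforms its argument, and g fails on odd input so the short-circuiting is visible:

```haskell
f, g, h, i, j :: Int -> Maybe Int
f x = Just (x + 1)
g x = if odd x then Nothing else Just (x * 2)  -- fails on odd input
h x = Just (x - 3)
i x = Just (x * x)
j x = Just (x + 100)

main :: IO ()
main = do print (f 9  >>= g >>= h >>= i >>= j)  -- Just 389
          print (f 10 >>= g >>= h >>= i >>= j)  -- Nothing: g failed, the rest are skipped
```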

But where Haskell needs Monads to make this type of thing reasonable and concise, C++ doesn’t. We get passing around of object state between function calls as part of the core language, and there are many different ways to express it. However, if you confined C++ to function definitions and return statements only – where all function arguments were pass-by-value – then things like Monads would become an essential technique for passing knowledge between calls.

So it’s not that you can’t use Monads in C++, it’s just that they require enough extra machinery, and aren’t unique enough compared to core features of the language, that there isn’t the same level of motivation for them as there is in Haskell, where they can really add to the expressiveness of code.

Syndicated 2010-07-15 12:20:06 (Updated 2010-07-15 16:50:38) from Lost in Technopolis

30 Oct 2009 (updated 10 Nov 2009 at 09:12 UTC) »

A C++ gotcha on Snow Leopard

I’ve seen this issue mentioned in some random and hard to reach places on the Net, so I thought I’d re-express it here for those who find Google sending them this way.

On Snow Leopard, Apple decided to build g++ and the standard C++ library with “fully dynamic strings” enabled. What this means for you relates to the empty string.

When fully dynamic strings are off (as was true in Leopard), there exists a single global variable representing the empty string. This variable lives in the data segment of libstdc++, and so it does not exist on the heap. Whenever a string is deconstructed, the standard library checks whether that string’s address matches the empty string’s: if so, it does nothing; if not, it calls free.

With fully dynamic strings on, there is no global empty string. All strings are on the heap, and once their reference count goes to zero, they get deallocated. Where this creates a problem is if you mix and match code. If a library that does have fully dynamic strings enabled (aka the standard library) receives an empty string from code which does not have it enabled (aka, the app you just built), it will try to free it and your application will crash.

Here’s a reproducible case for this issue using Boost:

  #include <string>
  #include <sstream>
  #include <boost/variant.hpp>

  int main() {
    std::ostringstream buf;
    boost::variant<bool, std::string> data;
    data = buf.str();
    data = false;
    return 0;
  }

In this case – which really happened to me – I created an empty string by calling ostringstream::str(). Since I don’t have fully dynamic strings on, its address is in data space, not on the heap. I pass this string to boost::variant, which makes a copy of that address. Later, when the variant is reassigned false, it calls ~basic_string to deconstruct the string. Since my standard library is compiled with fully dynamic strings, the destructor for basic_string doesn’t recognize that it’s the “special” empty string, so it tries to free it.

The solution to this problem is three-fold:

  1. You must be using the g++ that comes with Xcode, or if you build your own (say, via MacPorts), you must configure it using --enable-fully-dynamic-string. I’ve already submitted a patch to this effect to the MacPorts crew.

  2. All libraries must be compiled with -D_GLIBCXX_FULLY_DYNAMIC_STRING.

  3. Your own code must be compiled with -D_GLIBCXX_FULLY_DYNAMIC_STRING.

You’ll know if this issue is biting you by looking at a stack trace in gdb. You’ll see a crash somewhere inside basic_string’s _M_destroy (which calls free). Move up the trace a bit and check whether the string it’s trying to free is 0 bytes long.

To recap: what’s happened is that an empty string constructed by code without fully dynamic strings got deallocated by code that was. That is, most likely you, or a library you built, handed an empty std::string to the system library.

Syndicated 2009-10-30 09:35:32 (Updated 2009-11-10 08:12:39) from Lost in Technopolis

Branch policies with Git

I’ve been managing my Ledger project with Git for some time now, and I’ve finally settled into a comfortable groove concerning branches and where to commit stuff.

Essentially I use four branches, in increasing order of commit frequency. Each branch has its own policy and purpose, which are described below.

maint

Every release of Ledger is made from the maint branch, and every commit on that branch is potentially a release. This means that no commit is made until some serious vetting takes place. When the master branch is at a state where I want to finally release it, I merge with =--no-ff=, so the merge gets represented as a single commit on the maint branch. Then I tag the release and make a distribution tarball.

It’s possible after a release that patches need to get applied to maint, and a point release made. Once this is done, the applicable patches are either merged into master, or if the two diverge too greatly I will begin cherry-picking instead. Once cherry-picking starts, no more merges into master will occur until after the next release merge happens in maint.

The purpose of maint is to provide the most stable release possible to the public.

master

Master is where most people get the latest source code from, so it is kept reasonably stable. There is a commit hook which guarantees that all commits to this branch build and pass the test suite. Since most development work happens on “next”, each time next is stable I merge into master, using =--no-ff= to keep the merge commits together. I also use =--no-commit=, so the merge must pass the commit hook in order to go in.

Note that no commits are ever made directly to master, unless I’ve seriously broken something that needs to be addressed sooner than the next merge from “next”. In that case, I’ll cherry pick this commit into master afterward. Merges only happen into master from next, and only from master into maint.

The purpose of master is to provide reasonably stable development snapshots to the public.

next

The next branch is where I commit most often, and while I try to keep it functional, this is not always the case. I don’t run unit tests here for every commit, just before every push (mostly). Most of my friends follow this branch, because it updates very often.

The purpose of next is to provide potentially unstable, frequent development snapshots to the public.

test

The test branch comes in and out of existence, and should only ever be pulled using =pull --rebase=. It contains trial commits that I want someone to test out. It’s a delivery branch, and after it’s been used I either delete it or ignore it until the next time it’s necessary.

The purpose of test is to communicate patch candidates to a particular person at a particular time.


Then there are the various local-only topic branches that live on my machine, in which I develop highly unstable code relating to one feature or another, awaiting the day when it becomes stable enough to be merged into “next”.

Syndicated 2009-10-29 03:05:57 (Updated 2009-10-29 06:23:55) from Lost in Technopolis

Response to PG's "How to Do Philosophy"

Back in late 2007, Paul Graham put up an essay titled “How to Do Philosophy”, in which Mr. Graham hoped to elucidate where Philosophy went wrong and why the field, as now practiced, must be renovated to remain useful. In fact, he goes so far as to suggest that much of philosophy has no benefit whatsoever:

The proof of how useless some of their answers turned out to be is how little effect they have. No one after reading Aristotle’s Metaphysics does anything differently as a result.

If I may, as a student of philosophy, I would like to offer my response to this argument, whose tenets have been repeated many times throughout Philosophy’s history.

The spirit of philosophy

As far back as Plato’s Republic (and most likely long before then) there have been debates on the merit of philosophy. In Plato’s book it is between Socrates and Glaucon, who fears that men may waste their time in fruitless contemplation:

Socrates: I am amused, I said, at your [Glaucon’s] fear of the world, which makes you guard against the appearance of insisting upon useless studies; and I quite admit the difficulty of believing that in every man there is an eye of the soul which, when by other pursuits lost and dimmed, is by these purified and re-illumined; and is more precious far than ten thousand bodily eyes, for by it alone is truth seen….

Earlier Socrates had said something similar, and in briefer terms:

Socrates: Then must not a further admission be made?

Glaucon: What admission?

Socrates: That the knowledge at which geometry aims is knowledge of the eternal, and not of aught perishing and transient.

Glaucon: That, he replied, may be readily allowed, and is true.

Socrates: Then, my noble friend, geometry will draw the soul towards truth, and create the spirit of philosophy, and raise up that which is now unhappily allowed to fall down.

This “spirit of philosophy” is held by Socrates over and over again to be precious beyond compare: a light to illumine every aspect of life. A lantern is something you can design, hold, and weigh, yet this light is its intangible counterpart, granting the lamp its purpose. It is the “why” to the lantern’s “what” and “how”. It can neither be designed, nor held, nor weighed, but must be enkindled. And only then does the lamp come aglow…

The harp on practicality

I understand the need for practical results in a material world, but results are meaningless deprived of context. If we boil things down to their material essence, then what we do we do for survival: develop resources to protect and prolong life. But is surviving enough? Don’t people also seek meaning from what they do? Certainly I don’t enjoy programming merely to make a paycheck; I have to feel something more to keep me motivated year after year.

The harp on practicality levied against philosophy overstresses the “what” against the “why”. Mr. Graham debates how to make philosophy useful again, but I think he has lost the point of it: useful in terms of what? Does usefulness have a “why”? Who is to define the best “use” of anything, so that usefulness may be measured? Thus, there is a conundrum at the center of his argument: How can any man judge philosophy who has not discovered what it aims to impart?

Anyone can understand the concept of practicality. Even children connect the ideas of work and output. It’s why we hate cleaning our room, because it takes so much work yet we gain so little from it. But what is practical is not the same as what is essential. Happiness, most of us know, is not found in more money, more power, or by more efficient processes. There is only one outcome in this life which is inevitable, and curiously neither industry nor indolence has any effect on its timing or nature. But whereas the practical man fears death as the end of opportunity, perhaps the philosopher sees it differently:

Socrates: The philosopher desires death – which the wicked world will insinuate that he also deserves: and perhaps he does, but not in any sense which they are capable of understanding. Enough of them: the real question is, What is the nature of that death which he desires? Death is the separation of soul and body – and the philosopher desires such a separation. He would like to be freed from the dominion of bodily pleasures and of the senses, which are always perturbing his mental vision. He wants to get rid of eyes and ears, and with the light of the mind only to behold the light of truth….

So the question is raised: Is there more than just this world? I don’t necessarily mean physical death, either. For there is a world of purely material pursuits and achievement – a world we share in common with animals – and there is a world of inspiration, abstraction, and fantasy, which only men participate in. The “practical man” knows well the value of practical things and he is an expert at perfecting the animal life; but it takes more than a well-fed stomach to bring true content. If not so, then cows should be our kings.

If a philosopher is anything, I say he is someone who forgoes all else to discover and adventure in that world, and to learn what effect immaterial consequences should have on our material life, if all is to be as it ought.

The bane of method

Not everyone who reads Plato, of course, comes away with mystical opinions. Just as there are those who eschew philosophy entirely and ignore its delights, so there are some who accept it but half-way. They see that philosophy prescribes a method and they fall in love with that method, dedicating the whole of their pursuit to refining it. Yes, Plato did stress the necessity of dialectic, but his stress had a purpose in mind. Not a material or practical goal – hardly even a “useful” one in immediate terms – but a personal and soulful one.

Philosophy is ever so much more than method. In fact, the love of method has resulted in a few branches of philosophy which are hardly philosophy at all, but the art of analysis. What Plato used his method for was to approach noesis: to know the “real real”, to have a direct apprehension of reality freed from mortal conceptions; to “remember” the soul’s birth and origin; to return our perception of the world to an original, direct perception of Truth itself. Through this experience of true perception our breasts and minds would dilate, and every pursuit would become infused with the vibrating principle of Life.

Missing the point

This is why, when I read essays like Mr. Graham’s, I find myself thinking that his own success and momentum have caused him to miss the point. Philosophy is not meant to be practical. It is not meant to have a use. It does not exist to make us more productive girls and boys. It is a diet of words to feed our soul by way of stimulating our mind. It is not a roast-beef sandwich, but more the substance of an ethereal longing.

Some will ask, what is this thing that is words and nothing more? To them I reply: Then what is poetry? There are human endeavors which are little more than words or pigments on paper, that come to life only through the eye of an appreciative heart and mind. Does a man read Shakespeare and ask what profit he has gained? If he does then he cannot see the point. What he gains is immaterial – literally and figuratively – but may in the long run be immensely valuable. It depends on what he saw, how well he saw, and the breadth of his vision.

It is no different with Philosophy. Consider it an artform, or a method of tuning the soul through delicate adjustments of the mind. When one tunes a violin there is no melody played; that comes after. The fruit of philosophy is the philosopher’s life itself. It is how it changes the man that matters, not the changes he can prove to you from day to day.

So if you are accustomed to reading balance sheets and preparing quarterly projections, perhaps you are ill-equipped to judge philosophy. But if you measure the smile of a happy engineer against the despair of an endless, daily grind, maybe then you will have found the weight of philosophy’s fruit.

Syndicated 2009-05-13 20:04:27 (Updated 2009-05-13 20:04:43) from Lost in Technopolis

Journey into Haskell, part 6

Another thing to be learned down the Haskell rabbit-hole: thinking in infinites. Today someone posed a puzzle which I tried to solve in a straightforward, recursive manner: building a list of primes. The requested algorithm was plain enough:

  Create a list of primes "as you go", considering a number prime if it can't be divided by any number already considered prime.

However, although my straightforward solution worked on discrete ranges, it couldn't yield a single prime when called on an infinite range -- something I'm completely unused to from other languages, except for some experience with the SERIES library in Common Lisp.

An incomplete solution

Looking similar to something I might have written in Lisp, I came up with this answer:

  primes = reverse . foldl fn []
    where fn acc n | n `dividesBy` acc = acc
                   | otherwise         = (n:acc)
          dividesBy x (y:ys) | y == 1         = False
                             | x `mod` y == 0 = True
                             | otherwise      = dividesBy x ys
          dividesBy x []                      = False

But when I suggested this on #haskell, someone pointed out that you can't reverse an infinite list. That's when a light-bulb turned on: I hadn't learned to think in infinites yet. Although my function worked fine for discrete ranges, like [1..100], it crashed on [1..].

So back to the drawing board, later to come up with this infinite-friendly version:

  primes :: [Int] -> [Int]
  primes = fn []
    where fn _ [] = []
          fn acc (y:ys) | y `dividesBy` acc = fn acc ys
                        | otherwise         = y : fn (y:acc) ys
          dividesBy _ [] = False
          dividesBy x (y:ys) | y == 1         = False
                             | x `mod` y == 0 = True
                             | otherwise      = dividesBy x ys

Here the accumulator grows for each prime found, but results are returned in order, and each can be calculated as needed. This time when I put primes [1..] into GHCi it printed out prime numbers immediately, but visibly slowed as the accumulator grew larger.
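
For completeness, here is the infinite-friendly version as a self-contained program with a small driver (my transcription; note that under the puzzle's rules 1 slips in as a "prime", because the accumulator is still empty when 1 is examined):

```haskell
primes :: [Int] -> [Int]
primes = fn []
  where
    -- Keep a number only if nothing accumulated so far divides it.
    fn _ [] = []
    fn acc (y:ys)
      | y `dividesBy` acc = fn acc ys
      | otherwise         = y : fn (y:acc) ys
    dividesBy _ [] = False
    dividesBy x (y:ys)
      | y == 1         = False  -- everything divides by 1; ignore it
      | x `mod` y == 0 = True
      | otherwise      = dividesBy x ys

main :: IO ()
main = print (take 8 (primes [1..]))  -- draws only as much of [1..] as needed
```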

Syndicated 2009-03-26 13:00:00 (Updated 2009-03-23 05:28:45) from Lost in Technopolis

Journey into Haskell, part 5

Haskell may be difficult to start out with, but once things start rolling, they roll fast. Yesterday (real world time, these blog entries are staggered) I had started the first lines of HackPorts, but now things are getting close to done for the first version. It's not that I've written much code, but that it was simple to integrate with other people's code.

Borrowing all I can

The first thing I wanted to do was avoid dealing with any of Hackage's data formats, so I cribbed everything I could from the cabal-install package. I actually imported the full source into HackPorts, ripped out its List.hs file, renamed it to my Main.hs file, and then began changing it from a function that prints out a list of available packages, to one that writes the data into properly formatted Portfile entries. The code does the following bits of work:

  1. Talks to cabal-install and Cabal to get a list of all known packages on Hackage.

  2. For every package, creates a directory named haskell/$package, and then writes information about that package into haskell/$package/Portfile.

  3. As it does this, it fetches the current version's tarball over HTTP, and uses OpenSSL (directly, through FFI) to generate MD5, SHA1 and RIPEMD160 checksums of the tarball image.

And voilà, a directory populated with 1136 Portfile entries. What's missing now is the external dependency mapping. As a stub, I have them all depending on port:ghc, but I think there's sufficient information in the Cabal package info to figure out what the right dependencies should be, both among the Hackage packages themselves and against any external libraries (like OpenSSL).

What I learned

As for my Haskell education, I learned about using Haskell's very nice FFI mechanism, and had a lot more experience using the IO Monad.

An example of using FFI to call out to OpenSSL:

  {-# OPTIONS -#include "openssl/md5.h" #-}

  foreign import ccall "openssl/md5.h MD5"
    c_md5 :: Ptr CChar -> CULong -> Ptr CChar -> IO (Ptr Word8)

I now have access to a c_md5 function, which goes directly over to the C library to do its work. Not too shabby! As for the IO Monad, here is the main function for HackPorts:

  main :: IO ()
  main = do createDirectoryIfMissing True "haskell"
            pkgs
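
The same FFI mechanism works against any C library. As a minimal sketch that is easier to try at home, here is a binding to libc's strlen rather than OpenSSL (the choice of function is mine, not part of HackPorts):

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CSize)

-- Bind strlen from libc; GHC links against libc by default.
foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

main :: IO ()
main = do len <- withCString "Hello, world!" c_strlen
          print len  -- prints 13
```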

Syndicated 2009-03-24 13:00:00 (Updated 2009-03-23 05:28:22) from Lost in Technopolis

Functional yet lazy

As I explore Haskell, I’m discovering that one of its trickiest aspects is not structuring things functionally, but the lazy evaluation. It turns out lazy evaluation comes with both great benefits, and significant difficulties. I’d like to point a few of these out, as they’re becoming clearer to me.


One of the great benefits of lazy evaluation is that your code doesn’t need to account for the scale of an operation. Let’s take a simple example: checksumming a large file whose contents are being read, on-demand, over HTTP.

In C++, if I wanted to checksum a large file read over HTTP, I couldn’t buffer it in memory because I don’t know how large it might get. Nor do I want my checksumming code to know anything about HTTP, or where the data comes from. The answer here is to use I/O streams. By passing a generic istream interface around, I can hide any knowledge of where the data came from. The checksumming algorithm just reads data from the stream as it needs to, and the HTTP layer downloads more bytes as required (typically caching to avoid constant network access).

However, there’s a downside to this: the checksumming code now knows something about I/O. In actuality, a checksumming algorithm only cares about the bytes being checksummed and little else. It shouldn’t have to know about I/O, or strings, or any of the details of where data comes from or how it’s structured. It should ideally receive a pointer to an arbitrarily large sequence of 8-bit bytes, and return a fixed-size checksum representing a fingerprint of those bytes.

Yet this naive approach can’t really be done in C++. If it were a file I was checksumming, I could memory map the file and pass around a byte pointer, and the OS would take care of lazily reading in the bytes for me as needed. But for a file being accessed over HTTP this would require first downloading the file and then checksumming it, when I specifically wanted to “checksum as I go”. Who knows, maybe I’ll discover a reason to stop summing beyond a certain point and I’d like to stop downloading at that point as well.

Well, just as memory mapping gives me lazy access to the contents of a file, a language with lazy evaluation gives me lazy access to the results of any algorithm – including downloading data over HTTP. With Haskell, I can indeed write my checksum algorithm as if it receives a giant byte buffer, and the language takes care of downloading only as much data as I’ve accessed (plus caching). This simplifies my checksumming code, and reduces the amount of knowledge that has to be passed around, such as a “stream” as opposed to a generic, 8-bit pointer.[1]

This simplification lets you design your algorithm as if in an ideal world. You want to process a bunch of numbers? Work on a list. What’s that you say, the numbers are coming in from a socket and you don’t know when it will end? Doesn’t matter, just work on a list. In C++ I’d have to switch from passing a vector to passing an istream iterator, but in Haskell, I don’t care what algorithm is populating my list, only that it is a list, and that I know how to work it.
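
A tiny illustration of that indifference (my own example, not from the article): the same function consumes a finite or an infinite list, because take demands only what it needs:

```haskell
-- Sum the first n elements of any list, finite or infinite.
sumFirst :: Int -> [Int] -> Int
sumFirst n = sum . take n

main :: IO ()
main = do print (sumFirst 3 [10, 20, 30, 40])  -- finite list: prints 60
          print (sumFirst 3 [1..])             -- infinite list: prints 6
```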


For all its beauty, laziness has three costs I’ve run into so far. The first is that it lets you very easily write functioning algorithms with horrible performance characteristics. This happens because laziness causes a promise[2] to be constructed, which takes memory and time to do. Sometimes, the cost of the underlying operation is far less than the memory cost of carrying a promise around to do that operation at a later point. This isn’t true of a slow operation like reading from a socket, but it’s certainly true of something trivial like summing two integers. It means one has to be aware of promises, when they’re constructed, and when it’s more beneficial to force evaluation always versus the benefits to be had from deferring a computation whose result may never be needed.

The second is that when a poorly performing algorithm dies, it dies when its value is used, not when the promises are made. This can make it look like the consumer is to blame, when really it’s the producer. Here is a trivial example:

  mysum = foldl (+) 0

main = print (mysum [1..1000000])

Although foldl is tail-recursive, so we aren’t blowing stack through recursive calls, it still blows stack because it builds up a huge, nested structure of promises that only gets evaluated once print is called to render it as a string. That is, the return value from mysum itself is no problem, it’s just a lazy computation against a large list. But then print needs the result, so it asks mysum to fulfill its promise. This in turn causes mysum to churn through the large list of integers, building up the return value as it goes.

However, and here is where the surprise comes in: foldl doesn’t actually compute those values as it walks the input list. No, even these are done lazily, because it can’t know how many of those values will actually be needed. We may know from looking at the code that it will need them all, but it doesn’t know. So it constructs something on the stack looking like this:

  0 + 1
  (0 + 1) + 2
  ((0 + 1) + 2) + 3
  (((0 + 1) + 2) + 3) + 4

And so on, all the way to the last integer. Only when mysum is done constructing promises across the entire input list, and the promise structure is returned, will it actually get evaluated by summing the integers together and finalizing each promise. If you pick an input list large enough, there goes available memory.

The trick here is that the stack fault won’t occur in foldl, or in mysum. It will occur in print, where the need to resolve the promise results in the call to mysum actually being made, which then calls foldl, which then starts building thunks until memory is gone. In this trivial example there’s very little code or time distance between the problem and its cause, but in real world code there may be enormous gaps between them.
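
Worth noting, though outside the article's scope: the standard remedy is the strict left fold from Data.List, which forces each intermediate sum as it goes, so no chain of promises builds up:

```haskell
import Data.List (foldl')

-- foldl' evaluates the accumulator at every step, keeping it a plain number
-- instead of a nested thunk.
mysum :: [Integer] -> Integer
mysum = foldl' (+) 0

main :: IO ()
main = print (mysum [1..1000000])  -- prints 500000500000 in constant stack space
```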

In consequence of this I learned that it’s hard to get GHC to produce stack traces for you when there’s a runtime error. Your code can be going on its merry way, when suddenly there’s a stack fault. But that’s all you see: a stack fault indicating something went wrong. Where did it go wrong? Based on the behavior of the program, I’m led to believe it happened near what the code was actually doing[3] – but in fact the problem may have started long, long before, except that laziness deferred the trigger to a later time.

So, even though laziness can delay costs and abstract how data is determined, by the same token it also delays errors and abstracts blame. In C++ if I pass in an I/O stream and there’s a crash reading from it, I know to look at my stream code. But in Haskell if I get a stack fault simply by processing a list, how am I to know what’s wrong? It’s not going to be in the List code, and probably not in the code walking the list, but in code which promised to produce the list potentially a long time ago.

I still think the benefits can outweigh the difficulties – especially when it comes to parallelism, and avoiding unnecessary computations, and allowing code to safely traverse infinite series – but it definitely requires a level of algorithmic consciousness on the part of the engineer which seems quite a bit higher than with imperative languages.

  1. And if I do have to include state with this raw, lazy data, but I don’t want the algorithm to know anything about it? That’s where the Monad steps in. Say instead of checksumming a file, I’m parsing an expression. There are a lot of details that go along with parsing that have little to do with interpreting the next bit of text, such as token position, error context, backtracking information, etc. I want to be able to write a routine that parses a number very simply, without knowing about all those details. It’s the Monad that manages this extra information. You can read more here. ↩

  2. Promises are what get turned into real values when data is finally needed. ↩

  3. If you use profiling libraries along with -prof -auto-all, you can get a much clearer picture of what was executing at the time of the fault. ↩

Syndicated 2009-03-22 08:06:16 (Updated 2009-03-22 08:06:25) from Lost in Technopolis

21 Mar 2009 (updated 21 Mar 2009 at 20:06 UTC) »

Journey into Haskell, part 4

I’ve been reading Real World Haskell now, after having finished the delightful Learn You a Haskell Tutorial. I’m up to chapter 6, about to dive into Typeclasses. In the meantime, I’ve picked a toy project that also has a taste of usefulness: a script to convert the Hackage database into MacPorts Portfiles, respecting inter-package and external library dependencies. I call it HackPorts, of course.


This translation should require two things:

  1. The Cabal package, for reading information about all packages known to it. This avoids writing a custom parser, or using HTTP to crawl the online Hackage database.

  2. A mapping file of external dependency names to MacPorts port names. This is for dependencies on things like libbz2, where the script will need to be taught how MacPorts names that library. This is likely to be the most labor-intensive step, having nothing to do with Haskell.
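To give a sense of what I mean, the mapping could start life as nothing fancier than a Data.Map from external dependency names to port names – these entries are invented for illustration:

```haskell
import qualified Data.Map as M

-- Hypothetical mapping from external dependency names to MacPorts
-- port names; the entries here are illustrative only.
portMap :: M.Map String String
portMap = M.fromList
  [ ("bz2",  "bzip2")
  , ("z",    "zlib")
  , ("curl", "curl")
  ]

-- Look up a dependency, falling back to the original name.
portName :: String -> String
portName dep = M.findWithDefault dep dep portMap

main :: IO ()
main = putStrLn (portName "bz2")
```

The real file would grow entry by entry as packages fail to resolve – the labor-intensive part mentioned above.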

Initial experiences

Haskell makes a concerted point about separating “pure” from “impure” code. Anything which talks to the outside world, such as reading and writing files, is impure. Anything which can be expressed in terms of standard data types – or compositions thereof – is pure.

Take for example a program to count lines in a file. The pure part of the code receives a giant string, splits it into lines at line boundaries, counts those lines, and returns an integer. The impure part takes a command-line argument, interprets it as a FilePath (an impure type, since it must concern itself with operating system-dependent naming conventions), and reads the contents of the file at that location. The program flows by passing the file contents as a string to the pure code, and receiving an integer to be printed on the output device.
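In code, that division might look something like this (a sketch of the shape, not a literal program from anywhere):

```haskell
import System.Environment (getArgs)

-- Pure: a string in, an integer out.  No I/O, no file paths.
countLines :: String -> Int
countLines = length . lines

-- Impure: argument handling and file reading live here, and only here.
main :: IO ()
main = do
  (path:_) <- getArgs
  contents <- readFile path
  print (countLines contents)
```

Everything above main can be exercised from GHCi, or tested, without a file system in sight.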

This division into pure and impure has an interesting side-effect (no pun intended): Most of a program’s code is written in isolation of its context of usage. Take Cabal, as a case in point here. Part of Cabal deals with downloading information from the Web, reading and writing package files, and executing external commands, like make. But another part of Cabal is concerned only with the structure of package files, and determining the total set of dependencies required for building a package. These latter details can be discussed in complete isolation from what is done with that information.

As a result – and I’m not sure whether the Cabal authors designed it this way or not – Cabal is naturally part “program”, and part API. I was able to start taking apart package files almost instantly, with extremely little code. Here’s a toy program to print out a package’s maintainer, if given the path to a .cabal file:

  import System.Environment (getArgs)

import Distribution.Verbosity (verbose)
import Distribution.PackageDescription
import Distribution.PackageDescription.Parse (readPackageDescription)

main = do
  args <- getArgs
  pkg  <- readPackageDescription verbose (head args)
  print . maintainer . packageDescription $ pkg

Now, I do suppose it’s just as easy to do a similar thing in Python’s distutils, for example:

  import sys

from distutils.extension import *

exts = read_setup_file(sys.argv[1])
print exts[0].language       # print the ext 'language'

What excites me is that Haskell uniquely encourages the separation of algorithm and application – the isolation of context-dependent knowledge into as small a region of a program as possible.

Too many times I’ve tried to use a utility’s code as a “library”, only to find it was so caught up in its idea of how it should be used that it had never bothered to abstract its core principles into a set of “pure” functions, independent from that intent. This happens, for example, with the version control system Git. Many have wanted a libgit.a for accessing Git’s data structures directly from other languages, yet none exists. One is forced to either shell out to the git command, or write another implementation to interface with the “pure” side of what Git does.

Syndicated 2009-03-21 10:18:23 (Updated 2009-03-21 19:20:43) from Lost in Technopolis

Updated site to use Blueprint CSS again

Recently I changed how the content on this site was generated, from using the standalone OS X application RapidWeaver, to the server-side publishing platform Movable Type. During that transition I changed the site’s style to the minimalist default offered by MT, which uses its own CSS for column layout and typography.

Tonight I finally got around to switching the site back to blueprint-css, which I very much prefer. I used the superb application CSSEdit to help me massage Movable Type’s style into something compatible with Blueprint’s own typography and layout.

I hope the result is pleasing. If anyone sees strange artifacts or display issues, please let me know. I’m aware code examples were being truncated on the right side before, but this should be corrected now. More on Haskell to come soon!

Syndicated 2009-03-20 09:27:44 (Updated 2009-03-20 09:27:50) from Lost in Technopolis

Journey into Haskell, part 3

Today I needed a wrapper script to drop arguments from a command line. I instinctively reached for bash, but then thought it would be a good exercise for my infant Haskell knowledge.

The task

The task at hand is to write a wrapper script for /usr/bin/ld that drops arguments beginning with -Wl,-rpath,. Since it must deal with arguments containing spaces, and I didn’t want to get into executing external programs with Haskell just yet, I wrappered the wrapper:

$(dirname $0)/ld-wrapper "$@" | xargs -0 /usr/bin/ld

Here ld-wrapper is expected to return its arguments separated by NUL characters so I can feed them to xargs, and from there to /usr/bin/ld. I’m sure there’s an easy, all-in-one way to do this with Haskell, I just haven’t reached that chapter yet.
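(For the curious: the all-in-one version turns out not to be far off. Using rawSystem from the process package, the wrapper could invoke the linker itself – a sketch, untested here:)

```haskell
import Data.List (isPrefixOf)
import System.Environment (getArgs)
import System.Exit (exitWith)
import System.Process (rawSystem)

-- Filter the arguments and invoke the real linker directly,
-- no xargs round-trip required.
main :: IO ()
main = do
  args <- getArgs
  let kept = filter (not . isPrefixOf "-Wl,-rpath") args
  exitWith =<< rawSystem "/usr/bin/ld" kept
```

rawSystem passes each argument through verbatim, so arguments containing spaces survive without any NUL-separator tricks.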

Haskell version

Anyway, here is the Haskell script:

  import Data.List
import System.Environment

main = do
  args <- getArgs
  putStr $ intercalate "\0"
         $ filter (not . isPrefixOf "-Wl,-rpath") args

Pretty basic: it filters the input arguments, keeping each one which does not begin with the sought-for string, and joins the list together using NUL as the separator.

Ruby version

As a quick sanity check, I wrote the same thing in Ruby, since it has facilities for being just as succinct:

  print ARGV.select {
  |y| !y.include?("-Wl,-rpath")
}.join("\0") + "\0"

I wanted to do this with an “inverse grep” instead of select, but couldn’t find a way to grep for the opposite of a pattern.

What’s interesting is that the Ruby version is marginally faster than the compiled Haskell one. For filtering 40,000 arguments, here are the averaged run-times over 20 invocations:

Language Speed
Haskell 0.00774523019791s
Ruby 0.00551697015762s

My guess is that Haskell is creating 40,000 different strings in memory as it constructs the final result, while Ruby is pasting one together as it goes – but I don’t know for certain.
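If allocation really is the culprit, strict ByteStrings would be the obvious next experiment (again a sketch, not something I’ve benchmarked):

```haskell
import qualified Data.ByteString.Char8 as B
import System.Environment (getArgs)

-- Same filter as before, but over packed byte strings rather than
-- linked lists of characters.
main :: IO ()
main = do
  args <- getArgs
  B.putStr . B.intercalate (B.singleton '\0')
           . filter (not . B.isPrefixOf (B.pack "-Wl,-rpath"))
           $ map B.pack args
```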

Syndicated 2009-03-19 05:59:06 (Updated 2009-03-19 08:04:05) from Lost in Technopolis
