Iteratees at Tsuru Capital
Tsuru Capital is a small company. We build our internal systems for live trading and offline analysis in Haskell, and we're proud to be sponsoring ICFP 2011. We use iteratees throughout our systems, and have actively encouraged all our staff to contribute changes upstream and participate in community design discussions. By being part of the open source community and taking part in peer review, we all end up with better software.
Over time various Tsuru staff members have worked on tools using iteratees, including (grepping the CONTRIBUTORS files): Bryan Buecking, Michael Baikov, Elliott Pace, Conrad Parker, Akio Takano, and Maciej Wos. There have been some lively discussions and many small patches providing functions that we use in production every day.
Last year Conal Elliott provided some mentoring to Tsuru staff, during which we worked through a denotational semantics for iteratees. This resulted in discussions on both the iteratee project list and haskell-cafe under the title "Semantics of iteratees, enumerators, enumeratees".
By using iteratees in production we've contributed various simple but practical functions, including:
- enumFdFollow, an enumerator (data source) which allows you to process the growing tail of a log file as it is being written.
- ioIter, an iteratee that uses an IO action to determine what to do. Typically this action involves some user interaction, such as a user issuing commands like play/pause/next/prev.
- ListLike functions last (an iteratee that efficiently returns the last element of a stream), mapM_ and foldM.
- mapChunksM_, a more efficient version of mapM_ that operates on the underlying chunks, e.g. logger = mapChunksM_ (liftIO . print).
- takeWhile, and its enumeratee variant takeWhileE
- endianRead8, an iteratee for reading 64-bit values with a given endianness. I've used this in ght as well as an internal project.
- convStateStream, which converts one stream into another while continually updating an internal state. Importantly for variable bitrate binary data, it can produce elements of the output stream from data that spans stream chunks.
- (><>) and (<><). These allow stream converters to be composed without rewriting boilerplate. John Lato gives a good example using these in the StackOverflow answer to "Attoparsec Iteratee".
- zip, the zip* variants and sequence_, for using multiple iteratees to process a single stream instance, and (for zip*) collecting the results.
- eneeCheckIfDone*: This family of functions (eneeCheckIfDoneHandle, eneeCheckIfDonePass, eneeCheckIfDoneIgnore) can be used with unfoldConvStreamCheck to make a version of unfoldConvStream which respects seek messages.
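To make the point about data spanning chunks concrete, here is a minimal sketch of the idea behind endianRead8. It does not use the iteratee package: Iter, Endian, read8, assemble, feed and result are all hypothetical names for a toy model built on base only, and the real library types are considerably richer.

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Word (Word8, Word64)

-- Toy iteratee (not the iteratee package's type): either finished with a
-- result and leftover input, or waiting for the next chunk.
-- Nothing signals end of stream.
data Iter a
  = Done a [Word8]
  | Cont (Maybe [Word8] -> Iter a)

data Endian = MSB | LSB

-- Hypothetical stand-in for endianRead8: collect eight bytes, which may
-- arrive split across several chunks, then assemble them into a Word64.
read8 :: Endian -> Iter (Maybe Word64)
read8 e = go []
  where
    go acc = Cont step
      where
        step Nothing = Done Nothing acc          -- stream ended too early
        step (Just chunk) =
          let bytes = acc ++ chunk
          in if length bytes >= 8
               then let (hd, rest) = splitAt 8 bytes
                    in Done (Just (assemble e hd)) rest
               else go bytes                     -- keep waiting for more

-- Combine exactly eight bytes, most significant first for MSB.
assemble :: Endian -> [Word8] -> Word64
assemble MSB = foldl (\n b -> (n `shiftL` 8) .|. fromIntegral b) 0
assemble LSB = assemble MSB . reverse

-- Toy enumerator: feed a list of chunks, then signal end of stream.
feed :: [[Word8]] -> Iter a -> Iter a
feed _      d@(Done _ _) = d
feed []     (Cont k)     = k Nothing
feed (c:cs) (Cont k)     = feed cs (k (Just c))

-- Extract the result if the iteratee has finished.
result :: Iter a -> Maybe a
result (Done x _) = Just x
result (Cont _)   = Nothing

main :: IO ()
main = do
  -- The eight bytes 00 00 00 00 00 00 01 02 are split over two chunks.
  let chunks = [[0, 0, 0], [0, 0, 0, 1, 2, 9]]
  print (result (feed chunks (read8 MSB)))  -- Just (Just 258)
```

The consumer never needs to know where the chunk boundaries fall; it simply returns a continuation until it has seen enough input, and hands back whatever it did not consume.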
Parallel stream processing
We often want to do multiple unrelated analysis tasks on a data stream. Whereas sequence_ takes a list of iteratees to run simultaneously and handles each input chunk by mapM across that list, psequence_ runs each input iteratee in a separate forkIO thread. For a real-world example, see Michael Baikov's post about psequence, psequence_, parE, parI.
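The difference between the two strategies can be sketched with base alone. This is a simplified model and not the actual psequence_ implementation: Fold, runSeq and runPar are hypothetical names, a consumer is modelled as a strict left fold, and each worker thread receives chunks over its own Chan and reports its result through an MVar.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (newChan, readChan, writeChan)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Data.List (foldl')

-- Toy consumer (hypothetical stand-in for an iteratee): a step function
-- and a strict accumulator.
data Fold a b = Fold (b -> a -> b) !b

-- Sequential model: each chunk is handed to every consumer in turn,
-- all on the calling thread.
runSeq :: [Fold a b] -> [[a]] -> [b]
runSeq folds chunks = map finish (foldl' feedAll folds chunks)
  where
    feedAll fs chunk = map (\(Fold s z) -> Fold s (foldl' s z chunk)) fs
    finish (Fold _ z) = z

-- Parallel model: one forkIO thread per consumer. Every chunk is
-- broadcast to every worker, and Nothing marks end of stream.
runPar :: [Fold a b] -> [[a]] -> IO [b]
runPar folds chunks = do
  workers <- mapM spawn folds
  mapM_ (\c -> mapM_ (\(ch, _) -> writeChan ch (Just c)) workers) chunks
  mapM_ (\(ch, _) -> writeChan ch Nothing) workers
  mapM (takeMVar . snd) workers          -- results in the original order
  where
    spawn (Fold step z) = do
      ch  <- newChan
      out <- newEmptyMVar
      let loop acc = do
            msg <- readChan ch
            case msg of
              Nothing -> putMVar out acc
              Just c  -> loop $! foldl' step acc c
      _ <- forkIO (loop z)
      return (ch, out)

main :: IO ()
main = do
  let chunks = [[1, 2, 3], [4, 5], [6]] :: [[Int]]
      -- Three unrelated analyses: sum, maximum and element count.
      folds = [Fold (+) 0, Fold max 0, Fold (\n _ -> n + 1) 0] :: [Fold Int Int]
  print (runSeq folds chunks)    -- [21,6,6]
  runPar folds chunks >>= print  -- [21,6,6]
```

In this model a slow consumer no longer holds up the others on each chunk, at the cost of unbounded buffering in the channels; a worker that throws would also leave runPar blocked, which a production version would need to handle.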
Thanks to John Lato for consistently and reliably maintaining the iteratee package, providing thoughtful feedback and graciously suggesting improvements.