Back from the dead, with a nonlinear parser
So, everything went quite differently than planned. What was supposed to
be a holiday clean-up rewrite of a fun weekend project has turned into a
half-year side project running alongside university.
To recap, I initially set out to implement a Markdown[1] parser in
Haskell so I could post formatted text to my Advogato blog. An
email-to-Advogato gateway was quickly whipped up[2]. The first prototype
version of a Markdown parser was also finished within reasonable time[3].
Unfortunately, the code was a mess, so I set out on a rewrite[4]. Much
progress was made, but it kept screwing up in certain minor but annoying
cases and the code still looked convoluted. Basically, Parsec just
didn't want to bend in the right direction...
So I replaced Parsec. The module is called
Text.ParserCombinators.Nonlinear[5] because it allows one to slurp in
parts of the document in one part of the parser and reparse them again
later. This allowed me to split up the document according to its
block-level structure, re-assemble, for instance, the text of quoted or
indented lines (without the leading quote marks/indentation), and run
the corresponding parser over the extracted subdocument.
Such embedded parses can also work with a completely different token
type than the enclosing parser, a capability which also came in handy.
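To make the idea concrete, here is a minimal sketch of the "slurp now,
reparse later" technique. It is not the API of
Text.ParserCombinators.Nonlinear; every name in it (Parser, embed,
parseDoc, and so on) is made up for illustration, and it only handles
paragraphs and ">"-quoted blocks. The outer parser works on lines, the
quoted lines are stripped of their markers and fed back into the
document parser, and paragraph text is reparsed with Char tokens.

```haskell
import Data.Char (isSpace)
import Data.List (stripPrefix)

-- Hypothetical minimal parser over an arbitrary token type t.
newtype Parser t a = Parser { runParser :: [t] -> Maybe (a, [t]) }

instance Functor (Parser t) where
  fmap f (Parser p) = Parser $ \ts -> case p ts of
    Nothing        -> Nothing
    Just (a, rest) -> Just (f a, rest)

instance Applicative (Parser t) where
  pure a = Parser $ \ts -> Just (a, ts)
  pf <*> pa = pf >>= \f -> fmap f pa

instance Monad (Parser t) where
  Parser p >>= f = Parser $ \ts -> case p ts of
    Nothing        -> Nothing
    Just (a, rest) -> runParser (f a) rest

-- Consume one token if the given function accepts it.
tokenMaybe :: (t -> Maybe a) -> Parser t a
tokenMaybe f = Parser $ \ts -> case ts of
  (t:rest) | Just a <- f t -> Just (a, rest)
  _                        -> Nothing

-- Try the first parser; fall back to the second on failure.
orElse :: Parser t a -> Parser t a -> Parser t a
orElse (Parser p) (Parser q) = Parser $ \ts -> case p ts of
  Nothing -> q ts
  r       -> r

-- Zero or more / one or more repetitions (p must consume input).
many0, many1 :: Parser t a -> Parser t [a]
many0 p = Parser $ \ts -> case runParser p ts of
  Nothing        -> Just ([], ts)
  Just (a, rest) -> runParser (fmap (a:) (many0 p)) rest
many1 p = do { x <- p; xs <- many0 p; return (x:xs) }

-- The "nonlinear" step: run a sub-parser, possibly over a different
-- token type s, on tokens collected earlier. It must consume them all.
-- The enclosing parser's input is left untouched.
embed :: Parser s a -> [s] -> Parser t a
embed sub toks = Parser $ \ts -> case runParser sub toks of
  Just (a, []) -> Just (a, ts)
  _            -> Nothing

data Block = Para [String] | Quote [Block] deriving (Eq, Show)

-- Strip a leading ">" and at most one following space.
unquote :: String -> Maybe String
unquote l = case stripPrefix ">" l of
  Just (' ':rest) -> Just rest
  other           -> other

blankLine :: Parser String ()
blankLine = tokenMaybe $ \l -> if all isSpace l then Just () else Nothing

plainLine :: Parser String String
plainLine = tokenMaybe $ \l ->
  if all isSpace l || take 1 l == ">" then Nothing else Just l

-- Inline parser with Char tokens: split paragraph text into words.
spaces :: Parser Char ()
spaces = many0 (tokenMaybe (\c -> if isSpace c then Just () else Nothing))
           >> return ()

word :: Parser Char String
word = do
  w <- many1 (tokenMaybe (\c -> if isSpace c then Nothing else Just c))
  spaces
  return w

inline :: Parser Char [String]
inline = spaces >> many0 word

-- Block-level parsers with lines (String) as tokens.
quote, para, block :: Parser String Block
quote = do
  ls <- many1 (tokenMaybe unquote)   -- slurp the quoted lines, unmarked
  Quote <$> embed document ls        -- reparse them as a subdocument
para = do
  ls <- many1 plainLine
  Para <$> embed inline (unwords ls) -- reparse with a Char-level parser
block = quote `orElse` para

document :: Parser String [Block]
document = do
  _ <- many0 blankLine
  many0 (do { b <- block; _ <- many0 blankLine; return b })

-- Parse a whole text; succeeds only if every line is consumed.
parseDoc :: String -> Maybe [Block]
parseDoc s = case runParser document (lines s) of
  Just (bs, []) -> Just bs
  _             -> Nothing
```

For example, parseDoc "> > deep" evaluates to
Just [Quote [Quote [Para ["deep"]]]]: each level of quoting is peeled
off by one embedded parse. Making embed a plain function from a token
list to a parser is just the simplest choice for a sketch; the real
module may well be organized differently.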
I recently came across "Frisby"[6], a Haskell implementation of PEG
grammars, which I had never heard of before. The description sounds
cool. I wonder if my Markdown variant could be represented by one? My
parser library is neither optimized for space nor speed, and PEGs sound
compelling in that regard...
Anyway, the implementation based on my nonlinear parsers worked out
really nicely with respect to code structure and doesn't show any of the kinks
that plagued the Parsec version. Since I've deviated somewhat from
Markdown syntax in the places I didn't like, I've dubbed the package
k-tex. I've still got to update the documentation, but if anyone is
interested in looking at or even improving the code, you can find it at
http://www.khjk.org/~sm/code/k-tex/.
Best regards,
Sven Moritz
PS. Yep, the Advogato gateway[7] already uses k-tex, and if this post
appears on my blog[8], it's working. ;)
References: