Older blog entries for pesco (starting at number 13)

1 Jun 2006 (updated 1 Jun 2006 at 10:23 UTC) »
Representing marked-up text in Haskell

I wrote previously about my plan to use Markdown as the input format for advopost. I decided against re-using the existing Markdown-to-HTML converter, because I would have to strip the resulting output down to the Advogato subset of HTML in postprocessing; that feels too kludgy. So I'm going to implement a parser for (a variant of) Markdown that reads the input into a structured Haskell data type 'Doc'. Here is my current design for that type:

   module Doc where
   data Doc     =  Doc         String      -- title
                               [Para]      -- body
                               [Doc]       -- subsections 
   data Para    =  Paragraph   String      -- paragraph title
                               [Block]     -- paragraph body
   data Block   =  Blockquote  Doc
                |  Bulleted    [[Para]]    -- unordered list
                |  Numbered    [[Para]]    -- numbered list
                |  Codeblock   [[Inline]]  -- list of lines, ignore Codespans
                |  Line        [Inline]
   data Inline  =  Str         String      -- ignore linebreaks
                |  Codespan    [Inline]
                |  Emph        [Inline]
                |  Link        [Inline]    -- link text
                               String      -- link target
                               String      -- link title
                |  Image       [Inline]    -- fallback alternative for this image
                               String      -- image location
                               String      -- image title

I want both the input format and the Haskell data structure to be independent of the output format being HTML. Therefore I'm not going to support inline HTML in the input. I also want structural markup (as opposed to presentational), so I left out horizontal rules and forced linebreaks. Lastly, I've never heard of using a "strong emphasis" (as opposed to normal emphasis) in typesetting, so I dropped that as well.

I've tried to design the above types in such a way as to minimize the possibility of forming nonsensical or ambiguous documents. That's why there is such deep nesting of different types instead of just one big algebraic data type with constructors for concatenation, paragraph and section breaks, etc. Comments welcome.

I hope that the 'Doc' type will be useful in further coding. For example, it would be really cool to have a fancy combinator library for 'Doc's along with a pretty-printer to turn them back into plaintext: Then we could use them for general pretty output from Haskell programs. While there are several existing pretty-printing libraries, to my knowledge none of them use structural markup and they are all targeted at console output only.
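
As a rough illustration of the plaintext direction, here is a minimal sketch covering only the inline layer. The Inline type is copied from the design above; the helper name plainInlines and the output conventions (asterisks for emphasis, backticks for code, angle brackets for link targets) are assumptions made for this example, not part of the design.

```haskell
-- Inline layer copied from the Doc design above; the block-level
-- types are left out to keep the sketch short.
data Inline  =  Str       String
             |  Codespan  [Inline]
             |  Emph      [Inline]
             |  Link      [Inline] String String  -- text, target, title
             |  Image     [Inline] String String  -- fallback, location, title

-- Hypothetical helper: flatten inline markup to plain text.
-- The markers chosen here are assumptions of this sketch.
plainInlines :: [Inline] -> String
plainInlines = concatMap plain
  where
    plain (Str s)          = s
    plain (Codespan is)    = "`" ++ plainInlines is ++ "`"
    plain (Emph is)        = "*" ++ plainInlines is ++ "*"
    plain (Link is url _)  = plainInlines is ++ " <" ++ url ++ ">"
    plain (Image alt _ _)  = plainInlines alt  -- show only the fallback text
```

A block-level renderer would follow the same pattern, with one function per type in the nesting.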

31 May 2006 (updated 31 May 2006 at 20:02 UTC) »

Posting to Advogato via email

If this message reaches my Advo diary, my new mail-to-advo gateway program is not only working but also correctly embedded into my mail setup via procmail. Thus, albeit being undocumented, underfeatured, and unpolished, the program is quite functional. Lest I forget it myself, here are the quick set-up instructions:

Step 1: Dependencies.

The program is written in Haskell and has been tested with GHC 6.4.1. It requires a number of Haskell packages, namely: MissingH (0.13.1), HaXml (1.13), NewBinary (2005-05-30), Crypto (3.0.3), http (2005-04-23), haxr (2006-05-30, custom-patched).

I've already submitted my patches for haxr (the XML-RPC library) to its author, Björn Bringert. Hopefully, he'll accept them soon. In short, the particular problem was that Advogato returns <value> elements with whitespace (i.e. character data) around the actual data element. I had to modify a function called fromXRValue to filter them out -- /unless/ the whole <value> consists of /only/ character data, which is supposed to be interpreted equivalently to a <string> data element. Duh. (Rant: This format is supposed to be processed by /machines/ for fsck's sake!)
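
The rule described above can be sketched as follows. This is not the actual patch to fromXRValue: the Content type is a simplified stand-in for HaXml's node types, and the names cleanValue, isCharData, and isBlank are hypothetical.

```haskell
import Data.Char (isSpace)

-- Simplified stand-in for an XML content node (assumption of this
-- sketch; haxr itself works on HaXml's types).
data Content = CharData String
             | Element  String [Content]
             deriving (Eq, Show)

isCharData :: Content -> Bool
isCharData (CharData _) = True
isCharData _            = False

-- Character data consisting of whitespace only.
isBlank :: Content -> Bool
isBlank (CharData s) = all isSpace s
isBlank _            = False

-- Clean up the children of a <value> element:
--   * only character data: keep everything (implicit <string>);
--   * otherwise: drop the whitespace around the data element.
cleanValue :: [Content] -> [Content]
cleanValue cs
  | all isCharData cs = cs
  | otherwise         = filter (not . isBlank) cs
```

With this rule, a <value> holding whitespace, an <int> element, and more whitespace reduces to just the <int> element, while a <value> holding only text is left untouched.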

Step 2: Fetch and build.

There's just a single source file, http://www.khjk.org/~sm/code/advopost/advopost.lhs. Fetch it and build with the straightforward command 'ghc -o advopost --make advopost.lhs'. Make sure you've got all the above dependencies visible (I had to manually 'ghc-pkg expose' HaXml, I think).

Step 3: Procmailrc.

Choose an email account to direct your postings to. I'm posting to a mailing list and the posts loop back to my own account. There, the following procmail recipe passes any posts from me to that list through advopost:

:0 c
* ^TO_the@mailing.list
* ^From:.*my@email.addr
* ! ^In-Reply-To:
| /path/to/advopost my-advo-name my-advo-pass

Put this recipe at the beginning of your respective .procmailrc; the 'c' flag in the first line means "copy", i.e. a copy of the mail is passed to advopost and the original continues through the rest of .procmailrc. The third condition line makes this rule apply only to messages that don't carry an "In-Reply-To" header, i.e. only fresh posts are passed to Advogato, not my replies to other people's messages.

Enjoy.

PS. Er, naturally, advopost is open-source software. You may peruse the published version under the terms of the common 3-clause BSD license.

PPS. I know, this post is missing lots of markup. While advopost would pass it all through untouched, I don't want to burden the mailing list I'm cross-posting to with lots of HTML tags. My plan is to finally interpret the input to advopost as Markdown, cf. http://daringfireball.net/projects/markdown/.

Update. Aha, of course, non-ASCII characters come through wrong. I'm pretty sure I'm transmitting them as UTF-8, or at least so I had assumed. Will have to investigate...

Update 2. Yay, Björn has already accepted my patches! Advopost will now work with the latest haxr from darcs (2006-05-31).

30 May 2006 (updated 30 May 2006 at 09:45 UTC) »

Posting to Advogato via email

Sooo, I want to post to my diary by email. If you read this, the program is working. :)

Update. Oh, of course you want to see the code! There it is, in all its unpolished glory. Note that about 90% of it is concerned with working around bugs in the crappy email parser. ;-/

I've been toying around with the idea of creating a structural markup language (like XML) with LaTeX-like syntax and a simple semantics. My idea is to let what are macros in (La)TeX become functions. Then the available functions and their types form the analogue to, say, XML Schemas. Giving a concrete implementation for the functions provides an interpretation to the document, for example transforming it into some other format (like XSLT). I wonder if people think this is a good idea; or would even like to help with its development. I have produced an initial scratch implementation of a parser in Haskell. There is also a function to transform the parsed document into a Haskell expression.
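
To make the "macros become functions" idea concrete, here is a minimal sketch of the conversion to a Haskell expression. It is not the scratch implementation mentioned above: the Node type, the function toExpr, and the assumption that an interpretation module exports `text`, `cat`, and one function per macro name are all invented for this example.

```haskell
import Data.List (intercalate)

-- Hypothetical parse tree: plain text, or a macro call such as
-- \emph{...} with zero or more brace-delimited arguments.
data Node = Text String
          | Call String [[Node]]

-- Render a document as the source text of a Haskell expression.
-- Assumes the interpretation provides `text`, `cat`, and a function
-- for every macro name, taking one argument per brace group.
toExpr :: [Node] -> String
toExpr ns = "cat [" ++ intercalate ", " (map node ns) ++ "]"
  where
    node (Text s)      = "text " ++ show s
    node (Call f args) = unwords (f : map (\a -> "(" ++ toExpr a ++ ")") args)
```

For a source like "Hello \emph{world}", parsed as a text node followed by a call to emph, this yields the expression cat [text "Hello ", emph (cat [text "world"])]. Type-checking that expression against a given interpretation module then plays the role of schema validation.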

As a next step, I should add a little main routine for reading in a document and spitting out a simple Haskell module which

  • imports some given module as an interpretation and
  • exports the source document under that interpretation (through the conversion to an expression, as described).

The obvious proof-of-concept toy, then, is of course an Advogato-exporter.

Another amazing discovery!

Have you ever wondered what it's like in hyperspace? I remember that Space Gothic explained the experience of "looking out the window in hyperspace" as instantly turning everyone insane, which I quite liked. Then, according to Event Horizon, as many will remember, there's just Hell in hyperspace, an also somewhat enjoyable theory. However, as anyone who was paying attention in the early nineties will instantly confirm, in the modern world of today, we know that there is disco music in hyperspace!

The scientific device by which the above revelation was established is called Star Control II and nowadays it is freely available for every skeptic to confirm this earth-shaking discovery.

15 Dec 2005 (updated 15 Dec 2005 at 14:14 UTC) »

Wow, I just found bitlbee! Now I can finally get rid of Gaim. For those who don't know, bitlbee is a gateway from Jabber, ICQ, MSN, etc. to IRC. It runs as an IRC server on localhost, maintaining a channel which all your buddies from the different IM networks just join when they come on-line. So nice! Then you can either /msg them or just talk to them in the customary IRC way ("nick: blah") from the control channel.

Oh by the way, setup is really simple, just apt-get install bitlbee, /connect localhost, and follow the instructions of the bee.

I did nothing today. Except run Windows Update, read mail, and jerk off (just kidding of course (of course)). Shit day. But wait, I cooked! This is an accomplishment. And it tasted pretty good. Amazes me every time.

Well, okay, I'm also thinking about ways to extend Haddock to better support my preferred style of writing code. I write literate programs and obviously I don't want the interface documentation comments to show up as part of the code in the typeset output. Other comments, however, can be very useful, so I don't want to throw out "non-literate" comments altogether. So far, I have resorted to typesetting separate reference manuals in roff (sic!), which has actually worked quite well -- and I love the feeling of typing man haskellfunction.

So I have two options: either write something to strip doc comments out of the code before typesetting or invent some way to add them to the literate comments. I have been trying to investigate the latter way, because it appears "cleaner" to my belly-brain. This way could also make it easy to put the reference docs in some place slightly different from the usual "right next to the thing". Of course completely separating the two goes directly against the idea of putting reference docs next to code, but well. I haven't thought this through yet, but must wash my dishes now. More thoughts later.
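
For comparison, the first option (stripping doc comments before typesetting) could look roughly like the sketch below. It assumes bird-style literate Haskell, where code lines start with '>', and treats a line comment opening with "-- |" or "-- ^" plus any directly following "--" lines as one doc comment. The names stripDocComments and codePart are hypothetical, and block comments or operators that begin with "--" are not handled.

```haskell
import Data.List (isPrefixOf)

-- The code on a bird-track line, with leading spaces removed;
-- Nothing for lines of literate prose.
codePart :: String -> Maybe String
codePart ('>' : rest) = Just (dropWhile (== ' ') rest)
codePart _            = Nothing

-- Drop Haddock-style doc comments from the lines of a .lhs file,
-- keeping prose, code, and ordinary comments.
stripDocComments :: [String] -> [String]
stripDocComments = go False
  where
    go _ [] = []
    go inDoc (l : ls)
      | isDocStart l         = go True ls        -- "-- |" or "-- ^" opens a doc comment
      | inDoc && isComment l = go True ls        -- continuation line of that comment
      | otherwise            = l : go False ls   -- anything else is kept

    startsWith ps l = maybe False (\c -> any (`isPrefixOf` c) ps) (codePart l)
    isDocStart      = startsWith ["-- |", "-- ^"]
    isComment       = startsWith ["--"]
```

Ordinary trailing comments survive because only lines whose code part begins with a comment marker are considered.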

Wah yeah, I finished the paper! But I really should start this stuff early enough next time. I got a bad stress-headache last night, which luckily vanished as soon as the hard work was done. :) I was afraid I wouldn't get my paper into the conference proceedings because I'm two days late, but luckily the CCC people are reachable via IRC, very friendly and helpful, and best of all, not overly picky about deadlines. ;)

So anyway, as you can see, I've put the paper on the web before the actual lecture; surely won't hurt. If the "logical language" Lojban is of any interest to you, please have a look and tell me what you think! I'm still a tiny bit afraid to ask for opinions on #lojban, because I'm actually quite a newbie at the language and have had to submit the paper without review by an expert due to the time constraint. But actually I'm kind of confident I haven't made any serious mistakes, so I'll post the link to #lojban in a minute...

Also of course, if you have any way/interest to attend the lecture (or any of the other great ones) at the 22nd Chaos Communication Congress (Berlin, Dec. 27-30), please come! I'm confident it will be an exciting four days.

29 Nov 2005 (updated 29 Nov 2005 at 20:57 UTC) »

Yay, I retrieved my password. :)

During my absence, I've written a bunch of useful Haskell stuff (to be found at my website). But most importantly I've co-founded the KHJK, basically a society for mad scientists and futurist engineers (*g*). I hope to shell out some great stuff for/with this organization.

My current task is to develop a metadata-centric filesystem that should be able to serve both as a long-term archival facility and a backend system for an automated website, i.e. an instant publishing tool. Incidentally, it should also be suitable for email message posting in the style of IM2000. Oh, and for blogging. ;)

I think I would like a predicate logic metadata-base, and a query language like Prolog. I think I should define some common predicates ("title", "created", "modified", etc.), rules (reflexivity, transitivity, etc.) and some kind of textual representation for these relations first. Then I will think about the best way to query this system from Haskell. Via an interpreter, this will also yield an immediate command-line interface to the database. I've been wishing for a Haskell command shell environment for some time. Maybe this can be combined. All the basic functions are in the standard library, one just has to build a specific Prelude that imports them all, possibly with shorter or more mnemonic names.
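
A minimal sketch of what querying such a metadata-base from Haskell might look like, assuming facts are stored as plain triples. The representation and the names Fact, direct, and transitive are invented for this example, and only one rule (transitivity) is shown.

```haskell
-- A fact relates a subject to an object under a named predicate,
-- e.g. ("partOf", "chapter", "book"). (Assumed representation.)
type Fact = (String, String, String)

-- All objects directly related to a subject under a predicate.
direct :: [Fact] -> String -> String -> [String]
direct facts p s = [o | (p', s', o) <- facts, p' == p, s' == s]

-- The same query with the transitivity rule applied: follow the
-- predicate repeatedly and collect everything reachable, each once.
transitive :: [Fact] -> String -> String -> [String]
transitive facts p s = go [] (direct facts p s)
  where
    go seen []       = reverse seen
    go seen (x : xs)
      | x `elem` seen = go seen xs
      | otherwise     = go (x : seen) (xs ++ direct facts p x)
```

Loaded into an interpreter, functions like these would already give a crude command-line interface: transitive facts "partOf" "page" lists everything the page is part of, directly or indirectly. A Prolog-like language would generalize this to arbitrary rules and variables.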

But for now, I need to prepare my lecture at this year's Chaos Communication Congress, which is going to be about Lojban, another topic which I am pretty excited about. For those who don't know, it's a constructed (spoken!) language based on predicate logic. Everything is very clear and clean. I like! :)

Interesting. I'm done with the (graph-theoretical) shortest path algorithm. And by the way, guess what, it's an instance of a breadth-first fold (no shit). That function almost twisted my brain out of my head, though. Fuckin' thing. But judging by how convoluted it started, it turned out pretty nice. 14 significant lines of code. OK?

Annotating each node in the graph with its shortest path to a given root node is done in three significant lines of code. Oh, and by the way, of course I ended up using (basically) the same data structure as FGL. Well, at least now I know.
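
For readers who want something to play with, here is a plain breadth-first sketch of the annotation step. It is not the breadth-first fold described above, and the adjacency-map Graph type is an assumption of this example rather than FGL's representation; the name shortestPaths is hypothetical too.

```haskell
import qualified Data.Map as Map

-- Adjacency-list graph: each node maps to its direct successors.
-- (Assumed representation, unweighted edges.)
type Graph = Map.Map String [String]

-- Annotate every node reachable from the root with one shortest
-- path, given as the list of nodes from the root to that node.
shortestPaths :: Graph -> String -> Map.Map String [String]
shortestPaths g root = go (Map.singleton root [root]) [root]
  where
    go paths []          = paths
    go paths (n : queue) =
      let path   = paths Map.! n
          -- neighbours not yet annotated; first discovery is shortest
          next   = [m | m <- Map.findWithDefault [] n g, Map.notMember m paths]
          paths' = foldr (\m -> Map.insert m (path ++ [m])) paths next
      in go paths' (queue ++ next)
```

The distance to a node is then the length of its path minus one. Appending to the queue with ++ is quadratic in the worst case, which is fine for a sketch but not for a large trust graph.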

As I was saying... Interesting. My distance to raph is four for both Apprentice and Journeyer level. So the maxflow algorithm is next...

Oh, and thanks Akira, for certifying me. :)

