Recent blog entries for sengan

Been a while since I last wrote anything up. So what happened?

My PhD was accepted in June... and I received the certificate last week.

I bought a house up at 8000 feet in Colorado. On Xmas it was foggy, but today is superb, albeit a bit bright with snow. Spent most of Xmas assembling a table saw and reading a really interesting book about "The American who taught the Japanese about Quality", a Dr. Deming... I wrote a review of it, which I'll put either on /. or here.

I'm much further along with the binary parser I described back in early April. It's now 2000 lines of Haskell going through lexing, parsing, desugaring, finding unknown/redefined symbols, finding recursively defined symbols, finding symbol lengths, finding equalities (the grammar allows bits 3:5 of symbol A to be equal to bits 7:9 of symbol B), determining the constants in the grammar, determining types and differentiating subtypes depending on the bits that are set, and then interacting with the user/debugger. At the moment I'm rewriting the back end (interacting with the user). I'm writing a paper on it, so when I'm done I'll post a link to it here.
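To make the equality idea concrete, here is a minimal sketch of checking that one bit range of a value matches another bit range of a second value. It is not the parser's actual code: the names `BitRange`, `slice`, `Equality` and `holds` are invented for illustration, and it assumes bit 0 is the least significant bit and that ranges are inclusive.

```haskell
import Data.Bits (shiftL, shiftR, (.&.))

-- | Inclusive bit range (hypothetical type); bit 0 is assumed to be
-- the least significant bit.
data BitRange = BitRange { loBit :: Int, hiBit :: Int }

-- | Extract the bits lo..hi of a value as a number of its own.
slice :: BitRange -> Integer -> Integer
slice (BitRange lo hi) v = (v `shiftR` lo) .&. mask
  where
    mask = (1 `shiftL` (hi - lo + 1)) - 1

-- | Hypothetical constraint: a slice of symbol A equals a slice of symbol B.
data Equality = Equality BitRange BitRange

-- | Check the constraint against concrete values for A and B.
holds :: Equality -> Integer -> Integer -> Bool
holds (Equality ra rb) a b = slice ra a == slice rb b
```

For example, `holds (Equality (BitRange 3 5) (BitRange 7 9)) a b` tests the "bits 3:5 of A equal bits 7:9 of B" case mentioned above for two given values.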

I coded a library in C to interact with the Haskell parser as a separate process. To allow massively large structures like caches to be parsed, the binary parser is able to deal with arbitrarily large numbers. Coding that in Haskell was easy. But allowing the same flexibility in C was a real pain. realloc, realloc, realloc... I'm not surprised that buffer overflows are so common when it's so painful to code array manipulation correctly.
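The reason this is easy in Haskell is that the built-in `Integer` type has no fixed width. As a small sketch (not the parser's code; `fromBytes` is a made-up name and big-endian byte order is an assumption), a byte sequence of any length can be folded into a single number:

```haskell
import Data.Bits (shiftL, (.|.))
import Data.List (foldl')
import Data.Word (Word8)

-- | Interpret a byte sequence as one unbounded number.
-- Assumption: the first byte is the most significant (big-endian).
fromBytes :: [Word8] -> Integer
fromBytes = foldl' step 0
  where
    -- Shift what we have so far up by one byte and add the next byte.
    step acc b = (acc `shiftL` 8) .|. fromIntegral b
```

No buffer sizing is needed: `fromBytes (replicate 16 255)` is simply a 128-bit value, where the C equivalent has to grow its own storage by hand.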

Started coding a C parser (which will hopefully lead on to a C++ parser) in Haskell. I'm using the "happy" parser generator, which does some nifty things. I was surprised that generating a list of tokens is much slower than using a continuation based lexer, but that could explain my binary parser's memory profile. The parser will hopefully lead to some helpful tools...
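For readers who haven't seen the two lexer styles side by side, here is a toy sketch of the difference. It is independent of happy and makes no claim about performance; the token type and all function names are invented for the example. The list version returns every token in one structure, while the continuation version hands one token at a time, plus the remaining input, to whoever consumes it.

```haskell
import Data.Char (isDigit, isSpace)

-- | Toy token type (hypothetical): numbers, plus signs, end of input.
data Token = TNum Int | TPlus | TEOF
  deriving (Eq, Show)

-- | Shared scanner: read one token and return the remaining input.
nextToken :: String -> (Token, String)
nextToken s = case dropWhile isSpace s of
  []         -> (TEOF, [])
  ('+':rest) -> (TPlus, rest)
  cs@(c:_)
    | isDigit c -> let (ds, rest) = span isDigit cs
                   in (TNum (read ds), rest)
    | otherwise -> error ("unexpected character: " ++ [c])

-- | List style: the lexer's result is the whole token list.
lexAll :: String -> [Token]
lexAll s = case nextToken s of
  (TEOF, _) -> [TEOF]
  (t, rest) -> t : lexAll rest

-- | Continuation style: pass one token and the rest of the input
-- to the consumer, which decides when to ask for more.
lexCont :: String -> (Token -> String -> r) -> r
lexCont s k = let (t, rest) = nextToken s in k t rest

-- | Example consumer: add up the numbers, pulling tokens on demand.
sumNums :: String -> Int
sumNums = go 0
  where
    go acc s = lexCont s $ \t rest -> case t of
      TEOF   -> acc
      TNum n -> go (acc + n) rest
      TPlus  -> go acc rest
```

In the continuation version no intermediate token list is ever built, which is the property a parser driven this way relies on.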

Bought and started reading Martin Fowler's Refactoring book. I seem to use lots of his tricks already, but some of them like the "null object" are new. Tool support would certainly help, since one sometimes breaks things when refactoring designs which have no tests. Like I did last Friday :-(

Spent 3 hours on tax forms. Mmm, lovely US tax system. In the process found out there are tons of charities where I live (Boulder, CO).

Other than that, read Anarchism Triumphant, which kind of heralds the recent court decision that source code is speech: we write code in higher level languages to help others understand it. Otherwise we could code in hex (like I did on the Z80 before I had a stable assembler). Since it is the expression to other human beings that is copyrightable, is a binary string which controls a machine's behaviour copyrightable? And if it is, is it the full string or some arbitrary percentage of it: am I violating copyright if I come up with a different string which performs the same function (e.g. containing a different copyright string)? It can't be the behaviour executed by the machine, since I would have thought that's the domain of patent law. But if a binary string is not copyrightable, then perhaps the only way to distribute software while maintaining copyright should be to distribute the source code. Provocative!

Found caolan's article interesting in that I'm currently coding a binary parser for work (unfortunately not open source currently). Its target app is the ICE front-end we use to debug x86 chips. x86s have tons of registers & tables in odd-ball formats such as the GDT. Writing or maintaining code to display & edit them is obviously a waste of time. So instead I specify a grammar corresponding to the format of the registers or memory areas. I now have some 700 lines of Haskell code that parse the ASCII grammar file & desugar it. I have yet to code the bitparser itself. However, unlike what caolan wants, it does not need to cope with pointers or structures whose length is specified in the bitstream being bitparsed -- hardware doesn't do that often enough to warrant the extra complexity.
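The idea of describing a register's format as data rather than writing display code per register can be sketched like this. This is only an illustration, not the grammar format or the tool itself: the `Field` type and `decode` function are invented, and the layout shown is the 16-bit x86 segment selector (RPL in bits 0-1, table indicator in bit 2, index in bits 3-15) rather than a full GDT entry.

```haskell
import Data.Bits (shiftL, shiftR, (.&.))

-- | One field of a register layout (hypothetical type):
-- a name plus an inclusive bit range, bit 0 least significant.
data Field = Field String Int Int

-- | Layout of an x86 segment selector, written as data.
selector :: [Field]
selector =
  [ Field "RPL"   0 1    -- requested privilege level
  , Field "TI"    2 2    -- table indicator (GDT or LDT)
  , Field "Index" 3 15   -- descriptor table index
  ]

-- | Generic decoder: works for any layout, so no per-register code.
decode :: [Field] -> Integer -> [(String, Integer)]
decode layout v =
  [ (name, (v `shiftR` lo) .&. ((1 `shiftL` (hi - lo + 1)) - 1))
  | Field name lo hi <- layout
  ]
```

Supporting another register then means adding another list of fields, with `decode` left untouched.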

Other than that, played with glade, which looks like it could save me some time. Also got Simon Marlow's happy Haskell parser to build with ghc 4.06. It works on my Haskell code except for the period in "forall a . " in existential type declarations. Finally downloaded the Aqua gtk theme, which I really like. :-)
