Older blog entries for akihabara (starting at number 2)

Finally got the new expander and lexer live today. A lot of cleanup and optimisation remains to be done, but the immediate priority is comprehensive testsuites so we can be sure not to introduce regressions when improving the code base.

-traditional is not supported fully at present, but we're working on a solution.

At last, the new macro stuff is nearly done, thanks to some work by Zack yesterday. We bootstrap and pass the tests in the testsuite, and are more precise about corner cases than before. Just the -traditional stuff to go, and we should be able to apply it to CVS. If you use non-ISO stuff like the GNU ## extension to delete the previous token, or token pasting that doesn't produce a valid token (remember, we're grown-up and token-based now), you'll get warnings telling you to clean up your act.
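
To make that concrete, here's a minimal sketch of the two constructs in question; the macro names are invented for the example, and only the marked bits are non-ISO.

    #include <stdio.h>

    /* GNU extension: when `args' is empty, the `##' deletes the comma
       that precedes it, so the call below leaves no trailing comma.  */
    #define eprintf(format, args...) fprintf (stderr, format, ##args)

    /* Pasting two tokens whose concatenation is not a single valid
       preprocessing token: `.' pasted with `.' gives `..', which is no
       token at all.  A text-based cpp could let that slide; a
       token-based one has to diagnose it.  */
    #define PASTE(a, b) a ## b
    /* PASTE(., .)   ->   ..   */

    int
    main (void)
    {
      eprintf ("no extra arguments here\n");
      return 0;
    }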

A lot of ugliness remains, but that will be easier to clean up once we're happy we've got working code and have binned the old text-based expander. Many areas are much cleaner already: for example, the three places (#assert, #unassert and #if/#elif) that need to parse assertions all use the same code now, rather than each having its own slightly different version to cope with the minor variations in syntax.
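
For reference, here's what those three contexts look like side by side; the machine(i386) predicate is just an example.

    #assert machine(i386)       /* #assert:   a predicate plus an answer     */

    #if #machine(i386)          /* #if/#elif: same form, with a leading `#'  */
    int have_i386_assertion = 1;
    #endif

    #unassert machine           /* #unassert: the answer part is optional    */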

The token-based macro expansion process is quite simple in concept, but the reality is a bit messy and hard to understand from the source code. I'll try to clean it up and comment it once we're sure it's working, and have it in CVS.
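
As a rough illustration of the concept (and only the concept), here's a toy expander in C: object-like macros only, plain strings standing in for real tokens, and nothing to do with the actual cpplib code.

    #include <stdio.h>
    #include <string.h>

    struct macro
    {
      const char *name;
      const char *repl[4];          /* replacement list                  */
      int nrepl;
      int disabled;                 /* set while rescanning its own body */
    };

    static struct macro table[] = {
      { "TWO",  { "2" },               1, 0 },
      { "FOUR", { "TWO", "+", "TWO" }, 3, 0 },
    };

    static struct macro *
    lookup (const char *tok)
    {
      size_t i;
      for (i = 0; i < sizeof table / sizeof table[0]; i++)
        if (!strcmp (table[i].name, tok))
          return &table[i];
      return NULL;
    }

    /* Output one token, or, if it names an enabled macro, rescan its
       replacement list and output that instead.  */
    static void
    expand (const char *tok)
    {
      struct macro *m = lookup (tok);
      int i;

      if (m == NULL || m->disabled)
        {
          printf ("%s ", tok);
          return;
        }

      m->disabled = 1;
      for (i = 0; i < m->nrepl; i++)
        expand (m->repl[i]);        /* rescan the replacement */
      m->disabled = 0;
    }

    int
    main (void)
    {
      expand ("FOUR");              /* prints: 2 + 2 */
      printf ("\n");
      return 0;
    }

The disabled flag is the important bit: a macro is switched off while its own replacement is being rescanned, which is what stops self-referential definitions from recursing forever.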

After -traditional, the next stage is probably to get cpp re-integrated with the front ends, as a library and not a separate process. This will cut out a lot of overhead: an extra exec(), writing out the preprocessed file, the front end reading it in again, and re-tokenising.
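
Purely as a hypothetical sketch of the shape this could take; none of these names are the real cpplib interface, they're invented for illustration.

    /* The preprocessor as a library, seen from a front end.  */
    typedef struct pp_token
    {
      int type;                           /* identifier, number, string, ... */
      const char *spelling;
    } pp_token;

    typedef struct pp_reader pp_reader;   /* opaque preprocessor state */

    extern pp_reader *pp_create (const char *main_file);
    extern const pp_token *pp_next_token (pp_reader *);
    extern void pp_destroy (pp_reader *);

    /* The front end's lexer becomes a thin wrapper that pulls tokens
       straight from the library: no second process, no temporary
       preprocessed file to write out, read back in and re-tokenise.  */
    const pp_token *
    frontend_next_token (pp_reader *reader)
    {
      return pp_next_token (reader);
    }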

Putting the finishing touches on a macro expander that uses the new lexer. Like the lexer, it is token-based. The current lexer and macro expander are both text-based.

Getting this to work has been a very frustrating experience. Macro expansion is a hairy and convoluted process, and stringification and token-pasting just add to the confusion. A dense and strangely-worded C99 specification doesn't help :-)
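
A small taste of the sort of corner case involved: arguments used with # or ## are not macro-expanded first, hence the classic two-level stringification trick. The str/xstr macros are the usual idiom, shown here just for illustration.

    #include <stdio.h>

    #define VALUE   42
    #define str(x)  #x
    #define xstr(x) str (x)

    int
    main (void)
    {
      printf ("%s\n", str (VALUE));    /* prints "VALUE": no expansion before #   */
      printf ("%s\n", xstr (VALUE));   /* prints "42": expanded, then stringified */
      return 0;
    }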

We just have a single token list, and the lexer lexes all the tokens of the next logical line into it. However, a function-like macro invocation can cross multiple logical source lines. So that we don't write over the original token list and cause chaos, in that case we append to it instead. This appending can cause a realloc of the tokens (which are stored consecutively in memory), and since arguments to macros are stored as lists of pointers to the original tokens (they needn't be consecutive), those pointers need fixing up whenever we realloc.

Other things still to do include fixing bogus line numbers in errors and in the final output, and squeezing tokens back into 16 bytes on both 32-bit and 64-bit architectures. We also need to run it against a macro abuser like glibc to try to turn up obscure cases we've missed.
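
Here's a toy sketch of that fix-up, with made-up types that bear no relation to the real token structures: it records the argument positions as offsets before the list can move, then restores them against the new base.

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct token { int type; } token;

    int
    main (void)
    {
      size_t cap = 4, i;
      token *line = malloc (cap * sizeof *line);  /* the line's token list   */
      token *args[2] = { &line[1], &line[3] };    /* arg lists point into it */
      size_t offs[2];

      /* A macro call runs onto the next logical line, so we append to the
         existing list.  Growing it may move the whole block, so remember
         the argument positions as offsets first.  */
      for (i = 0; i < 2; i++)
        offs[i] = (size_t) (args[i] - line);

      cap *= 2;
      line = realloc (line, cap * sizeof *line);

      for (i = 0; i < 2; i++)
        args[i] = line + offs[i];                 /* fixed up */

      for (i = 0; i < 2; i++)
        printf ("arg %u -> token %u\n", (unsigned) i, (unsigned) offs[i]);

      free (line);
      return 0;
    }

Storing the arguments as indices in the first place would of course avoid the fix-up altogether; the sketch just mirrors the pointer-based scheme described above.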

Ah, almost forgot: the gem that is -traditional support. Not sure what's best there; I think getting everything right would need a separate pre-pass that does traditional macro text splicing. However, that would lose line and column information and just be a maintenance headache. Probably it's best to support everything we reasonably can in the token-based environment, and drop the really weird stuff like half-strings and macro expansion within strings.
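
For anyone who hasn't met it, this is roughly what macro expansion within strings means; the macro names are invented, and the traditional behaviour shown in the comments is what a pre-ISO, text-based cpp does with arguments inside string constants in a macro body.

    #include <stdio.h>

    /* Under -traditional, TRAD_STR(hello) expands to "hello";
       under ISO C it expands to the literal string "x".  */
    #define TRAD_STR(x) "x"

    /* The ISO way: stringify explicitly with the # operator, so
       ISO_STR(hello) expands to "hello" everywhere.  */
    #define ISO_STR(x)  #x

    int
    main (void)
    {
      printf ("%s %s\n", TRAD_STR (hello), ISO_STR (hello));
      return 0;
    }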
