Older blog entries for zw (starting at number 18)

It's been a while...

Not terribly much progress on the cpplib front. Neil's algorithm didn't work right the first time around, got revised, and now there's a new version sitting on my hard drive waiting to be finished up.

I did some work in other areas, like the 'specs' that tell /bin/cc how to run the real compiler (which is hiding in a dark corner of /usr/lib). These form a little language of their own, and not a terribly comprehensible one - here's a snippet:

    %1 %{!Q:-quiet} -dumpbase %B %{d*} %{m*} %{a*}
    %{g*} %{O*} %{W*} %{w} %{pedantic*} %{std*} %{ansi}
    %{traditional} %{v:-version} %{pg:-p} %{p} %{f*}
    %{aux-info*} %{Qn:-fno-ident} %{--help:--help}
    %{S:%W{o*}%{!o*:-o %b.s}}

With some magic, that turns into an argument vector for one of the programs run during compilation. Not surprisingly, people avoid the stuff as much as possible.

I also stomped the irritating warning bug with built-in functions. You didn't use to get warned if you forgot to include <string.h> before using strcpy, because gcc Knows Things about strcpy before it sees any headers. Not anymore. (It was intended to get this right all along, but one if went the wrong way in the mess that is the Yacc grammar for C.)

Neil came up with an ingenious algorithm for expanding macros, which should get the C standard's semantics just right, but avoid having to scan any token more than once. It's remarkably simple to implement, but difficult to describe and not easy to comprehend from reading the code. There's going to be a long comment explaining it somewhere.

Anyway, I've implemented all of it except stringification, which is presenting some difficulties. I'm a wee bit concerned about the way the algorithm interacts with the macro stack as I designed it - we may be losing critical information. But it's late and I'm tired, and it'll probably all make sense tomorrow.

It's been a while...

I punted the lexer glue and am busily grinding through a rewrite of the macro expander. The goal here is not to forget about the tokenization of the original macro, but to preserve as much of it as possible. This will dramatically reduce the amount of data that has to be copied around, reexamined, etc.

So far, object-like macros work, and I'm starting on function-like macros. (These are the terms the standard uses.) It's like this:

    #define foo bar           /* object-like macro */
    #define neg(a) (-(a))     /* function-like macro */

Function-like macros take more work, because you have to substitute arguments. In the example above, a might be replaced by a big hairy expression.

I took some time out and stomped on about a hundred compiler warnings. We're now sure that string constants are always treated as, well, constant. I've also got as far as the Valley of the Dead in Nethack, which has never happened before (still only about halfway through the game, though).

I spent two days gluing the new lexer into cpp. Then I typed rm -rf * in the wrong directory. Poof, all my work down the drain.

I think I'll break it up into smaller chunks when I do it over again. If that's possible, which it may not be.

To blizzard: If you're going to improve gmon/mcount, please teach it that if there's an existing gmon.out in the working directory, it should augment that file instead of clobbering it. That way, if you want to profile a program that runs for a short time, you can just run it a few thousand times in a shell loop. Right now you have to do that, plus rename the reports so they all get saved, and then crunch them together at the end. This takes much longer than it has to, and throws your results off, because disk cache is wasted on the huge gmon.out files, which all have to stay around until the end.

To make this change safely, you should probably save the identity of the executable in gmon.out, and start over if it changes. (This should be done anyway.)

I'd also like to see better kernelside support for profiling. setitimer(2) has a lot of overhead, and the ticks don't come nearly often enough. SVR4 has a profil(2) system call that pushes the histogram updates into the kernel, which gets rid of the overhead but doesn't help with the granularity. Also, I don't think it can handle gaps in the region to be profiled, so your program has to be statically linked.

I'd rather not add system calls. Instead, I envision a pseudo-device which you map several different times, specifying the window of the address space to profile. It can use the high-resolution timer in the RTC to get ticks more often than the normal timer interrupt. Updates happen in the driver, so no more 30% of execution time spent in __mcount_internal.

GCC/i386 has a stupid bug where it clobbers %edx on every function entry when compiling with profiling. This breaks -mregparm. Okay, that doesn't affect very many people, but it still needs to get fixed.

I have heard two different things about Unicode:

  1. It is the One True Character Set, and the answer to all our problems, or at least the ones having to do with text encoding. Advocates of this position usually have a specific format that they prefer - UTF-8 or UCS-2.
  2. It is an abomination in the sight of God, and must be stamped out wherever it occurs. The usual reason given is that it's not a strict superset of all existing encodings. E.g. the conversion from various Chinese/Japanese charsets to Unicode and back is said to lose information.

The truth, as usual, will be somewhere in the middle. I don't know enough about the issues to judge. I would appreciate it if anyone who does know enough to judge would contact me and give me some clues. Email: zack@wolery.cumb.org.

Neil Booth came through with the new lexer for cpplib. It's much, much cleaner than the old one, and ~500 lines shorter to boot. Now I just have to knock together a glue layer so it will talk to the rest of the program, which expects a completely different interface. Then I can convert the old code to the new design over the next couple weeks instead of all at once.

The todo list is way out of date. I'd post a link, but it's so outdated it's actively misleading. Updating it goes on this week's queue, right after the glue works.

All the routines in cpp that I haven't gotten around to rewriting (there aren't too many left, thank ghod) look remarkably similar. They are at least five hundred lines long. They have at least ten levels of nested braces. They have at least ten variables that are used all over the entire function, plus at least twenty more declared in the inner blocks. And they have obvious places where they can be broken into smaller functions with ease.

I would like to know who it was that wrote all the functions like that. They're not just in the preprocessor. They're everywhere in gcc. I mean, everywhere. And it's not like a monster function gets optimized better than four reasonable ones; in fact, just the opposite. Nor is it easier to debug a monster function, or profile it. And it is certainly not easier to edit. So someone must have been absolutely in love with the things. And I want to know who, and why.

I have been notified that no, I may not remove support for grotty pre-ANSI macro tricks from the preprocessor. It turns out that the ickiness can be confined in a few small places, which is better than we had it before. But I really wanted it to go away entirely. *grumble*

Several people have pointed me at Netscape's app-defaults file, which lists a bazillion things you can usefully tweak. I now have no splash screen, no blinking text, and no useless toolbar buttons. Unfortunately, there doesn't seem to be anything related to the Energizer-bunny bookmarks headers.

Here's how it's done: add this to .X(resources|defaults) and restart X:

    Netscape*toolBar.search.isEnabled: false
    Netscape*toolBar.destinations.isEnabled: false
    Netscape*toolBar.print.isEnabled: false
    Netscape*toolBar.myshopping.isEnabled: false
    Netscape*toolBar.viewSecurity.isEnabled: false
    Netscape*noAboutSplash: true
    Netscape*blinkingEnabled: false

I just love the way the toolbar resource names have nothing to do with their visible labels.

"But the Security button is useful!" I hear you scream. Yeah, but you can get the same thing by clicking on the little lock icon in the left corner of the status bar. I want to keep the status bar, it's actually useful...

There are some other interesting resources in there, like *dontForceWindowStacking which may disable javascript's ability to create popup windows that can't be got rid of. (Too bad it doesn't disable javascript's ability to create popups, period.) I have javascript turned off anyway, so it's irrelevant.

The file to dig through is Netscape.ad, which will probably be in /usr/lib/netscape or wherever the installation directory wound up. It has amusing bitter comments by JWZ.
