Older blog entries for demoncrat (starting at number 7)

A couple of new releases tonight:

  • idel now has a demo app contributed by lukeg: Roshambo bot wars. We had some fun writing dumb little bots for it and fighting it out on IRC.
  • ikiwiki is a prototype webserver and Wiki in C. There are two versions, with the more interesting but less complete one using cords and garbage collection instead of the usual C strings and manual memory management/copying. The example wiki is for discussion/editing of the example wiki's source code -- like a Favorite Toy Language whose only use is to write its own compiler.
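
A word on those cords, since they're the more unusual choice: a cord (or rope) is a string represented as a tree of concatenations, so appending never copies anything and the collector takes care of sharing. A rough sketch of the idea in Python (ikiwiki's version is in C, of course, and the real cord library has more to it than this):

    # Rough sketch of the cord idea: strings as concatenation trees,
    # so appending is O(1) and flattening happens once, at the end.
    # (Not the real C cord library -- just the data structure in miniature.)

    class Concat:
        def __init__(self, left, right):
            self.left, self.right = left, right

    def cord_cat(a, b):
        return Concat(a, b)          # O(1): neither piece gets copied

    def cord_to_str(cord):
        parts, stack = [], [cord]
        while stack:
            c = stack.pop()
            if isinstance(c, Concat):
                stack.append(c.right)
                stack.append(c.left)
            else:
                parts.append(c)      # a plain string leaf
        return ''.join(parts)

    page = cord_cat(cord_cat("<title>", "ikiwiki"), "</title>")
    assert cord_to_str(page) == "<title>ikiwiki</title>"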

A discussion with Zooko about studying compilers brought up some stuff others might find interesting -- so I'm going to try moving it onto my diary here.

Zooko wrote:

You're saying that a realistic compiler would be even *simpler* using concepts of program transformation and partial evaluation? Intriguing! Is there another book that will teach me this stuff?

Not exactly, but there are some sources below. The basic idea is that the job is simplest when you're doing just what needs doing, no more and no less -- and that's taking a program in a high-level language and producing a low-level equivalent. If you're working with a language with clean semantics, the most direct way to get there from here is through a sequence of transformations that each remove some high-level feature, until you end up with something that maps directly onto machine code. I can send you a design overview of a compiler I wrote, if you want an example. (It was for the Vapour OS project and got various bits of ugly hair hacked into it by others, so I'm not sure I have any really exemplary copy of it around anymore... but there is still the overview doc.) Or there's the toy compiler on my webpage, an earlier go at the same ideas -- underdocumented, though. Someone did write in once to tell me he'd still learned from it. :)
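
To make that concrete, here's the shape of one such pass -- a toy of my own over a made-up mini-Lisp AST, not code from either compiler mentioned above. It removes `let' by rewriting it into an immediately-applied lambda, and once it has run, nothing downstream ever needs to know `let' existed:

    # Toy transformation pass: eliminate 'let' by rewriting it as an
    # immediately-applied lambda.  Illustrative only -- a made-up AST.
    #
    #   ('let', var, init, body)  ==>  (('lambda', var, body), init)

    def remove_let(expr):
        if isinstance(expr, tuple) and expr and expr[0] == 'let':
            _, var, init, body = expr
            return (('lambda', var, remove_let(body)), remove_let(init))
        if isinstance(expr, tuple):
            return tuple(remove_let(e) for e in expr)
        return expr                  # variables and constants pass through

    prog = ('let', 'x', ('+', 2, 3), ('*', 'x', 'x'))
    print(remove_let(prog))
    # (('lambda', 'x', ('*', 'x', 'x')), ('+', 2, 3))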

If your source language isn't so nice, you can convert to a nice intermediate form and then continue as before -- SGI's ia64 compiler looked like a good example of this. You can see most compilers as working this way to a greater or lesser degree, but it tends to get hidden by the different data structures used at each stage.

There's a bunch of ways partial evaluation comes in:

  • Reusing the same logic in many subtasks: scanner generation, parser generation, codegen generation, etc.
  • Conceptual clarity about what you're doing in all the above, even if you don't use a peval in your implementation.
  • If you already have a peval for your implementation language, you can use that to go straight from an interpreter to a compiler.
  • It applies to more modern stuff like staged code generation.

OTOH it's *not* simpler to write a peval and an interpreter rather than write a simple compiler directly, because the peval effectively has both an interpreter and a codegen crammed together already. You might as well just write the compiler.

Formerly I would've emphasized just the conceptual clarity part, but after playing around with implementing pevals I'm starting to think it's not so incredibly hard, and that it should be in everyone's toolchest.

The relation between compiling by transformations and by peval seems to be basically this: in both, the result is lower-level code that avoids using the full range of constructs available in the original language. In one you write a code transformer that eliminates the construct; in the other you write an interpreter in the language subset that lacks the construct, then compute the generating extension.
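
Here's that relation boiled down to a toy of my own (not an example from any of the books below): an interpreter for straight-line arithmetic over one unknown input x, and beside it a hand-written generating extension -- the same logic, but every decision that depends only on the program is made up front, leaving residual code that just does the arithmetic on x.

    # Programs are lists of ('name', op, a, b) assignments over one input, x.

    def interp(prog, x):
        env = {'x': x}
        for name, op, a, b in prog:
            va = env[a] if isinstance(a, str) else a
            vb = env[b] if isinstance(b, str) else b
            env[name] = va + vb if op == '+' else va * vb
        return env[name]

    # Hand-written generating extension: everything that depends only on
    # 'prog' happens now; only the arithmetic on x is left in the output.
    def genext(prog):
        lines = ['def compiled(x):']
        for name, op, a, b in prog:
            lines.append('    %s = %s %s %s' % (name, a, op, b))
        lines.append('    return %s' % name)
        return '\n'.join(lines)

    prog = [('t', '*', 'x', 'x'), ('r', '+', 't', 1)]   # x*x + 1
    print(interp(prog, 5))    # 26, with interpretive overhead every time
    print(genext(prog))       # the residual code, with that overhead gone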

So, the refs:

Steele's Rabbit and Clinger's Twobit (that I pointed to before) are based on program transformation. Steele's thesis is pretty pleasant and conversational, though it's obviously far from the state of the art, and got printed by a horrifically ugly early computer typesetter.

Essentials of Programming Languages develops lots of topics through transformations (what's called refactoring these days), and ends with an intro to compiling from that perspective. Only in the first edition, though -- the second seems designed to serve college classes and is just not as fun for hackers. The book is mainly about essential stuff you need to know to start reading programming language research papers, like the title says.

Appel's Compiling With Continuations seems to be the only book-length treatment of a practical compiler written this way, but it's about 10 years old now and the SML/NJ compiler is said to have changed a lot.

On partial evaluation and compiling, I referenced this in my article:

Frank Pagan, Partial Computation and the Construction of Language Processors. Prentice-Hall, 1991. A basic textbook on this approach to compiler design, doing partial evaluation by hand. Examples include parsing, source-to-source translation, code generation, decompiling, and macroexpansion.

It's not brilliantly written, but it's short and worth checking out from the library.
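
For the flavor of those hand-specialization exercises, here's one in miniature (my own example, not lifted from Pagan): take the general power function and a known exponent, unfold the recursion yourself, and decide every test on the exponent now; what's left is the residual program.

    # Partial evaluation by hand, in miniature.
    # The general function:
    def power(x, n):
        return 1 if n == 0 else x * power(x, n - 1)

    # Specialized to n = 3 by unfolding the recursion and folding away the
    # tests on n, since n is known:
    def power_3(x):
        return x * x * x * 1

    assert power(2, 3) == power_3(2) == 8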

Modern Compiler Design is another compiler textbook. From that page:

A recurrent theme is `precomputation': first a simple, understandable, and obviously correct technique is designed, then all computation that can be done at compiler generation time is performed there. This leads naturally from interpretive lexical analysis to FSAs and allows us to view generated code as an instantiation of an interpreter, thus introducing connections with partial evaluation.

I've skimmed that book and it looks like a good one, but Appel will probably suffice unless you get really interested in all this.
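
A miniature of that `precomputation' move for lexing, sketched by me rather than taken from the book: the interpretive version re-asks what each character means on every step; precomputing those answers for every state and byte turns the lexer into pure table lookups -- an FSA.

    # Recognizing unsigned integers.  States: 0 = start, 1 = in a number,
    # -1 = reject.
    DIGITS = "0123456789"

    # Interpretive version: classifies the character on every step.
    def step_interp(state, ch):
        if ch in DIGITS:
            return 1 if state in (0, 1) else -1
        return -1

    # Precomputed version: the same decisions, made once for every state
    # and byte value at "generation time" and stored in a table.
    TABLE = [[(1 if chr(c) in DIGITS and s in (0, 1) else -1)
              for c in range(256)] for s in (0, 1)]

    def run(s, step):
        state = 0
        for ch in s:
            state = step(state, ch)
            if state < 0:
                return False
        return state == 1

    accepts_interp = lambda s: run(s, step_interp)
    accepts_table  = lambda s: run(s, lambda st, ch: TABLE[st][ord(ch)])

    assert accepts_interp("42") and accepts_table("42")
    assert not accepts_interp("4x2") and not accepts_table("4x2")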

After writing all this up I might as well post it to my journal, too -- want to be quoted/credited?

One reason the area gets a reputation for deep magic (though it's been losing that over time) is because you can invest arbitrary amounts of effort in optimization for economic reasons -- your compiler's part of your processor as far as SPEC is concerned. I've been discovering lately that partial evaluation isn't as magic as it looks, either...

Hm. I read that, while Moore's Law has doubled CPU speeds every 18 months, compiler optimizations have doubled emitted code speed every 18 years. :-)

So it seems like CPU manufacturers would have done better to defund the compiler optimizers and spend the money on manufacturing processes, eh?

Excellent question. :) Pulling numbers out of my ass, if you're spending $1B on a chip, throwing $40M into compiler development probably makes sense, even if that only gets you 40% faster than a $2M compiler. (No idea what the real figures are.) From where I'm standing that looks like infinite amounts of money. And yeah, this is why I haven't invested much time in learning hardcore compiler optimization -- I don't want my life's work to be making someone's embedded processor run 10% faster.

It's a pity this narrow emphasis still controls how people learn the subject, because the ideas in compiling really are widely applicable -- all kinds of computations are interpreters or translators of some sort, and people tend not to think of them that way. So it's worth more effort to find ways of applying the easy parts of compiler tech in lots of real programs. Valgrind is a great recent example, though perhaps `easy' isn't quite the right word there -- but it's certainly not like a hardcore dynamic optimizer.

I also wish more of that money went into implementing more powerful languages with competitive performance. And into supporting more physically realistic models of computation -- slow progress in software there is actually holding the hardware back. What to do about it, I don't know -- it's the usual economic lock-in.

More CS/programming books up for auction, including my old Smalltalk-80 books -- I'll be terribly sorry to see them go. Cashflow problems.

lukeg: I still print out my code to go over with pen in hand sometimes... got in the habit back in the dark ages when I had only occasional access to computers. It seems most helpful in the early stages of a program -- later on I may still read through a program from start to finish, sitting at the computer, but that's more of a way of procrastinating by picking little nits than anything productive.

With all the arguments about virtual machines lately, people might like to check out idel. It picks out a different point in the design space from stuff like the JVM -- closer to a hardware instruction set, using software fault isolation for safety.
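
For anyone who hasn't met software fault isolation before, the gist is easy to show in a toy (this sketch is mine; idel's real scheme differs in the details): untrusted code reaches memory only through load/store operations that mask the address into the module's own arena first, so nothing it does can touch memory outside that region.

    # Toy illustration of software fault isolation: every address is masked
    # into a 64 KiB arena before use, so no access lands outside it.
    ARENA_BITS = 16
    ARENA_SIZE = 1 << ARENA_BITS
    ARENA_MASK = ARENA_SIZE - 1
    arena = bytearray(ARENA_SIZE)

    def sfi_load(addr):
        return arena[addr & ARENA_MASK]

    def sfi_store(addr, value):
        arena[addr & ARENA_MASK] = value & 0xFF

    sfi_store(70000, 42)      # 70000 is outside a 64 KiB arena...
    print(sfi_load(70000))    # ...but both accesses get forced to
                              # 70000 mod 65536, so this prints 42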

I have about 20 books up for auction on ebay, the majority on programming or computer science. Will be putting up more soon.

My E compiler has gone through its big reorg and is passing all its tests again. I'd like to get started on the JVM backend soon, but with all the time going into finding work and scraping together rent money, it's iffy.

Headhunter follies

A headhunter emailed me about a job opening, which was nice, as I'm looking for work, except that the description looked familiar. Sure enough, poking around a local company's website turned up the exact same job ad which I'd seen already, so I replied with a one-liner saying thanks but I'd go straight to the company, and figured that was the end of it.

They replied:

I have been contracted by Company to recruit for this position. I respect that you are intelligent enough to determine the client, I would also hope that you would respect the way we have chosen to work. If you are interested in the position, Please send me an MSWord version of you resume so that I may determine the fit per my clients request.

(In their original email they said they'd seen my resume already on hotjobs.com (which I hadn't posted it to, at least not directly).) I wonder, is all this silliness old hat to other jobseekers?

5 Oct 2001 (updated 5 Oct 2001 at 17:51 UTC)

Prompted by raph's post:

Bad patents

I posted about this to livejournal once: Sun patented a three-line assembly program that had already appeared in a famous paper in the previous decade. There needs to be a wiki or something for collecting these cases; I helped start one up but its server is down now.

Readability of RPN

I agree that having to track the effects of stack manipulations in your head hurts Forth's readability. That's why, when I made up a variant of Forth for idel, it had no stack-manipulation words: you use local variables for everything instead. The result seems more readable than assembly code, the more obvious sort of language that could've been used there.
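
To show what I mean without writing actual Forth (this is just an illustration of the readability point, not idel's syntax): here's the same little computation done by juggling an explicit stack, the way stack-manipulation words make you, and then with named locals.

    # a*a + b*b, stack-juggling style; comments show the stack after each step.
    def sum_of_squares_stack(a, b):
        s = [a, b]
        s.append(s[-1])                      # dup    -> a b b
        s.append(s.pop() * s.pop())          # *      -> a b*b
        s[-1], s[-2] = s[-2], s[-1]          # swap   -> b*b a
        s.append(s[-1])                      # dup    -> b*b a a
        s.append(s.pop() * s.pop())          # *      -> b*b a*a
        s.append(s.pop() + s.pop())          # +      -> a*a+b*b
        return s.pop()

    # The same thing with named locals:
    def sum_of_squares_locals(a, b):
        return a * a + b * b

    assert sum_of_squares_stack(3, 4) == sum_of_squares_locals(3, 4) == 25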

If indeed workers do serve a large colony in order to win prestige and acquire rank, there must be a mechanism that demonstrates each individual's ability reliably.

The mechanism that seems likeliest to us is pheromones. ...It is known that the pheromone secreted by the queen motivates workers to serve her. Bees are very eager for the pheromone and lick it off the body of the queen. Workers can get the pheromone directly from the body of the queen when they serve her, or indirectly, from workers who have served the queen.

from The Handicap Principle by Zahavi and Zahavi.

So, does this sound like advogato? Is it a coincidence?
