A discussion with Zooko about studying compilers brought up some stuff others might find interesting -- so I'm going to try moving it onto my diary here.
You're saying that a realistic compiler would be even *simpler* using concepts of program transformation and partial evaluation? Intriguing! Is there another book that will teach me this stuff?
Not exactly, but there are some sources below. The basic idea is that the job is simplest when you're doing just what needs doing, no more and no less -- and that's taking a program in a high-level language and producing a low-level equivalent. If you're working with a language with clean semantics, the most direct way to get there from here is through a sequence of transformations that each remove some high-level feature, until you end with something that maps directly onto machine code. I can send you a design overview of a compiler I wrote, if you want an example. (It was for the Vapour OS project and got various bits of ugly hair hacked into it by others so I'm not sure I have any really exemplary copy of it around anymore... but there is still the overview doc.) Or there's the toy compiler on my webpage for an earlier go at the same ideas -- underdocumented, though. Someone did write in once to tell me he'd still learned from it. :)
If your source language isn't so nice, you can convert to a nice intermediate form and then continue as before -- SGI's ia64 compiler looked like a good example of this. You can see most compilers as working this way to a greater or lesser degree, but it tends to get hidden by the different data structures used at each stage.
There's a bunch of ways partial evaluation comes in:
- Reusing the same logic in many subtasks: scanner generation, parser generation, codegen generation, etc.
- Conceptual clarity about what you're doing in all the above, even if you don't use a peval in your implementation.
- If you already have a peval for your implementation language, you can use that to go straight from an interpreter to a compiler.
- It applies to more modern stuff like staged code generation.
OTOH it's *not* simpler to write a peval and an interpreter rather than write a simple compiler directly, because the peval effectively has both an interpreter and a codegen crammed together already. You might as well just write the compiler.
Formerly I would've emphasized just the conceptual clarity part, but playing around with implementing pevals I'm starting to think it's not so incredibly hard and it should be in everyone's toolchest.
The relation between compiling by transformations and by peval seems to be basically this: in both, the result is lower-level code that avoids using the full range of constructs available in the original language. In one you write a code transformer that eliminates the construct; in the other you write an interpreter in the language subset that lacks the construct, then compute the generating extension.
So, the refs:
Steele's Rabbit and Clinger's Twobit (that I pointed to before) are based on program transformation. Steele's thesis is pretty pleasant and conversational, though it's obviously far from the state of the art, and got printed by a horrifically ugly early computer typesetter.
Essentials of Programming Languages develops lots of topics through transformations (what's called refactoring these days), and ends with an intro to compiling from that perspective. Only in the first edition, though -- the second seems designed to serve college classes and is just not as fun for hackers. The book is mainly about essential stuff you need to know to start reading programming language research papers, like the title says.
Appel's Compiling With Continuations seems to be the only book-length treatment of a practical compiler written this way, but it's about 10 years old now and the SML/NJ compiler is said to have changed a lot.
On partial evaluation and compiling, I referenced this in my article:
Frank Pagan, Partial Computation and the Construction of Language Processors. Prentice-Hall, 1991. A basic textbook on this approach to compiler design, doing partial evaluation by hand. Examples include parsing, source-to-source translation, code generation, decompiling, and macroexpansion.
It's not brilliantly written, but it's short and worth checking out from the library.
Modern Compiler Design is another compiler textbook. From that page:
A recurrent theme is `precomputation': first a simple, understandable, and obviously correct technique is designed, then all computation that can be done at compiler generation time is performed there. This leads naturally from interpretive lexical analysis to FSAs and allows us to view generated code as an instantiation of an interpreter, thus introducing connections with partial evaluation.
I've skimmed that book and it looks like a good one, but Appel
will probably suffice unless you get really interested in all this.
After writing all this up I might as well post it to my journal, too -- want to be quoted/credited?
One reason the area gets a reputation for deep magic (though it's been losing that over time) is because you can invest arbitrary amounts of effort in optimization for economic reasons -- your compiler's part of your processor as far as SPEC is concerned. I've been discovering lately that partial evaluation isn't as magic as it looks, either... Hm. I read that, while Moore's Law has doubled CPU speeds every 18 months, compiler optimizations have doubled emitted code speed every 18 years. :-)
So it seem like CPU manufacturers would have done better to defund the compiler optimizers and spend the money on manufacturing processes eh?
Excellent question. :) Pulling numbers out of my ass, if you're spending $1B on a chip, throwing $40M into compiler development probably makes sense, even if that only gets you 40% faster than a $2M compiler. (No idea what the real figures are.) From where I'm standing that looks like infinite amounts of money. And yeah, this is why I haven't invested much time in learning hardcore compiler optimization -- I don't want my life's work to be making someone's embedded processor run 10% faster. It's a pity this narrow emphasis still controls how people learn the subject, because the ideas in compiling really are widely applicable -- all kinds of computations are interpreters or translators of some sort, and people tend not to think of them that way. So it's worth more effort to find ways of applying the easy parts of compiler tech in lots of real programs. Valgrind is a great recent example, though perhaps `easy' isn't quite the right word there -- but it's certainly not like a hardcore dynamic optimizer.
I also wish more of that money went into implementing more powerful languages with competitive performance. And supporting more physically realistic models of computation -- that slow progress in software is actually holding the hardware back. What to do about it, I don't know -- it's the usual economic lockin.