inside full-codegen, v8's baseline compiler
Today's topic is V8's baseline compiler, called "full-codegen" in the source. To recall, V8 has two compilers: one that does a quick-and-dirty job (full-codegen), and one that focuses on hot spots (Crankshaft).
When you work with V8, most of the attention goes to Crankshaft, and rightly so: Crankshaft is how V8 makes code go fast. But everything goes through full-codegen first, so it's still useful to know how full-codegen works, and necessary if you go to hack on V8 itself.
The goal of this article is to recall the full-codegen to mind, so as to refresh our internal "efficiency model" of how V8 runs your code (cf. Norvig's PAIP retrospective, #20), and to place it in a context of other engines and possible models.
I have to admit my surprise however at seeing again how simple full-codegen actually is: it's a stack machine! You don't get much simpler than that. It's shocking in some ways that this is the technology that kicked off the JS performance wars, back in 2008. The article I wrote a couple years ago does mention this, but since then I have spent a lot of time in JSC and forgot how simple things could be.
Full-codegen does have a couple of standard improvements over the basic stack machine model, as I mentioned in the "two compilers" article. One is that the compiler threads a "context" through the compilation process, so that if (say) x++ is processed in "effect" context, it doesn't need to push the initial value of x on the stack as the result value. The other is that if the top-of-stack (ToS) value is cached in a register, so that even if we did need the result of x++, we could get it with a simple mov instruction. Besides these "effect" and "accumulator" (ToS) contexts, there are also contexts for pushing values on the stack, and for processing a value for "control" (as the test of a conditional branch).
and that's it!
That's the whole thing!
What about type recording and all that fancy stuff, you say? Well, most of the fancy stuff is in crankshaft. Type recording happens in the inline caches that aren't really part of full-codegen proper.
Full-codegen does keep some counters on loops and function calls and such that it uses to determine when to tier up to Crankshaft, but that's not very interesting. (It is interesting to note that V8 used to determine what to optimizing using a statistical profiler, and that is no longer the case. It seems statistical profiling would be OK in the long run, but as far as I understand it didn't do a great job with really short-running code, and made it difficult to reproduce bugs.)
We'll start with the last bit. Allocation is a function of your choice of value representation, your use of "hidden classes" ("maps" in V8; all engines do the same thing now), and otherwise it's more a property of the runtime than of the compiler. (Crankshaft can lower allocation cost by using unboxed values and inlining allocations, but that's Crankshaft, not full-codegen.)
Property access is mostly handled by inline caches, which also handle method dispatch and relate to hidden classes.
The only thing left for full-codegen to do is to handle variable accesses: to determine where to store local variables, and how to get at them. In this case, again there's not much magic: either variables are really local and don't need to be looked up by name at runtime, in which case they can be allocated on the stack, or they persist for longer or in more complicated scopes, in which case they go on the heap. Of course if a variable is never referenced, it doesn't need to be allocated at all.
So in summary, full-codegen is quick, and it's dirty. It does its job with the minimum possible effort and gets out of the way, giving Crankshaft a chance to focus on the hot spots.
How does this architecture compare with what other JS engines are doing, you ask?
I have very little experience with SpiderMonkey, but as far as I can tell, with every release they they get closer and closer to V8's model. Earlier this month Kannan Vijayan wrote about SpiderMonkey's new baseline compiler, which looks to be exactly comparable to full-codegen. It's a stack machine that emits native code. It looks a little different because it operates on bytecode and not the AST, as SpiderMonkey still has an interpreter hanging around.
The field is still a bit open as to what is the best approach to use for cold code. It seems that an interpreter could still be a win, because if not, why would SpiderMonkey keep one around, now that they have a baseline compiler? And why would JSC add a new one?
At the same time, it seems to me that one could do more compile-time analysis at little extra cost, to speed up full-codegen by allocating more temporaries into slots; you would push and pop at compile-time to avoid doing it at run-time. It's unclear whether it would be worth it, though, as that analysis would have to run on all code instead of just the hot spots.
Hopefully someone will embark on the experiment; it should just be a simple matter of programming :) Until next time, happy hacking!