18 Apr 2013 wingo   » (Master)

inside full-codegen, v8's baseline compiler

Greetings to all. This is another nargish article on the internals of the V8 JavaScript engine. If that's your thing, read on. Otherwise, here's an interesting interview with David Harvey. See you laters!

full-codegen

Today's topic is V8's baseline compiler, called "full-codegen" in the source. To recall, V8 has two compilers: one that does a quick-and-dirty job (full-codegen), and one that focuses on hot spots (Crankshaft).

When you work with V8, most of the attention goes to Crankshaft, and rightly so: Crankshaft is how V8 makes code go fast. But everything goes through full-codegen first, so it's still useful to know how full-codegen works, and necessary if you go to hack on V8 itself.

The goal of this article is to recall the full-codegen to mind, so as to refresh our internal "efficiency model" of how V8 runs your code (cf. Norvig's PAIP retrospective, #20), and to place it in a context of other engines and possible models.

stack machine

All JavaScript code executed by V8 goes through full-codegen, so full-codegen has to be very fast. This means that it doesn't have time to do very much optimization. Even if it did have time to do optimization (which it doesn't), full-codegen lacks the means to do so. Since the code hasn't run yet when full-codegen compiles it, it doesn't know anything about types or shapes of objects. And as in other JS engines, full-codegen only compiles a function at a time, so it can't see very far. (This lets it compile functions lazily, the first time they are executed. Overall syntactic validity is ensured by a quick "pre-parse" over the source.)

I have to admit my surprise however at seeing again how simple full-codegen actually is: it's a stack machine! You don't get much simpler than that. It's shocking in some ways that this is the technology that kicked off the JS performance wars, back in 2008. The article I wrote a couple years ago does mention this, but since then I have spent a lot of time in JSC and forgot how simple things could be.

Full-codegen does have a couple of standard improvements over the basic stack machine model, as I mentioned in the "two compilers" article. One is that the compiler threads a "context" through the compilation process, so that if (say) x++ is processed in "effect" context, it doesn't need to push the initial value of x on the stack as the result value. The other is that if the top-of-stack (ToS) value is cached in a register, so that even if we did need the result of x++, we could get it with a simple mov instruction. Besides these "effect" and "accumulator" (ToS) contexts, there are also contexts for pushing values on the stack, and for processing a value for "control" (as the test of a conditional branch).

and that's it!

That's the whole thing!

Still here?

What about type recording and all that fancy stuff, you say? Well, most of the fancy stuff is in crankshaft. Type recording happens in the inline caches that aren't really part of full-codegen proper.

Full-codegen does keep some counters on loops and function calls and such that it uses to determine when to tier up to Crankshaft, but that's not very interesting. (It is interesting to note that V8 used to determine what to optimizing using a statistical profiler, and that is no longer the case. It seems statistical profiling would be OK in the long run, but as far as I understand it didn't do a great job with really short-running code, and made it difficult to reproduce bugs.)

I think -- and here I'm just giving my impression, but fortunately I have readers that will call my bullshit -- I think that if you broke down basic JavaScript and ignored optimizable hot spots, an engine mostly does variable accesses, property accesses, and allocations. Those are the fundamental things to get right. In a browser you also have DOM operations, which Blink will speed up at some point; and there are some domain-specific things like string operations and regular expressions of course. But ignoring that, there are just variables, properties, and allocations.

We'll start with the last bit. Allocation is a function of your choice of value representation, your use of "hidden classes" ("maps" in V8; all engines do the same thing now), and otherwise it's more a property of the runtime than of the compiler. (Crankshaft can lower allocation cost by using unboxed values and inlining allocations, but that's Crankshaft, not full-codegen.)

Property access is mostly handled by inline caches, which also handle method dispatch and relate to hidden classes.

The only thing left for full-codegen to do is to handle variable accesses: to determine where to store local variables, and how to get at them. In this case, again there's not much magic: either variables are really local and don't need to be looked up by name at runtime, in which case they can be allocated on the stack, or they persist for longer or in more complicated scopes, in which case they go on the heap. Of course if a variable is never referenced, it doesn't need to be allocated at all.

So in summary, full-codegen is quick, and it's dirty. It does its job with the minimum possible effort and gets out of the way, giving Crankshaft a chance to focus on the hot spots.

comparison

How does this architecture compare with what other JS engines are doing, you ask?

I have very little experience with SpiderMonkey, but as far as I can tell, with every release they they get closer and closer to V8's model. Earlier this month Kannan Vijayan wrote about SpiderMonkey's new baseline compiler, which looks to be exactly comparable to full-codegen. It's a stack machine that emits native code. It looks a little different because it operates on bytecode and not the AST, as SpiderMonkey still has an interpreter hanging around.

JavaScriptCore, the WebKit JavaScript engine, is similar on the surface: it also has a baseline compiler and an optimizing compiler. (In fact it's getting another optimizing compiler soon, but that's a story for another article). But if you look deeper, I think JSC has diverged from V8's model in interesting ways.

Besides being a register machine, a model that has inherent advantages over a stack machine, JavaScriptCore also has a low-level interpreter that uses the same calling convention as the optimizing compiler. I don't think JSC is moving in V8's direction in this regard; in fact, they could remove their baseline compiler entirely, relying instead on the interpreter and the optimizing compiler. JSC also has some neat tricks related to scoping (lazy tear-off); again, a topic for some future article.

summary

The field is still a bit open as to what is the best approach to use for cold code. It seems that an interpreter could still be a win, because if not, why would SpiderMonkey keep one around, now that they have a baseline compiler? And why would JSC add a new one?

At the same time, it seems to me that one could do more compile-time analysis at little extra cost, to speed up full-codegen by allocating more temporaries into slots; you would push and pop at compile-time to avoid doing it at run-time. It's unclear whether it would be worth it, though, as that analysis would have to run on all code instead of just the hot spots.

Hopefully someone will embark on the experiment; it should just be a simple matter of programming :) Until next time, happy hacking!

Syndicated 2013-04-18 15:45:20 from wingolog

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!