Older blog entries for wingo (starting at number 383)

eval, that spectral hound

Friends, I am not a free man. Eval has been my companion of late, a hellhound on my hack-trail. I give you two instances.

the howl of the-environment, across the ages

As legend has it, in the olden days, Aubrey Jaffer, the duke of SCM, introduced low-level FEXPR-like macros into his Scheme implementation. These allowed users to capture the lexical environment:

(define the-environment
  (procedure->syntax
   (lambda (exp env)
     env)))

Tom Lord inherited this cursed bequest from Jaffer, when he established himself in the nearby earldom of Guile. It so affected him that he added local-eval to Guile, allowing the user to evaluate an expression within a captured local environment:

(define env (let ((x 10)) (the-environment)))
(local-eval 'x env)
=> 10
(local-eval '(set! x 42) env)
(local-eval 'x env)
=> 42

Since then, the tenants of the earldom of Guile have been haunted by this strange leakage of the state of the interpreter into the semantics of Guile.

When the Guile co-maintainer title devolved upon me, I had a plan to vanquish the hound: to compile Guile into fast bytecode. There would be no inefficient association-lists of bindings at run-time. Indeed, there would be no "environment object" to capture. I succeeded, and with Guile 2.0, local-eval, procedure->syntax and the-environment were no more.

But no. As Guile releases started to make it into distributions, and users started to update their code, there arose such a howling on the mailing lists as set my hair on end. The ghost of local-eval was calling: it would not be laid to rest.

I resisted fate, for as long as I could do so in good conscience. In the end, Guile hacker Mark Weaver led an expedition to the mailing list moor, and came back with a plan.

Mark's plan was to have the syntax expander recognize the-environment, and residualize a form that would capture the identities of all lexical bindings. Like this:

(let ((x 10)) (the-environment))
=>
(let ((x 10))
  (make-lexical-environment
   ;; Procedure to wrap captured environment around
   ;; an expression
   wrapper
   ;; Captured variables: only "x" in this case
   (list (capture x))))

I'm taking it a little slow because hey, this is some tricky macrology. Let's look at (capture x) first. How do you capture a variable? In Scheme, with a closure. Like this:

;; Capture a variable with a closure.
(define-syntax-rule (capture var)
  (case-lambda
    ;; When called with no arguments, return the value
    ;; of VAR.
    (() var)
    ;; When called with one argument, set the VAR to the
    ;; new value.
    ((new-val) (set! var new-val))))

The trickier part is reinstating the environment, so that x in a local-eval'd expression results in the invocation of a closure. Identifier syntax to the rescue:

;; The wrapper from above: a procedure that wraps
;; an expression in a lexical environment containing x.
(lambda (exp)
  #`(lambda (x*) ; x* is a fresh temporary var
      (let-syntax ((x (identifier-syntax
                        (_ (x*))
                        ((set! _ val) (x* val)))))
        #,exp)))

By now it's clear what local-eval does: it wraps an expression, using the wrapper procedure from the environment object, evaluates that expression, then calls the resulting procedure with the case-lambda closures that captured the lexical variable.
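
The same shape can be sketched outside Scheme. The JavaScript below is only an analogy, not Guile's implementation: makeEnvironment, localEval and peek are hypothetical names, accessor properties stand in for identifier-syntax, and a non-strict with block stands in for the let-syntax wrapper.

```javascript
// Capture a variable with a closure: called with no arguments it reads
// the variable, called with one argument it writes it.
function makeEnvironment() {
  let x = 10;
  const captureX = function (...args) {
    if (args.length === 0) return x;
    x = args[0];
  };
  // peek is a hypothetical helper so the caller can inspect the real x.
  return { boxes: { x: captureX }, peek: () => x };
}

// Hypothetical localEval: re-expose each captured closure under its
// original name, then evaluate the expression with those names in scope.
function localEval(source, env) {
  const scope = {};
  for (const [name, box] of Object.entries(env.boxes)) {
    Object.defineProperty(scope, name, {
      get: () => box(),          // a reference to x calls the closure
      set: (v) => { box(v); },   // an assignment to x calls it with the value
    });
  }
  // Bodies built by the Function constructor are non-strict, so `with` is allowed.
  const run = new Function('scope', 'source', 'with (scope) { return eval(source); }');
  return run(scope, source);
}

const env = makeEnvironment();
console.log(localEval('x', env));   // 10
localEval('x = 42', env);
console.log(localEval('x', env));   // 42
```

As in the Scheme version, the evaluated expression never touches x directly; every read and write goes through the capturing closure.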

So it's a bit intricate and nasty in some places, but hey, it finally tames the ghostly hound with modern Scheme. We were able to build local-eval on top of Guile's procedural macros, once a couple of accessors were added to our expander to return the set of bound identifiers visible in an expression, and to query whether those bindings were regular lexicals, or macros, or pattern variables, or whatever.

"watson, your service revolver, please."

As that Guile discussion was winding down, I started to hear the howls from an unexpected quarter: JavaScript. You might have heard, perhaps, that JavaScript eval is crazy. Well, it is. But ES5 strict was meant to kill off its most egregious aspect, in which eval can introduce new local variables to a function.

Now I've been slowly hacking on implementing block-scoped let and const in JavaScriptCore, so that we can consider switching gnome-shell over to use JSC. Beyond standard ES5 supported in JSC, existing gnome-shell code uses let, const, destructuring binding, and modules, all of which are bound to be standardized in the upcoming ES6. So, off to the hack.

My initial approach was to produce a correct implementation, and then make it fast. But the JSC maintainers, inspired by the idea that "let is the new var", wanted to ensure that let was fast from the beginning, so that it doesn't get a bad name with developers. OK, fair enough!

Beyond that, though, it looks like TC39 folk are eager to get let and const into all parts of JavaScript, not just strict mode. Do you hear the hound? It rides again! Now we have to figure out how block scope interacts with non-strict eval. Awooooo!

Thankfully, there seems to be a developing consensus that eval("let x = 20") will not introduce a new block-scoped lexical. So, down boy. The hound is at bay, for now.
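
For what it is worth, this is how ES2015 was eventually specified: a let inside eval is scoped to the eval'd code itself. A small check, assuming an ES2015-capable engine; the Function constructor is used only so that the bodies are non-strict code wherever the snippet is pasted, and the variable names are arbitrary.

```javascript
// A var declared by non-strict direct eval leaks into the calling function...
const varLeaks = new Function("eval('var x = 20'); return typeof x;");

// ...but a let declared by eval stays inside the eval'd code.
const letStaysPut = new Function("eval('let y = 20'); return typeof y;");

console.log(varLeaks());    // number
console.log(letStaysPut()); // undefined
```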

life with dogs

I'm making my peace with eval. Certainly in JavaScript it's quite a burden for an implementor, but the current ES6 drafts don't look like they're making the problem worse. And in Scheme, I'm very happy to provide the primitives needed so that local-eval can be implemented in terms of our existing machinery, without needing symbol tables at runtime. But if you are making a new language, as you value your life, don't go walking on the local-eval moors at night!

Syndicated 2012-02-01 15:33:49 from wingolog

javascript eval considered crazy

Peoples. I was hacking recently on JavaScriptCore, and I came to a realization: JavaScript's eval is absolutely crazy.

I mean, I thought I knew this before. But... words fail me, so I'll have to show a few examples.

eval and introduced bindings

This probably isn't worth mentioning, as you probably know it, but eval can introduce lexical bindings:

 > var foo = 10;
 > (function (){ eval('var foo=20;'); return foo; })()
 20
 > foo
 10

I find this to be pretty insane already, but I knew about it. You would think that var x = 10; and eval('var x = 10;'); would be the same, but they're not:

 > (function (){ var x = 10; return delete x; })()
 false
 > (function (){ eval('var x = 10;'); return delete x; })()
 true

eval-introduced bindings do not have the DontDelete property set on them, according to the post-hoc language semantics, so unlike proper lexical variables, they may be deleted.

when is eval really eval?

Imagine you are trying to analyze some JavaScript code. If you see eval(...), is it really eval?

Not necessarily.

eval pretends to be a regular, mutable binding, so it can be rebound:

 > eval = print
 > eval('foo')
 foo // printed

or, shadowed lexically:

 > (function () { var eval = print; eval('foo'); })()
 foo // printed

or, shadowed dynamically:

 > with({'eval': print}) { eval('foo'); }
 foo // printed

You would think that if you can somehow freeze the eval binding on the global object, and verify that there are no with forms, and no statements of the form var eval = ..., that you could guarantee that eval is eval, but that is not the case:

 > Object.freeze(this);
 > (function (x){ return [eval(x), eval(x)]; })('var eval = print; 10')
 var eval = print; 10 // printed, only once!

(Here the first eval created a local binding for eval, and evaluated to 10. The second eval was actually a print.)


an eval by any other name

So eval is an identifier that can be bound to another value. OK. One would expect to be able to bind another identifier to eval, then. Does that work? It seems to work:

 > var foo = eval;
 > foo('foo') === eval;
 true

But not really:

 > (function (){ var quux = 10; return foo('quux'); } )()
 Exception: ReferenceError: Can't find variable: quux

eval by any other name isn't eval. (More specifically, eval by any other name doesn't have access to lexical scope.)
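
A self-contained version of the same contrast, as a sketch: the function names are mine, and it assumes there is no global variable named quux.

```javascript
function viaDirectEval() {
  var quux = 10;
  return eval('typeof quux');   // direct call: sees the local quux
}

function viaAlias() {
  var quux = 10;
  var foo = eval;               // same function object, different name
  return foo('typeof quux');    // indirect call: evaluated in global scope
}

console.log(viaDirectEval()); // number
console.log(viaAlias());      // undefined
```

Using typeof keeps the indirect case from throwing a ReferenceError, so both results can be compared side by side.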

Note, however, the mere presence of a shadowed declaration of eval doesn't mean that eval isn't eval:

 > var foo = 10
 > (function(x){ var eval = x; var foo = 20; return [x('foo'), eval('foo')] })(eval)
 10,20


strict mode restrictions

ECMAScript 5 introduces "strict mode", which prevents eval from being rebound:

 > (function(){ "use strict"; var eval = print; })
 Exception: SyntaxError: Cannot declare a variable named 'eval' in strict mode.
 > (function(){ "use strict"; eval = print; })
 Exception: SyntaxError: 'eval' cannot be modified in strict mode
 > (function(){ "use strict"; eval('eval = print;'); })()
 Exception: SyntaxError: 'eval' cannot be modified in strict mode
 > (function(x){"use strict"; x.eval = print; return eval('eval');})(this)
 Exception: TypeError: Attempted to assign to readonly property.

But, since strict mode is embedded in "classic mode", it's perfectly fine to mutate eval from outside strict mode, and strict mode has to follow suit:

 > eval = print;
 > (function(){"use strict"; return eval('eval');})()
 eval // printed

The same is true of non-strict lexical bindings for eval:

 > (function(){ var eval = print; (function(){"use strict"; return eval('eval');})();})();
 eval // printed
 > with({'eval':print}) { (function(){ "use strict"; return eval('eval');})() }
 eval // printed

An engine still has to check at run-time that eval is really eval. This crazy behavior will be with us as long as classic mode, which is to say, forever.

Strict-mode eval does have the one saving grace that it cannot introduce lexical bindings, so implementors do get a break there, but it's a poor consolation prize.
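
That saving grace is easy to observe. A minimal sketch: the names are illustrative, there is assumed to be no global called leaked, and the Function constructor is used so that each body's strictness is explicit.

```javascript
// Non-strict direct eval adds `leaked` to the calling function's scope.
const classicCaller = new Function(
  "eval('var leaked = 1'); return typeof leaked;");

// Strict direct eval gets its own variable environment, so nothing escapes.
const strictCaller = new Function(
  "'use strict'; eval('var leaked = 1'); return typeof leaked;");

console.log(classicCaller()); // number
console.log(strictCaller());  // undefined
```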

in summary

What can an engine do when it sees eval?

Not much. It can't even prove that it is actually eval unless eval is not bound lexically, there is no with, there is no intervening non-strict call to any identifier eval (regardless of whether it is eval or not), and the global object's eval property is bound to the blessed eval function, and is configured as DontDelete and ReadOnly (not the default in web browsers).

But the very fact that an engine sees a call to an identifier eval poisons optimization: because eval can introduce variables, the scope of free variables is no longer lexically apparent, in many cases.

I'll say it again: crazy!!!

Syndicated 2012-01-12 16:34:08 from wingolog

webkittens! lexical scoping is in danger!

The GTK+ WebKittens are on the loose here in Coruña. There's folks here from Red Hat, Motorola, Collabora, and of course Igalia. It's good times; beyond the obvious platitudes of "um, the web is important and stuff" it's good to be in a group that is creating the web experience of millions of users.

My part in that is very small, adding support for block-scoped let and const to JavaScriptCore.

I've made some progress, but it could be going more smoothly. I have made the parser do the right thing for const, correctly raising errors for duplicate bindings, including nested var declarations that get hoisted. The parser is fine: it maintains an environment like you would expect. But the AST assumes that all locals get hoisted to function scope, so there's no provision for e.g. two distinct local variables with the same name. So there is still some work to do on the AST, and it's a thicket of templates.
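
To make the "two distinct local variables with the same name" case concrete, here is a small sketch using let as it was later standardized in ES2015; the function names are only for illustration.

```javascript
function withVar() {
  var x = 'outer';
  {
    var x = 'inner';   // hoisted: the same function-scoped variable as above
  }
  return x;
}

function withLet() {
  let x = 'outer';
  {
    let x = 'inner';   // a second, distinct variable that shares the name
  }
  return x;
}

console.log(withVar()); // inner
console.log(withLet()); // outer
```

An AST that hoists every local to function scope can represent withVar, but has nowhere to put the second x in withLet.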

Hopefully I'll end up with a prototype by the end of the hackfest (Sunday). Sooner if I find that sword of omens, which I seem to have misplaced. Sight beyond sight!

Syndicated 2011-12-02 17:36:56 from wingolog

fscons 2011: free software, free society

Good morning, hackersphere! Time and space are moving, in the egocentric coordinate system at least, but before their trace is gone, I would like to say: FSCONS 2011 was fantastic!

FSCONS is a conference unlike any other I know. I mean, where else can you go from a talk about feminism in free software, to a talk about the state of the OpenRISC chip design project, passing through a hallway track conversation on the impact of cryptocurrency on the welfare state, approached from an anarchist perspective?

Like many of you, I make software because I like to hack. But I make Free Software in particular because I value all kinds of freedom, as part of the "more beautiful world our hearts know is possible". We make the material conditions of tomorrow's social relations, and I want a world of sharing and mutual aid.

But when we reflect on what our hands are making, we tend to do so in a context of how, not why. That's why I enjoyed FSCONS so much: it created a space for joining the means of production to their ends: a cons of Free Software, Free Society.

As a GNU hacker, I'm especially honored by the appreciation that FSCONS participants have for GNU. FSCONS has a tithe, in which a portion of the entry fees is donated to some project, and this year GNU was chosen as the recipient. It's especially humbling, given the other excellent projects that were nominated for the tithe.

So thank you very much, FSCONS organizers and participants. I had a great time!

are you bitter about it?

I gave a talk there at FSCONS, GNU Guile: Free Software Means of Production (slides, notes).

Unlike many of my other talks, this one was aimed at folks that didn't necessarily know very much about Guile. It was also different from other talks in that it emphasized Guile as a general programming environment, not as an extension language. Guile is both things, and as the general-purpose side gets a lot less publicity, I wanted to emphasize it in this talk. Hopefully the videos will be up soon.

In the last 20 minutes or so, we did a live-hack. Inspired by a tweet by mattmight, we built Bitter, a one-bit Twitter. I tried to convey what it's like to hack in Guile, with some success I think. Source code for the live-hack, such as it is, is linked to at the end of the page.

For a slightly more extended example of a web application, check out Peeple, originally presented in a talk at FOSDEM, back in February. Peeple has the advantage of being presented as a development of separate git commits. Slides of that talk, Dynamic Hacking with Guile, are also available, though they are not as developed as the ones from FSCONS.

Finally, for the real documentation, see the Guile manual.

Happy hacking, and hopefully see you at FSCONS next year!

Syndicated 2011-11-28 11:38:36 from wingolog

JavaScriptCore, the WebKit JS implementation

My readers will know that I have recently had the pleasure of looking into the V8 JavaScript implementation, from Google. I'm part of a small group in Igalia doing compiler work, and it's clear that in addition to being lots of fun, JavaScript implementations are an important part of the compiler market today.

But V8 is not the only JS implementation in town. Besides Mozilla's SpiderMonkey, which you probably know, there is another major Free Software JS implementation that you might not have even heard of, at least not by its proper name: JavaScriptCore.

jsc: js for webkit

JavaScriptCore (JSC) is the JavaScript implementation of the WebKit project.

In the beginning, JavaScriptCore was a simple tree-based interpreter, as Mozilla's SpiderMonkey was. But then in June of 2008, a few intrepid hackers at Apple wrote a compiler and bytecode interpreter for JSC, threw away the tree-based interpreter, and called the thing SquirrelFish. This was eventually marketed as "Nitro" inside Apple's products[0].

JSC's bytecode interpreter was great, and is still pretty interesting. I'll go into some more details later in this article, because its structure conditions the rest of the implementation.

But let me continue here with this historical sketch by noting that later in 2008, the WebKit folks added inline caches, a regular expression JIT, and a simple method JIT, and then called the thing SquirrelFish Extreme. Marketers called this Nitro Extreme. (Still, the proper name of the engine is JavaScriptCore; Wikipedia currently gets this one wrong.)

One thing to note here is that the JSC folks were doing great, well-factored work. It was so good that SpiderMonkey hackers at Mozilla adopted JSC's regexp JIT compiler and their native-code assembler directly.

As far as I can tell, for JSC, 2009 and 2010 were spent in "consolidation". By that I mean that JSC had a JIT and a bytecode interpreter, and they wanted to maintain them both, and so there was a lot of refactoring and tweaking to make them interoperate. This phase consolidated the SFX gains on x86 architectures, while also adding implementations for ARM and other architectures.

But with the release of V8's Crankshaft in late 2010, the JS performance bar had been lowered again (assuming it is a limbo), and so JSC folks started working on what they call their "DFG JIT" (DFG for "data-flow graph"), which aims to be more like Crankshaft, basically.

It's possible to configure a JSC with all three engines: the interpreter, the simple method JIT, and the DFG JIT. In that case there is tiered compilation between the three forms: initial parsing and compilation produces bytecode, that can be optimized with the method JIT, that can be optimized by the DFG JIT. In practice, though, on most platforms the interpreter is not included, so that all code runs through the method JIT. As far as I can tell, the DFG JIT is shipping in Mac OS X Lion's Safari browser, but it is not currently enabled on any platform other than 64-bit Mac. (I am working on getting that fixed.)

a register vm

The interpreter has a number of interesting pieces, but it is important mostly for defining the format of bytecode. Bytecode is effectively the high-level intermediate representation (IR) of JSC.

To put that into perspective, in V8, the high-level intermediate representation is the JS source code itself. When V8 first sees a piece of code, it pre-parses it to raise early syntax errors. Later when it needs to analyze the source code, either for the full-codegen compiler or for Hydrogen, it re-parses it to an AST, and then works on the AST.

In contrast, in JSC, when code is first seen, it is fully parsed to an AST and then that AST is compiled to bytecode. After producing the bytecode, the source text isn't needed any more, and so it is forgotten. The interpreter interprets the bytecode directly. The simple method JIT compiles the bytecode directly. The DFG JIT has to re-parse the bytecode into an SSA-style IR before optimizing and producing native code, which is a bit more expensive but worth it for hot code.

As you can see, bytecode is the common language spoken by all of JSC's engines, so it's important to understand it.

Before really getting into things, I should make an aside about terminology here. Traditionally, at least in my limited experience, a virtual machine was considered to be a piece of software that interprets sequences of virtual instructions. This would be in contrast to a "real" machine, that interprets sequences of "machine" or "native" instructions in hardware.

But these days things are more complicated. A common question a few years ago would be, "is JavaScript interpreted or compiled?" This question is nonsensical, because "interpreted" or "compiled" are properties of implementations, not languages. Furthermore, an implementation can compile to bytecode, but then interpret that bytecode, as JSC used to do.

And in the end, if you compile all the bytecode that you see, where is the "virtual machine"? V8 hackers still call themselves "virtual machine engineers", even as there is no interpreter in the V8 sources (not counting the ARM simulator; and what of a program run under qemu?).

All in all though, it is still fair to say that JavaScriptCore's high-level intermediate language is a sequence of virtual instructions for an abstract register machine, of which the interpreter and the simple method JIT are implementations.

When I say "register machine", I mean that in contrast to a "stack machine". The difference is that in a register machine, all temporary values have names, and are stored in slots in the stack frame, whereas in a stack machine, temporary results are typically pushed on the stack, and most instructions take their operands by popping values off the stack.

(Incidentally, V8's full-codegen compiler operates on the AST in a stack-machine-like way. Accurately modelling the state of the stack when switching from full-codegen to Crankshaft has been a source of many bugs in V8.)

Let me say that for an interpreter, I am totally convinced that register machines are the way to go. I say this as a Guile co-maintainer, which has a stack VM. Here are some reasons.

First, stack machines penalize named temporaries. For example, consider the following code:

(lambda (x)
  (* (+ x 2)
     (+ x 2)))

We could do common-subexpression elimination to optimize this:

(lambda (x)
  (let ((y (+ x 2)))
    (* y y)))

But in a stack machine is this really a win? Consider the sequence of instructions in the first case:

; stack machine, unoptimized
0: local-ref 0      ; x
1: make-int8 2
2: add
3: local-ref 0      ; x
4: make-int8 2
5: add
6: mul
7: return

Contrast this to the instructions for the second case:

; stack machine, optimized
0: local-ref 0      ; push x
1: make-int8 2      ; push 2
2: add              ; pop x and 2, add, and push sum
3: local-set 1      ; pop and set y
4: local-ref 1      ; push y
5: local-ref 1      ; push y
6: mul              ; pop y and y, multiply, and push product
7: return           ; pop and return

In this case we really didn't gain anything, because storing values to locals and loading them back to the stack take up separate instructions, and in general the time spent in a procedure is linear in the number of instructions executed in the procedure.

In a register machine, on the other hand, things are easier, and CSE is definitely a win:

0: add 1 0 k0          ; add x and the constant 2 (k0) and store in y
1: mul 2 1 1           ; multiply y and y and store in z
2: return 2            ; return z

In a register machine, there is no penalty to naming a value. Using a register machine reduces the push/pop noise around the instructions that do the real work.
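
To make the instruction counts concrete, here is a toy pair of interpreters in JavaScript. This is a sketch, not Guile's or JSC's actual VM: the opcode names follow the listings above, the { k: n } constant operand is an encoding invented for this example, and "dispatched" simply counts trips around the interpreter loop.

```javascript
// Toy stack machine: operands are implicit, so values get pushed and popped.
function runStack(code, x) {
  const stack = [];
  const locals = [x];
  let dispatched = 0;
  for (const [op, arg] of code) {
    dispatched++;
    switch (op) {
      case 'local-ref': stack.push(locals[arg]); break;
      case 'local-set': locals[arg] = stack.pop(); break;
      case 'make-int8': stack.push(arg); break;
      case 'add': stack.push(stack.pop() + stack.pop()); break;
      case 'mul': stack.push(stack.pop() * stack.pop()); break;
      case 'return': return { value: stack.pop(), dispatched };
      default: throw new Error('unknown opcode: ' + op);
    }
  }
  throw new Error('missing return');
}

// Toy register machine: each instruction names its destination and operands.
// An operand is a register index, or { k: n } for an inline constant.
function runRegister(code, x) {
  const regs = [x];
  const read = (operand) =>
    (typeof operand === 'number' ? regs[operand] : operand.k);
  let dispatched = 0;
  for (const [op, a, b, c] of code) {
    dispatched++;
    switch (op) {
      case 'add': regs[a] = read(b) + read(c); break;
      case 'mul': regs[a] = read(b) * read(c); break;
      case 'return': return { value: regs[a], dispatched };
      default: throw new Error('unknown opcode: ' + op);
    }
  }
  throw new Error('missing return');
}

const stackPlain = [
  ['local-ref', 0], ['make-int8', 2], ['add'],
  ['local-ref', 0], ['make-int8', 2], ['add'],
  ['mul'], ['return'],
];
const stackCse = [
  ['local-ref', 0], ['make-int8', 2], ['add'],
  ['local-set', 1], ['local-ref', 1], ['local-ref', 1],
  ['mul'], ['return'],
];
const registerCse = [
  ['add', 1, 0, { k: 2 }],  // y = x + 2
  ['mul', 2, 1, 1],         // z = y * y
  ['return', 2],
];

console.log(runStack(stackPlain, 3));     // { value: 25, dispatched: 8 }
console.log(runStack(stackCse, 3));       // { value: 25, dispatched: 8 }
console.log(runRegister(registerCse, 3)); // { value: 25, dispatched: 3 }
```

On the stack machine, CSE leaves the dispatch count unchanged at eight; on the register machine the same computation takes three dispatches.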

Because they include the names (or rather, locations) of their operands within the instruction, register machines also take fewer instructions to do the job. This reduces dispatch cost.

In addition, with a register VM, you know the size of a call frame before going into it, so you can avoid overflow checks when pushing values in the function. (Some stack machines also have this property, like the JVM.)

But the big advantage of targeting a register machine is that you can take advantage of traditional compiler optimizations like CSE and register allocation. In this particular example, we have used three virtual registers, but in reality we only need one. The resulting code is also closer to what real machines expect, and so is easier to JIT.

On the down side, instructions for a register machine typically occupy more memory than instructions for a stack machine. This is particularly the case for JSC, in which the opcode and each of the operands takes up an entire machine word. This was done to implement "direct threading", in which the opcodes are not indexes into jump tables, but actually are the addresses of the corresponding labels. This might be an acceptable strategy for an implementation of JS that doesn't serialize bytecode out to disk, but for anything else the relocations are likely to make it a lose. In fact I'm not sure that it's a win for JSC even, and perhaps the bloat was enough of a lose that the interpreter was turned off by default.

Stack frames for the interpreter consist of a six-word frame, the arguments to the procedure, and then the locals. Calling a procedure reserves space for a stack frame and then pushes the arguments on the stack -- or rather, sets them to the last n + 6 registers in the stack frame -- then slides up the frame pointer. For some reason in JSC the stack is called the "register file", and the frame pointer is the "register window". Go figure; I suppose the names are as inscrutable as the "activation records" of the stack world.

jit: a jit, a method jit

I mention all these details about the interpreter and the stack (I mean, the register file), because they apply directly to the method JIT. The simple method JIT (which has no name) does the exact same things that the bytecode interpreter does, but it does them via emitted machine instructions instead of interpreting virtual instructions.

There's not much to say here; jitting the code has the result you would expect, reducing dispatching overhead, while at the same time allowing some context-specific compilation, like when you add a constant integer to a variable. This JIT is really quick-and-dirty though, so you don't get a lot of the visibility benefits traditionally associated with method JITs like what HotSpot's C1 or C2 currently have. Granted, the register VM bytecode does allow for some important optimizations to happen, but JSC currently doesn't do very much in the way of optimizing bytecode, as far as I can tell.

Thinking more on the subject, I suspect that for JavaScript, CSE isn't even possible unless you know the types, as a valueOf() callback could have side effects.
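
A small sketch of why that is, using the (x + 2) * (x + 2) example from earlier. The helper names are hypothetical; the point is only that an object's valueOf() runs each time the object is coerced to a number.

```javascript
// An object whose numeric value changes every time it is read.
function makeNoisy() {
  let reads = 0;
  return { valueOf() { reads += 1; return reads; } };
}

// The original expression coerces x twice...
const plain = (x) => (x + 2) * (x + 2);

// ...while the "optimized" version coerces it only once.
const withCse = (x) => { const y = x + 2; return y * y; };

console.log(plain(3) === withCse(3)); // true: safe when x is a number
console.log(plain(makeNoisy()));      // 12, i.e. (1 + 2) * (2 + 2)
console.log(withCse(makeNoisy()));    // 9,  i.e. (1 + 2) * (1 + 2)
```

Without knowing that x is a primitive number, the two functions are not interchangeable.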

an interlude of snarky footnotes

Hello, reader! This is a long article, and it's a bit dense. I had some snarky footnotes that I enjoyed writing, but it felt wrong to put them at the end, so I thought it better to liven things up in the middle here. The article continues in the next section.

0. In case you didn't know, compilers are approximately 37% composed of marketing, and rebranding is one of the few things you can do to a compiler, marketing-wise, hence the name train: SquirrelFish, Nitro, SFX, Nitro Extreme...[1] As in, in response to "I heard that Nitro is slow.", one hears, "Oh, man they totally fixed that in SquirrelFish Extreme. It's blazingly fast![2]"

1. I don't mean to pick on JSC folks here. V8 definitely has this too, with their "big reveals". Firefox people continue to do this for some reason (SpiderMonkey, TraceMonkey, JaegerMonkey, IonMonkey). I expect that even they have forgotten the reason at this point. In fact the JSC marketing has lately been more notable in its absence, resulting in a dearth of useful communication. At this point, in response to "Oh, man they totally are doing a great job in JavaScriptCore", you're most likely to hear, "JavaScriptCore? Never heard of it. Kids these days hack the darndest things."

2. This is the other implement in the marketer's toolbox: "blazingly fast". It means, "I know that you don't understand anything I'm saying, but I would like for you to repeat this phrase to your colleagues please." As in, "LLVM does advanced TBAA on the SSA IR, allowing CSE and LICM while propagating copies to enable SIMD loop vectorization. It is blazingly fast."

dfg: a new crankshaft for jsc?

JavaScriptCore's data flow graph (DFG) JIT is work by Gavin Barraclough and Filip Pizlo to enable speculative optimizations for JSC. For example, if you see the following code in JS:

a[i++] = 0.7*x;

then a is probably an array of floating-point numbers, and i is probably an integer. To get great performance, you want to use native array and integer operations, so you speculatively compile a version of your code that makes these assumptions. If the assumptions don't work out, then you bail out and try again with the normal method JIT.

The fact that the interpreter and simple method JIT have a clear semantic model in the form of bytecode execution makes it easy to bail out: you just reconstruct the state of the virtual registers and register window, then jump back into the code. (V8 calls this process "deoptimization"; the DFG calls it "speculation failure".)

You can go the other way as well, switching from the simple JIT to the optimized DFG JIT, using on-stack replacement. The DFG JIT does do OSR. I hear that it's needed if you want to win Kraken, which puts you in lots of tight loops that you need to be able to optimize without relying on being able to switch to optimized code only on function re-entry.

When the DFG JIT is enabled, the interpreter (if present) and the simple method JIT are augmented with profiling information, to record what types flow through the various parts of the code. If a loop is executed a lot (currently more than 1000 times), or a function is called a lot (currently about 70 times), the DFG JIT kicks in. It parses the bytecode of a function into an SSA-like representation, doing inlining and collecting type feedback along the way. This probably sounds very familiar to my readers.
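
The counter-and-bailout scheme can be caricatured in a few lines of JavaScript. This is a sketch under loud assumptions: makeTiered and its fields are invented names, the threshold is just the "about 70" from above, and the "speculative" function only stands in for type-specialized native code.

```javascript
const CALL_THRESHOLD = 70; // illustrative, taken from the "about 70 times" above

function makeTiered(generic, speculative, guard) {
  const state = { tier: 'baseline', calls: 0, speculationFailures: 0 };
  function call(...args) {
    state.calls += 1;
    if (state.tier === 'baseline') {
      // Counter-based trigger: promote once the function is hot.
      if (state.calls >= CALL_THRESHOLD) state.tier = 'optimized';
      return generic(...args);
    }
    // Optimized tier: only valid while the speculated types hold.
    if (guard(...args)) return speculative(...args);
    // Speculation failure: fall back to the generic path for this call.
    state.speculationFailures += 1;
    return generic(...args);
  }
  return { call, state };
}

const scale = makeTiered(
  (x) => 0.7 * x,               // generic: copes with any x, coercing as needed
  (x) => 0.7 * x,               // stand-in for a native double multiply
  (x) => typeof x === 'number'  // the speculation: x is a number
);

for (let i = 0; i < CALL_THRESHOLD; i++) scale.call(i);
console.log(scale.state.tier);                // optimized
scale.call('10');                             // guard fails, generic path runs
console.log(scale.state.speculationFailures); // 1
```

A real engine would also record which types it saw while in the baseline tier, and would reconstruct the baseline frame on failure rather than simply calling another function.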

The difference between JSC and Crankshaft here is that Crankshaft parses out type feedback from the inline caches directly, instead of relying on in-code instrumentation. I think Crankshaft's approach is a bit more elegant, but it is prone to lossage when GC blows the caches away, and in any case either way gets the job done.

I mentioned inlining before, but I want to make sure that you noticed it: the DFG JIT does do inlining, and does so at parse-time, like HotSpot does. The type profiling (they call it "value profiling") combined with some cheap static analysis also allows the DFG to unbox int32 and double-precision values.

One thing that the DFG JIT doesn't do, currently, is much in the way of code motion. It does do some dead-code elimination and common-subexpression elimination, and as I mentioned before, you need the DFG's value profiles in order to be able to do this correctly. But it does not, as far as I can tell, do much in the way of code motion, like loop-invariant code motion.

Also, the DFG's register allocator is not as good as Crankshaft's, yet. It is hampered in this regard by the JSC assembler that I praised earlier; while indeed a well-factored, robust piece of code, JSC's assembler has a two-address interface instead of a three-address interface. This means that instead of having methods like add(dest, op1, op2), it has methods like add(op1, op2), where the operation implicitly stores its result in its first operand. Though it does correspond to the x86 instruction set, this sort of interface is not great for systems where you have more registers (like on x86-64), and forces the compiler to shuffle registers around a lot.
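
A sketch of where the shuffling comes from: lowering a three-address add onto a two-address interface. The lowerAdd function, the mov opcode and the register names are invented for illustration and are not JSC's assembler API.

```javascript
// Lower dest = op1 + op2 onto an interface where add(a, b) means a = a + b.
function lowerAdd(dest, op1, op2) {
  if (dest === op1) return [['add', dest, op2]];
  if (dest === op2) return [['add', dest, op1]];    // addition commutes
  // Otherwise an operand must be copied first so it is not clobbered.
  return [['mov', dest, op1], ['add', dest, op2]];
}

console.log(lowerAdd('r0', 'r0', 'r1')); // one instruction
console.log(lowerAdd('r2', 'r0', 'r1')); // a mov, then the add
```

With few registers the destination often coincides with an operand anyway; with more registers, the extra mov shows up more often.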

The counter-based optimization triggers do require some code to run that isn't strictly necessary for the computation of the results, but this strategy does have the nice property that the DFG performance is fairly predictable, and measurable. Crankshaft, on the other hand, being triggered by a statistical profiler, has statistically variable performance.

And speaking of performance, AWFY on the Mac is really where it's at for JSC right now. Since the DFG is only enabled by default on recent Mac OS 64-bit builds, you need to be sure you're benchmarking the right thing.

Looking at the results, I think we can say that JSC's performance on the V8 benchmark is really good. Also it's interesting to see JSC beat V8 on SunSpider. Of course, there are lots of quibbles to be had as to the validity of the various benchmarks, and it's also clear that V8 is the fastest right now once it has time to warm up. But I think we can say that JSC is doing good work right now, and getting better over time.


So that's JavaScriptCore. The team -- three people, really -- is mostly focusing on getting the DFG JIT working well right now, and I suspect they'll be on that for a few months. But we should at least get to the point where the DFG JIT is working and enabled by default on free systems within a week or two.

The one other thing that's in the works for JSC is a new generational garbage collector. This is progressing, but slowly. There are stubs in the code for card-marking write barriers, but currently there is no such GC implementation, as far as I can tell. I suspect that someone has a patch they're cooking in private; we'll see. At least JSC does have a Handle API, unlike SpiderMonkey.


So, yes, in summary, JavaScriptCore is a fine JS implementation. Besides being a correct implementation for real-world JS -- something that is already saying quite a lot -- it also has good startup speed, is fairly robust, and is working on getting an optimizing compiler. There's work to do on it, as with all JS implementations, but it's doing fine.

Thanks for reading, if you got this far, though I understand if you skipped some parts in the middle. Comments and corrections are most welcome, from those of you that actually read all the way through, of course :). Happy hacking!

Syndicated 2011-10-28 15:51:24 from wingolog

the user in the loop

The videos from this year's GNU Hackers Meeting in Paris are up. All the videos are linked from that page. There were technical problems with a couple of them, and we didn't get BT Templeton's presentation on Emacs Lisp in Guile (using delimited dynamic bindings to implement buffer-local variables!), but all in all it's a good crop.

I gave a talk entitled "The User in the Loop" that argued for the place of extensibility in the GNU project. I also pointed out some ways in which Guile fulfills those needs, but that was not the central point to the talk. I argued that point at more length in a previous article.

Anyway, let's give that newfangled HTML5 video tag a go here. It starts with the cut-off statement, "We all know that code motion is very important to efficiency, so..."

Alternately you can download the video directly (~350MB, 52 minutes). There are notes too, a superset of the slides from the talk.

Syndicated 2011-10-19 14:32:34 from wingolog

what's your C migration plan?

I spend a lot of time in my neighborhood café. It has character. Some of it is generic: like in many other bars, the round tables are marble, the chairs are rickety, the customers and the barmen all shout at each other.

But it has its own thing. The owner inherited it from his father, but he was never sure whether he wanted a record store or a bar, so it's half and half. In practice the CDs in the changer don't change very often, and the rest are all in a side room that is otherwise packed full of junk.

In the morning, everyone gets tallats. Old ladies get theirs with pastries. Old men get theirs amb gotes, with liquor. Morning workers rush in, slam one back, slap coins on the counter and get out. Adéu!

Those with more time, like myself, get a cafè amb llet. Most read the paper. I hack. I spend a couple mornings a week there. It's right pleasant to work there, without internet. My most productive hours of the week are there in the bar.

I do chitchat a bit, though: with designers, mothers, the barmen, retirees, random folk. There's one fellow in particular I like talking with, a designer. Turns out he wants to learn how to program. He told me yesterday that he wanted to learn C.

on c

Now, I spend a lot of time in C. I've written loads of it. I continue to write it at times. I'm fond of it, and it has served me well.

But I have come to believe that writing new programs in C is the hacker equivalent of malpractice. C is just too dangerous to use. The risks are not worth the benefits.

Let's be clear about the benefits of writing in C, before looking at its flaws. I want to be really fair here. C is fast. It has great compilers, everywhere. And they are everywhere: C is ubiquitous. It is very flexible, also. You can do anything in C. It is great for programming drivers. It is possible to write big e-mail clients in it. It has great low-level bit operations (though C++ is getting better at it, with its value types). It is a power tool, and it puts you in control. There are loads of libraries written in it, and for it. It encourages minimalism.

In short, C has been a wild success, and with good reason.

But it's time to stop using it.

The essence of the problem with C is that the behavior of erroneous programs is not specified. This doesn't sound like a big deal, but it is. Let's make an example. What is the meaning of this Python program?

def array_set(a, n, x):
  a[n] = x

I think we would all say that it sets the nth element of a to x. But what about this one in C:

void array_set (void **a, size_t n, void *x) {
  a[n] = x;
}

This one, you really can't tell. You can't use Python's definition, because there are no guarantees about the valid indices to set in the array. In fact there is no array, just a pointer.
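For contrast, here is a small sketch of what the Python definition does at the edge of a list. The three-element list and the values stored in it are made up for the example.

```python
def array_set(a, n, x):
    a[n] = x

a = [None] * 3
array_set(a, 1, "ok")            # in bounds: sets element 1

try:
    array_set(a, 10, "oops")     # out of bounds
    outcome = "no error"
except IndexError:
    outcome = "IndexError"       # the failure mode is part of the language
```

Even the odd cases are specified: a negative index counts from the end of the list, and anything outside the valid range raises IndexError.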

In the Python case, if you pass an out-of-bounds array index, you will get an exception. In the C case, you get undefined behavior. Lately what "undefined behavior" means is that the state or organized crime gets to take control of your computer: read your email, log your keystrokes, take pictures of you.

I'm deadly serious. Let's take a better example. Last year Liu Xiaobo won the Nobel Peace Prize for "his long and non-violent struggle for fundamental human rights in China." A couple weeks later, it was found that someone cracked the Nobel Prize website and was serving a Firefox zero-day that took over users' machines.

Apparently, someone was interested in what was on the computers of visitors to the Nobel prize site -- interested enough to use a fresh zero-day, something that probably sells on the black market for some $20,000 or more.

Lately I've been quoting Eben Moglen a lot, especially from the talk he gave at this year's FOSDEM conference. One thing he said there is that "software is the steel of the 21st century". As steel shaped the social relations of the last century, what we hack now forms the material conditions of tomorrow: the next Arab Spring, or the next Green Scare.

Almost all software is connected to the net these days, and so it is all under attack. If you write software in C these days -- software with bugs, like any software -- you are writing software with undefined behavior, and thus, software that enables powerful state and organized crime actors to take advantage of your users.

That sounds a bit exaggerated, no? It is true, though. Look at what is happening with browser vulnerabilities, or what is the same, PDF or Flash vulnerabilities, or PNG or MP4, or what-have-you. The commonality here is that powerful interests exploit unsuspecting users due to flaws in the C and C++ languages. Writing more C these days is malpractice.


I still write C. I work on implementations of safe languages -- languages that don't have the same kinds of fundamental vulnerabilities that C and C++ have. Eventually the amount of C in the world will stop growing, and decline as pieces that are now written in C will be written in Python, in JavaScript, in Guile: in short, in languages that don't launch the missiles when you try to write beyond the end of an array.

C has had a great run; we should celebrate it. But its time has passed. What is your migration strategy? How are you going to stop writing C?

Syndicated 2011-10-13 20:44:55 from wingolog

partial evaluation in guile

Friends, something awesome just happened: Guile just got itself a respectable inliner.

I have said before on this blog, quoting commenter Rémi Forax, that "inlining is the mother of all optimizations". It is true that inlining opens up space for code motion, constant folding, dead code elimination, and loop optimizations. However, credit might be better laid at the feet of partial evaluation, the mother of all inlining algorithms.

Partial evaluation is a source-to-source transformation that takes your program and produces a better one: one in which any computation that can be done at compile-time is already done, leaving only those computations that need to be done at run-time.

For example, the application

(+ 2 3)

can clearly be evaluated at compile-time. We say that the source expression (+ 2 3) reduces to 5 via constant folding. The result, 5 in this case, is the residual expression.
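Constant folding on its own fits in a few lines. The following is a sketch in Python over a made-up expression encoding (numbers, variable names as strings, and (op, a, b) tuples); it is not Guile's intermediate language.

```python
import operator

# Hypothetical mini-language: an expression is an int, a variable name
# (a str), or a tuple (op, arg1, arg2).
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(expr):
    """Return the residual expression after constant folding."""
    if not isinstance(expr, tuple):
        return expr                          # constants and variables stay put
    op, a, b = expr
    a, b = fold(a), fold(b)                  # fold the operands first
    if op in FOLDABLE and isinstance(a, int) and isinstance(b, int):
        return FOLDABLE[op](a, b)            # both known: compute now
    return (op, a, b)                        # otherwise residualize
```

Here fold(("+", 2, 3)) gives 5, while fold(("*", ("+", 2, 3), "x")) can only get as far as ("*", 5, "x"), because x is not known until run-time.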

A more complicated example would look like:

(let ((string->chars
       (lambda (s)
         (define char-at
           (lambda (n) (string-ref s n)))
         (define len
           (lambda () (string-length s)))
         (let loop ((i 0))
           (if (< i (len))
               (cons (char-at i)
                     (loop (1+ i)))
               '())))))
  (string->chars "yo"))
=> (list #\y #\o)

Here when I write =>, you should read it as, "residualizes at compile-time to". In this case our input program residualized, at compile-time, to a simple list construction. The loop was totally unrolled, the string-refs folded, and all leaf procedures were inlined.

Neat, eh?

optimization enables expressiveness

If the partial evaluator does its job right, the residual program will run faster. However this isn't the real reason that I'm so pleased with it; rather, it's that it lets me write different programs.

You see, I hack on Guile's compiler and VM and all that. When I write code, I know what Guile is going to do with it. Unfortunately, this caused my programs to be uglier than necessary, because I knew that Guile wasn't going to inline some important things for me. I wrote at a lower level of abstraction, because I couldn't trust the compiler.

Now, with the partial evaluator, I'm happy to use helper functions, even higher-order helpers, with the knowledge that Guile will mostly do the right thing. This is particularly important in the context of languages that support syntactic abstraction, like Scheme. If you're a Schemer and haven't seen Kent Dybvig's Macro Writers' Bill of Rights talk (slides), do check it out.

Incidentally, there was a sad moment in JSConf.eu a couple weekends ago when Andreas Gal (of all people!) indicated that he had to manually inline some functions in PDF.js in order to get adequate speed. More on JavaScript a little later, though.

about partial evaluation

A partial evaluator looks a lot like a regular meta-circular evaluator. It's a recursive function that takes an expression and an environment and yields a value. Guile's partial evaluator, peval, builds up lexical environments when it sees let and other binding constructs, and tries to propagate copies when it sees lexical references.

Inlining is facilitated by copy-propagation of lambda expressions. Just as the initial value 0 in the example above propagates through the lexical variable i to reach (< i (len)), (lambda () (string-length s)) propagates to len. Application of a lambda expression reduces to the equivalent of a let binding. So for the first iteration of loop above, we have:

(< i (len))
;; copy propagation
=> (< 0 ((lambda () (string-length s))))
;; beta-reduction
=> (< 0 (string-length s))
;; copy-propagation
=> (< 0 (string-length "yo"))
;; constant-folding
=> (< 0 2)
;; constant-folding
=> #t

In this case the condition folded to a constant, so we know at compile-time which branch to take. The second branch is dead, so we eliminate it. The process continues until we finally produce the resulting list.
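The steps above can be sketched as a toy partial evaluator in Python. Everything here is an assumption made for illustration: the tagged-tuple encoding, the PRIMS table, and the premise that all variable names are unique (so substitution cannot capture anything). It is not Guile's peval, and it has no effort limit, so a self-applying lambda would make it recurse forever.

```python
import operator

# Hypothetical primitives that are safe to fold at compile-time.
PRIMS = {"<": operator.lt, "+": operator.add, "string-length": len}

def is_value(e):
    return e[0] in ("const", "lambda")

def peval(e, env=None):
    """Partially evaluate e; env maps variable names to known values."""
    env = {} if env is None else env
    tag = e[0]
    if tag == "const":
        return e
    if tag == "var":
        return env.get(e[1], e)                          # copy propagation
    if tag == "lambda":
        _, params, body = e
        inner = {k: v for k, v in env.items() if k not in params}
        return ("lambda", params, peval(body, inner))
    if tag == "let":
        _, name, init, body = e
        init = peval(init, env)
        if is_value(init):
            return peval(body, {**env, name: init})      # binding is known
        inner = {k: v for k, v in env.items() if k != name}
        return ("let", name, init, peval(body, inner))   # residualize the let
    if tag == "prim":
        _, op, *args = e
        args = [peval(a, env) for a in args]
        if all(a[0] == "const" for a in args):
            return ("const", PRIMS[op](*(a[1] for a in args)))  # constant folding
        return ("prim", op, *args)
    if tag == "if":
        _, test, then, alt = e
        test = peval(test, env)
        if test[0] == "const":
            return peval(then if test[1] else alt, env)  # dead branch dropped
        return ("if", test, peval(then, env), peval(alt, env))
    if tag == "call":
        _, f, *args = e
        f = peval(f, env)
        args = [peval(a, env) for a in args]
        if f[0] == "lambda" and all(is_value(a) for a in args):
            _, params, body = f
            return peval(body, {**env, **dict(zip(params, args))})  # beta-reduction
        return ("call", f, *args)
    raise ValueError(f"unknown expression: {e!r}")

# The test from the first loop iteration: s is "yo", i is 0.
example = ("let", "s", ("const", "yo"),
           ("let", "len", ("lambda", (), ("prim", "string-length", ("var", "s"))),
            ("let", "i", ("const", 0),
             ("prim", "<", ("var", "i"), ("call", ("var", "len"))))))
```

Calling peval(example) residualizes to ("const", True), going through the same copy propagation, beta-reduction and folding steps as the trace above.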

down the rabbit hole

Up to here things are easy: we have a simple, well-typed example that terminates. But to be part of a real-world compiler, a partial evaluator needs to handle real-world code: accessors for mutable data, access to mutable bindings (lexical and global), indefinite recursion, unbound variables, and poorly-typed programs. In addition, a real-world inliner needs to run quickly and avoid producing bloated residual code.

I should take a moment and note that statically-typed, functional languages can avoid a number of these problems, simply by defining them away. It is no wonder that compiler people tend towards early binding. Scheme does exhibit a fair amount of early binding through its use of lexical scope, but it is not a pure functional language. Working on this peval was the first time that I wished for immutable pairs in Scheme, as promoted by Racket and R6RS.

Anyway, having mutability in your language isn't so bad. You do miss some optimization opportunities, but that is OK. What is not OK in a production peval is spending too much time on an expression.

Guile's solution, following Waddell and Dybvig's excellent Fast and Effective Procedure Inlining, is to simply count the number of times through the inliner. Each inlining attempt gets a fresh counter, and any work performed within an inlining attempt decrements the counter. When the counter reaches zero, the inlining attempt is aborted, and a call is residualized instead. Since the number of call sites in the program is fixed, and there is a maximum amount of work that will be done at each call site, the resulting algorithm is O(N) in the size of the source program.
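The effort counter can be sketched as follows, again in Python over a toy encoding. The names Counter, EffortExceeded and try_inline, the defs table of single-argument functions, and the default budget of 100 are all invented for the example; this is only the shape of the idea, not the algorithm from the paper or from Guile.

```python
class EffortExceeded(Exception):
    """Raised when an inlining attempt runs out of effort."""

class Counter:
    def __init__(self, effort):
        self.effort = effort

    def tick(self):
        self.effort -= 1
        if self.effort <= 0:
            raise EffortExceeded()

def substitute(expr, name, value):
    """Replace the variable called name with value inside expr."""
    if expr == name:
        return value
    if isinstance(expr, tuple):
        if expr[0] == "call":                # ("call", function-name, argument)
            return ("call", expr[1], substitute(expr[2], name, value))
        return (expr[0],) + tuple(substitute(e, name, value) for e in expr[1:])
    return expr

def inline(expr, defs, counter):
    """Expand every call in expr, charging one unit of effort per node."""
    counter.tick()
    if not isinstance(expr, tuple):
        return expr
    if expr[0] == "call":
        _, name, arg = expr
        param, body = defs[name]
        arg = inline(arg, defs, counter)
        return inline(substitute(body, param, arg), defs, counter)
    return (expr[0],) + tuple(inline(e, defs, counter) for e in expr[1:])

def try_inline(call, defs, effort=100):
    counter = Counter(effort)                # fresh counter per attempt
    try:
        return inline(call, defs, counter)
    except EffortExceeded:
        return call                          # abort: residualize the call

defs = {
    "double": ("x", ("+", "x", "x")),
    "loop": ("x", ("call", "loop", "x")),    # expands forever without a limit
}
```

With these definitions, try_inline(("call", "double", 21), defs) returns ("+", 21, 21), while try_inline(("call", "loop", 1), defs) gives up when the budget runs out and hands back the original call untouched.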

Guile's partial evaluator also uses the on-demand, online strategy of Waddell and Dybvig, to allow definitions to be processed in their use contexts. For example, (cons 1 2) may be reduced to #t when processed as a test, in a conditional. If, after processing the body of a let, a binding is unreferenced, then it is processed for effect. Et cetera.

With the effort counter in place, Guile simply tries to inline every call site in the program, knowing that it will bail out if things don't work. It sounds a little crazy, but it works, as Waddell and Dybvig show. The effort counter also serves to limit code growth, though it is a bit crude. In any case I got less than a percent of code growth when optimizing the psyntax expander that Guile uses, which is a win in my book.


Partial evaluation can only propagate bindings whose definitions are known. In the case of Guile, then, that restricts inlining to lexical references and primitive references, and notably excludes global references and module imports, or fields of mutable objects. So this does not yet give us cross-module inlining, beyond the hacks that abuse the macro expander.

This observation has a corollary, in that some languages promote a style of programming that is difficult to analyze. I'm really talking about object-oriented languages here, and the dynamic ones in particular. When you see o.foo() in Java, there is at least the possibility that foo is a final method, so you know you can inline it if you choose to. But in JavaScript if you see o.foo(), you don't know anything: the set of properties of o can and does vary at runtime as people monkey-patch the object o, its prototype, or Object.prototype. You can even change o.__proto__ in most JS implementations. Even if you can see that your o.foo() call is dominated by a o.foo = ... assignment, you still don't know anything in ES5, as o could have a setter for the foo property.

This situation is mitigated in the JavaScript world by a couple of things.

First of all, you don't have to program this way: you can use lexical scoping in a more functional style. Coupled with strict mode, this allows a compiler to see that a call to foo can be inlined, as long as foo isn't mutated in the source program. That is a property that is cheap to prove statically.

However, as Andreas Gal found out, this isn't something that the mainstream JS implementations do. It is really a shame, and it has lasting impacts on programmers.

I even heard a couple people say that in JS, you should avoid deep lexical bindings, because the access time depends on the binding depth. While this is true for current implementations, it is a property of the implementations and not of the language. Absent with and eval-introduced bindings, a property that is true in strict-mode code, it is possible to quickly compute the set of free variables for every function expression. When the closure is made, instead of grabbing a handle on some sort of nested scope object, a JS implementation can just copy the values of the free variables, and store them in a vector associated with the function code. (You see, a closure is code with data.) Then any accesses to those variables go through the vector instead of the scope.

For assigned variables -- again, a property that can be proven statically -- you put the variables in a fresh "box", and rewrite accesses to those variables to go through that box. Capturing a free variable copies the box instead of its value.
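Here is a small Python model of that representation. The Box and Closure classes and the two examples are hypothetical and written by hand; a real implementation would generate the equivalent from the free-variable and assignment analysis.

```python
class Box:
    """A mutable cell holding one assigned variable."""
    def __init__(self, value):
        self.value = value

class Closure:
    """Code plus a flat vector of captured free variables."""
    def __init__(self, code, free):
        self.code = code                 # function taking (free, *args)
        self.free = free                 # tuple of copied values or boxes

    def __call__(self, *args):
        return self.code(self.free, *args)

# function adder(k) { return function (x) { return x + k; } }
def adder_code(free, x):
    return x + free[0]                   # k is never assigned: use the copy

def make_adder(k):
    return Closure(adder_code, (k,))

# function counter() { var n = 0; return [inc, get]; }  where inc assigns n
def inc_code(free):
    free[0].value += 1                   # n is assigned: go through its box
    return free[0].value

def get_code(free):
    return free[0].value

def make_counter():
    n = Box(0)
    inc = Closure(inc_code, (n,))
    get = Closure(get_code, (n,))        # both closures capture the same box
    return inc, get
```

Variable access is a single index into the vector whatever the nesting depth, and the two counter closures still see each other's updates because they share one box.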

There is nothing new about this technique; Cardelli and Dybvig (and probably others) discovered it independently in the 80s.

This point about closure implementation is related to partial evaluation: people don't complain much about the poor static inliners of JS, because the generally poor closure implementations penalize lexical abstraction. Truly a shame!

* * *

It seems I have digressed. Sorry about that!

I spoke about closures and lexical scope, properties of the JS language that can enable static inlining. The second (and more important) way that JS implementations can support inlining is dynamically. I trolled about that some months ago. Dynamic inlining is fantastic, when it works, though there are limiting heuristics (scroll down to "inlining", and note that the exact set of heuristics has changed in the intervening months).

So my last point was about something that Guile does well that JS implementations do poorly, and it's fair that this point should be the reverse. I would like to be able to dynamically inline, but this would mean associating the intermediate representation with Scheme functions. As Guile can compile code ahead-of-time, this means we would have to serialize the IR out to disk, in much the same way as GCC's new link-time optimizer (LTO) does. But I would like to put that off until we change the format of compiled Guile code to be ELF. Otherwise we run the risk of bloating our runtime memory size.

try it out

Guile's partial evaluator was joint work between myself and my fellow Guile maintainer Ludovic Courtès, and was inspired by a presentation by William Cook at DSL 2011, along with Waddell and Dybvig's Fast and Effective Procedure Inlining.

This code is currently only in the development Guile tree, built from git. Barring problems, it will be part of Guile 2.0.3, which should be out in a couple weeks.

You can check out what the optimizer does at the command prompt:

>,optimize (let ((x 13)) (* x x))
$1 = 169
>,optimize (let ((x 13)) (* x foo))
$2 = (* 13 foo)

Have fun, and send bugs to bug-guile@gnu.org.

Syndicated 2011-10-11 10:01:30 from wingolog

a schemer at jsconf.eu

Yow! I am just back from JSConf.eu, a lovely & lively gathering of JavaScript hackers in sunny Berlin. And it was sunny, this foreign land, and foreign indeed: as my long-time readers will know, I am not a JavaScript hacker, but a Schemer with a C problem. JSConf.eu was a pleasant place to visit.

It turns out that I wasn't the only one there in the tourist condition; all of the other implementors seemed to be in a similar situation. The Mozilla folks look at themselves as "platform engineers". V8 hacker Vyacheslav Egorov has a thing for Oberon; I'm sure he'll tell you about it if you ask. And so on. Indeed I think that of the folks that were there, Brendan is the only one implementing JavaScript in JavaScript. Do You Speak It!

great job, organizers!

But besides giving me the opportunity to gossip about garbage collection and type feedback, JSConf.eu was a well-organized, energetic event, and I say that as one who has been to dozens of conferences in the last few years.

There was great sound and video, of course. It looked like they did a pretty good job with the recording, so the videos should be online at some point. In the downstairs room, they had Anna Lena Schiller drawing live "graphic recordings" as the speakers spoke, on large sheets of glossy paper, in color, illustrating the main ideas of the talks. As the weekend progressed, all of these pieces were mounted on the wall for people to see. In the end they will be scanned and uploaded, and the originals mailed to the speakers. It was amazing, and an impressive amount of work.

It's silly, but let me also mention the food: there were fresh-made frozen-yogurt smoothies with brownie toppings. There were fresh-made croissants and bread, and a plate of some dozen or so French cheeses, and made-to-order breakfast, and plates of tarts, and the bar was open all day long, with apfelschorle and club-mate and beer and coffee, and there was an espresso stand, and everything was free. There was lunch and dinner at the venue and free parties before, during, and after. The only caveat was that the vegetarian options were relatively few. Still, though, way to raise the bar, JSConf.eu!


I could be overly sensitive about this, but I would like to put one thing out there. There were a number of sexuality-related aspects of this event that I did not feel comfortable with. There were a couple speakers that made live-chat applications with node.js, and demoed them to the room, and a few anonymous penis jokes ended up on them. There was a speaker that made a sexual joke about Allow: GET, HEAD in HTTP headers. Most of all though, they had the (very talented) cabaret singer Mandy Lauderdale there, who started with a "Brendan Eich Fanclub" thing, then put on a show the next night at the party.

Please understand me when I say that I have no problem with sexuality in general -- it is fun, subversive, and in any case a part of life. However in the programming world we have a gender-balance problem, and I don't think that incorporating sexuality into our culture helps that. In the particular case of Mandy's (talented and sometimes funny) performance, I get the feeling that we're promoting a "boy's club" atmosphere where it becomes OK to make penis jokes.

Like I said, I'm just putting that out there. Feel free to disagree, politely of course.


The first talk I saw was by Dean McNamee of plask, a non-browser JS environment for making art. Plask helps make graphical art by providing 2D and 3D drawing primitives to JS. It also integrates with other devices via MIDI and OSC, and can be sequenced over MIDI (or OSC, I presume).

Dean's message was that we need to stop caring so much about the "how" and care more about the "what" and "when". There was no code in the presentation, only demos of what Plask has done; and indeed Dean was quite honest that it was the crappiest code he has ever written. But the demos were pretty awesome.

There are obviously some problems with that approach to software, that it is difficult to share or to form a piece of someone else's work. However it is equally true that many of us spend a lot of time working on things that people will never see, and that sometimes we should just take shortcuts and move on.

Another interesting talk was by Peter van der Zee on JavaScript tools. Peter wrote a JS parser in JS, and an in-browser editor to overlay that tool information and utility onto a piece of JS code. It can provide many warnings, do type inference, and similar things. It can do some automated refactorings, like var hoisting, extract-method, and minification. I wonder if some more principled analysis algorithms like kCFA would provide any features above what he is able to do with his fixed-point constraint solver. Still, neat stuff; check it out on github.

David Mandelin and David Anderson of Mozilla gave a nice introduction to what JIT compilers can do these days in JavaScript, at least at a very local level. The presentation reminded me of Mandelin's old pjit project, to figure out what sorts of optimizations would be necessary to yield the smallest assembly code. My only criticism is that some of their assumptions were specific to the nan-boxing approach. They did include a humble nod at the end that there was still a ways for IonMonkey to go in order to reach what Crankshaft does, which was appropriate IMO.

Brendan Eich gave a talk at the end of the day about the coming ES6. Frankly I have no idea how people could understand the things he spoke of without following es-discuss, and most people I asked afterwards were really skeptical about such things as the "monocle-mustache" operator (that's .{ for those of you following along at home). That said, I do like the triangle operator (|>) and the potential block-lambda syntax. Slides are not up yet, unfortunately.

. . .

The next day, I was pleased to find myself sitting with GNOME hackers Henri Bergius and Garrett LeSage. We all agreed that it was great to be in such a dynamic, energetic conference, and that JSConf.eu was much more lively than recent free software conferences we had been to. However it must be said that JS programmers are at an early stage of free software development in many ways, in that there are many more single-person projects than we have in GNOME (for example). It could be said that our use of C in GNOME forces you to collaborate because you can't build very much (hah!), but large, stable projects with many contributors are relatively rare.

Also, I thought there was a general low esteem among the JS folk for copyleft and privacy, and for software freedom in general. There are exceptions of course, but JS is closely tied to environments with lots of venture capital, so it is to be expected that we get a divergence of interests between users and developers. I think this is an area in which GNU folk have something important to say: that it's not just about the tech, it's about the freedom too.

On that note, my Igalia colleague Eduardo Lima's talk on JavaScript in GNOME was well-received. Eduardo talked about the pieces that allow a good JavaScript hacking experience to be had in GNOME, and even did some live-coding. Hopefully he'll blog about his talk soon :-)

I really enjoyed Tom Robinson's talk about compilers and JavaScript. It was about compilers that produce or consume JavaScript. I suppose that in some ways compiling DSLs to JavaScript in an in-browser compiler that passes the result to eval is somewhat like macros, though it is hard to compose with other parts of the target language like lexical scope. Interesting stuff though, and I hope he puts up his slides soon.

Finally, and we're getting to the end here -- I did not intend to write so much! -- V8 hacker Erik Corry had a great talk about garbage collection. V8 always had a generational, moving collector, which is great for most purposes, but struggles with large heaps, because collecting the old generation required a stop-the-world mark and sweep. Their new GC can incrementally collect the old generation, leading to reductions in pause times and higher sustained allocation rates for some problems (like the splay benchmark). There's lots of tricky write-barrier stuff going on there, and they're just now working out the details, but it will arrive to a Chrome near you shortly.

As part of Erik's work, he had enhanced the V8 engine to output heap profiling information on a TCP port. He could then run a node.js process to consume this information and output a graphical representation of the heap in a browser. It shows the new space filling up, scavenging, and promoting data into various parts of the old space. It also shows the old space being incrementally marked and swept, and has a visualization of time spent in the collector versus the mutator. This last visualization nicely showed that the new GC allows the splay benchmark to make forward progress even with a large rate of tenuring into the old generation.

I'm sure the Google folks will make their press release about this as soon as things settle down, but for now you can check it out in the bleeding_edge of V8. Have fun!


A gist of all the slides is up on github, and I suppose it will be updated as new slides are uploaded and videos come online.

There were a number of other talks I didn't get to mention, like Andreas Gal on PDF.js, or Alon Zakai on Emscripten. SQLite compiled from C to JS for the browser? What a world. Thanks again to the JSConf.eu folks for such a great conference, and to Igalia for sending me. Happy hacking!

Syndicated 2011-10-04 18:23:09 from wingolog

from ssa to native code: v8's lithium language

So, friends, about a month ago I posted a look at the Hydrogen intermediate language, as part of an ongoing series on the internals of the V8 JavaScript implementation. Hydrogen is the high-level part of V8's Crankshaft optimizing compiler, where the optimizations happen. This article tells the story of Lithium, Crankshaft's low-level, machine-dependent intermediate language.

totally metal

So here's the deal: you parsed your JavaScript functions to AST objects (parse all the functions!). You emitted some simple native code, quickly but not optimized, and you sat back and waited. After a little while, your profiling told you that some function was hot, so you re-parsed it, generated a graph of Hydrogen instructions and optimized that graph. Now you need to generate code. And that's where we are! Generate all the code!

But that's not what V8 does here. Instead of going directly to native code, it translates the HGraph to a chunk of Lithium code, then emits machine code from those Lithium instructions. Unlike Hydrogen, which is an SSA form in which instructions implicitly define values, the Lithium form is closer to three-address code, with labels and gotos, and explicitly names its operands (if any) and results (if any).

These temporary values are initially declared with a number of constraints, such as "must be in a double-precision register". Later, a register allocator considers the constraints and liveness ranges, allocating each value to a specific register. Code generation happens after register allocation.

an example

Let us take a look at an example to see how these things interact. Given that Lithium is target-specific, we'll just choose one of the targets; ia32 and x64 are the most advanced, so we'll just take x64. (That's what they call it.)

LInstruction* LChunkBuilder::DoAdd(HAdd* instr) {
  if (instr->representation().IsInteger32()) {
    LOperand* left =
    LOperand* right =
    LAddI* add = new LAddI(left, right);
    LInstruction* result = DefineSameAsFirst(add);
    if (instr->CheckFlag(HValue::kCanOverflow)) {
      result = AssignEnvironment(result);
    }
    return result;
  } else if (instr->representation().IsDouble()) {
    return DoArithmeticD(Token::ADD, instr);
  } else {
    return DoArithmeticT(Token::ADD, instr);
  }
  return NULL;
}

As an aside, this is fairly typical for Crankshaft code: tight, lots of asserts, custom type predicates instead of RTTI, self-explanatory enumerated values, and much more oriented towards function calls than in-place mutations. The new objects are created in a "zone", under the covers, so that once the compiler is done, it can free the whole zone at once. I reflowed the left and right lines; otherwise it really is a paragraph of text.

Unfortunately there are not a lot of big-picture comments, hence this blog series.

Anyway, here we have three cases. The first two are for when the HValue has been given an unboxed representation, as an int32 or as a double. The last is for tagged values, which can be small integers (Smi values) or heap-allocated doubles. In these latter cases, we dispatch to the DoArithmeticD and DoArithmeticT helpers, which create generic LArithmeticD and LArithmeticT instructions. But we see that in the first case, there is a special LAddI instruction.

When there is a special case in Crankshaft, it either means that we are working around some strange corner of JavaScript, or that something is being optimized. In this case, it is an optimization, reflecting the importance of integer addition. The concrete optimizations are:

  1. Immediate operands. If one of the addends is a constant at compile-time, it can be encoded into the instruction stream directly.

  2. Overflow check removal. If the range analysis (see my previous article) showed that the value would not overflow, then no overflow check is emitted. (Otherwise, an overflow check will be inserted, branching to a deoptimization bailout. The bailout will need to know what variables are live, in that case, hence the AssignEnvironment call; more on that later.)
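The second optimization rests on interval arithmetic. As a sketch only, with ranges modelled as (lo, hi) pairs and helper names of my own invention rather than anything from V8's range analysis:

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def add_range(a, b):
    """Range of x + y, for x in range a and y in range b."""
    return (a[0] + b[0], a[1] + b[1])

def add_can_overflow(a, b):
    """True if x + y might not fit in an int32, so a check is needed."""
    lo, hi = add_range(a, b)
    return lo < INT32_MIN or hi > INT32_MAX
```

Two values known to lie in (0, 100) can be added with no check at all, whereas adding 1 to a value that might be INT32_MAX still needs one.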

The UseRegisterAtStart helper call indicates that the operand should be in a register, and that it will be live at the start of the instruction but not the end. This latter constraint is needed because x86 instructions typically clobber the first operand. UseRegisterOrConstant only allocates a register if the operand is not constant and so cannot be immediate. Finally DefineSameAsFirst constrains the output register to be the same as the first operand. All of these calls additionally inform the register allocator of the uses and definitions, in SSA parlance, so that the allocator can build accurate liveness ranges.

Just to be complete, the double arithmetic case (not shown) is a bit more straightforward, in that there are no overflow checks or immediates. Tagged arithmetic, on the other hand, always dispatches to an inline cache. (I find this to be fascinating. Typically language implementations inline fixnum operations, but V8 does not because their inline caches also perform type feedback, allowing unboxing optimizations where possible. In V8, tagged integer arithmetic is never inlined, by default.)

allocate all the registers?

Why bother with this Lithium thing? Why not just generate code directly? The answer is that having a target-specific Lithium language allows target-specific code generation and temporary allocation, while permitting the register allocator itself to be generic. As an example, see the ARM DoAdd implementation.

Most everyone knows, but the idea of register allocation is to store your temporaries in machine registers instead of in memory, so that things run fast. It's tricky though, because nested procedure calls can trash your registers, so you need to "spill" them to memory in certain cases. You also need to avoid thrashing values between registers and memory. V8 also tries to use registers when calling internal runtime helpers, which can result in a fair amount of register shuffling.

That's all traditional compiler stuff. But in a garbage-collected language, we also have the complication of knowing which registers hold garbage-collected values and which don't. At every point in the code where GC could run, the collector needs to be able to find those pointers so that it can mark them, and possibly relocate them. This complicates the allocator.

Another complication is that besides the normal entry point, optimized functions often have an additional entry point, due to on-stack replacement. That entry point has to take values from memory and pack them into registers in a way that lets the computation proceed.

Finally, the allocator plays a role in permitting so-called "lazy deoptimization". V8 hacker Søren Gjesse writes, in a rare but pleasantly exegetic message to the list:

Lazy deoptimization is what happens when all code in a context is forced to be deoptimized either due to starting debugging or due to a global assumption on the optimized code that no longer holds. The reason for the term "lazy" is that actual deoptimization will not happen until control returns to the optimized function. The way we do this lazy deoptimization is somewhat fragile and we have had quite a few bugs in that code. It is based on destructively patching the optimized code placing a call just after each call forcing deoptimization. And actually the patching is not right after the return point but after the gap code inserted at the return point. For the deferred stack check the code up to restoring the registers (popad in IA-32) is considered the gap code.


The gap code that Søren referred to is the set of moves that shuffles registers back into place after a call. Translation of Hydrogen to Lithium inserts LGap instructions between each LInstruction. When the allocator decides that it needs to spill, restore, or shuffle some registers, it records these moves on an LGap.

A later pass serializes these parallel moves into a specific order, trying to use the minimum number of temporaries. I link to Xavier Leroy's paper above, not because V8 uses Coq, but because of the nice introduction to the problem.
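For the curious, here is a small sketch of what serializing a parallel move can look like. It is not V8's gap resolver: the names `Move` and `SerializeMoves` are made up, locations are plain strings, and I assume that every destination appears at most once and that the scratch location is not otherwise used. The idea is to emit any move whose destination nobody else still needs to read, and to break a cycle by parking one value in the scratch location.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// A move is (destination, source). All moves in a parallel move
// conceptually happen at the same time.
using Move = std::pair<std::string, std::string>;

// Orders a parallel move into a sequence of ordinary moves.
// Assumes distinct destinations and a scratch location that no move uses.
std::vector<Move> SerializeMoves(std::vector<Move> pending,
                                 const std::string& scratch) {
  std::vector<Move> ordered;
  // Moves from a location to itself do nothing; drop them up front.
  for (std::size_t i = 0; i < pending.size();) {
    if (pending[i].first == pending[i].second) {
      pending.erase(pending.begin() + i);
    } else {
      ++i;
    }
  }
  while (!pending.empty()) {
    bool emitted = false;
    for (std::size_t i = 0; i < pending.size() && !emitted; ++i) {
      // A move is safe once no other pending move reads its destination.
      bool dest_still_read = false;
      for (std::size_t j = 0; j < pending.size(); ++j) {
        if (j != i && pending[j].second == pending[i].first) {
          dest_still_read = true;
        }
      }
      if (!dest_still_read) {
        ordered.push_back(pending[i]);
        pending.erase(pending.begin() + i);
        emitted = true;
      }
    }
    if (!emitted) {
      // Everything left is part of a cycle. Save one destination's old
      // value in the scratch location and redirect its readers there.
      const std::string saved = pending[0].first;
      ordered.push_back(Move(scratch, saved));
      for (Move& m : pending) {
        if (m.second == saved) m.second = scratch;
      }
    }
  }
  return ordered;
}
```

A swap of `a` and `b` with scratch `t` comes out as `t <- a`, `a <- b`, `b <- t`: one temporary, three moves. A chain like `b <- a`, `c <- b` needs no temporary at all; it just gets reordered so that `c <- b` goes first.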

Anyway, the actual algorithm used by the Lithium register allocator appears to be a version of linear-scan, with live range splitting. This part of V8 actually is well-commented, so if you can understand the Sarkar paper, then head on over to ye olde google code and check it out.
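If you haven't met linear-scan before, the basic shape is small enough to sketch. What follows is the textbook version without live range splitting, so it is simpler than what V8 does: a range that loses out is spilled for its whole lifetime. `LiveRange` and `LinearScan` are illustrative names of mine, and positions are just instruction indices.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical live range: the value is live over positions [start, end].
// reg is the assigned register number, or -1 if the range is spilled.
struct LiveRange {
  int start;
  int end;
  int reg;
};

// Basic linear scan: walk ranges by increasing start, free the registers of
// ranges that have ended, and when none is free, spill whichever candidate
// ends last.
void LinearScan(std::vector<LiveRange>& ranges, int num_regs) {
  std::vector<LiveRange*> order;
  for (LiveRange& r : ranges) {
    r.reg = -1;
    order.push_back(&r);
  }
  std::sort(order.begin(), order.end(),
            [](const LiveRange* a, const LiveRange* b) {
              return a->start < b->start;
            });

  std::vector<int> free_regs;
  for (int r = num_regs - 1; r >= 0; --r) free_regs.push_back(r);
  std::vector<LiveRange*> active;

  for (LiveRange* cur : order) {
    // Expire ranges that ended before the current one starts.
    for (auto it = active.begin(); it != active.end();) {
      if ((*it)->end < cur->start) {
        free_regs.push_back((*it)->reg);
        it = active.erase(it);
      } else {
        ++it;
      }
    }
    if (!free_regs.empty()) {
      cur->reg = free_regs.back();
      free_regs.pop_back();
      active.push_back(cur);
      continue;
    }
    // No register free: find the active range that ends last.
    auto victim = std::max_element(
        active.begin(), active.end(),
        [](const LiveRange* a, const LiveRange* b) { return a->end < b->end; });
    if (victim != active.end() && (*victim)->end > cur->end) {
      // Steal its register; the victim is spilled instead.
      cur->reg = (*victim)->reg;
      (*victim)->reg = -1;
      *victim = cur;
    }
    // Otherwise cur itself stays spilled (reg == -1).
  }
}
```

With ranges [0, 10], [1, 3] and [4, 6] and a single register, the long range gets spilled and the two short ones share the register. Splitting would instead let the long range keep the register for part of its life, which is the refinement to look for in the real allocator.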

generate all the code!

The result of register allocation is that all LOperands have been assigned specific machine registers, and some moves have been recorded in the LGap instructions. Now, finally, we generate code: the Lithium instructions are traversed in order, and each one writes its assembly into a buffer. See the LAddI code generator, as an example.

All of the JavaScript engines generate code using assemblers written in C++. Some people still assume that compilers must generate assembly, in text, but it's much easier and faster just to call methods on an object that writes bytes into a buffer. This approach also makes it easy to add macro instructions, just by subclassing the assembler. Finally, since modern JS implementations have to muck around with the bits all the time (disassembly, source debugging, backtraces, garbage collection (tracing and moving), deoptimization, OSR, inline caches, type feedback), it's easiest to maintain your invariants if you control the bit generation.

V8's assembler originally came from the Strongtalk project, where some of the V8 hackers came from, long ago. This assembler writes code into memory, going forward, and then when it's finished, it appends "relocation information" to the code, written backwards. Relocation information lets V8 mark interesting places in the generated code: pointers that might need to be relocated (after garbage collection, for example), correspondences between the machine program counter and source locations, etc.
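One way to picture the "code forwards, relocation info backwards" arrangement is as a single buffer filled from both ends. This is only a sketch under that assumption; `CodeBuffer` and its methods are invented for illustration and do not mirror V8's assembler classes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical two-ended buffer: instruction bytes grow upward from the
// start, relocation bytes grow downward from the end.
class CodeBuffer {
 public:
  explicit CodeBuffer(std::size_t size)
      : bytes_(size), code_end_(0), reloc_start_(size) {}

  // Appends one instruction byte; fails if the two regions would collide.
  bool EmitCode(uint8_t b) {
    if (code_end_ == reloc_start_) return false;
    bytes_[code_end_++] = b;
    return true;
  }

  // Prepends one relocation byte to the region at the end of the buffer.
  bool EmitReloc(uint8_t b) {
    if (code_end_ == reloc_start_) return false;
    bytes_[--reloc_start_] = b;
    return true;
  }

  std::size_t code_size() const { return code_end_; }
  std::size_t reloc_size() const { return bytes_.size() - reloc_start_; }
  const std::vector<uint8_t>& bytes() const { return bytes_; }

 private:
  std::vector<uint8_t> bytes_;
  std::size_t code_end_;     // one past the last instruction byte
  std::size_t reloc_start_;  // index of the first relocation byte
};
```

The nice property is that neither region needs to know its final size in advance: both just grow toward the middle, and the only failure mode is the two meeting.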

As anyone who's hacked on debuggers knows, the problem with debug information is that it is large. Without clever encodings, the amount of debugging data generated by a compiler can kill runtime performance, simply because of the amount of data that needs to be paged into memory. Of course this concern also applies at compile-time, and especially so for JIT compilers. The Strongtalk assembler's main contribution to V8 (I think) is a complicated bit-encoding that tries to compress this data as far as possible. The assembler also provides iterators for traversing this data at runtime.
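To give a flavor of why encoding matters, here is a much simpler scheme than the one V8 uses: store each program-counter offset as a delta from the previous one, and write each delta as a variable-length integer, seven bits per byte. This is a generic delta-plus-varint sketch, not V8's actual bit layout, and the function names are mine. It assumes the offsets are given in ascending order.

```cpp
#include <cstdint>
#include <vector>

// Encodes ascending pc offsets as deltas, each written as a little-endian
// base-128 varint: the high bit of a byte means "more bytes follow".
std::vector<uint8_t> EncodePcDeltas(const std::vector<uint32_t>& pcs) {
  std::vector<uint8_t> out;
  uint32_t prev = 0;
  for (uint32_t pc : pcs) {
    uint32_t delta = pc - prev;
    prev = pc;
    while (delta >= 0x80) {
      out.push_back(static_cast<uint8_t>((delta & 0x7F) | 0x80));
      delta >>= 7;
    }
    out.push_back(static_cast<uint8_t>(delta));
  }
  return out;
}

// Inverse of EncodePcDeltas: rebuilds the absolute pc offsets.
std::vector<uint32_t> DecodePcDeltas(const std::vector<uint8_t>& bytes) {
  std::vector<uint32_t> out;
  uint32_t pc = 0;
  uint32_t delta = 0;
  int shift = 0;
  for (uint8_t b : bytes) {
    delta |= static_cast<uint32_t>(b & 0x7F) << shift;
    if (b & 0x80) {
      shift += 7;  // continuation bit set: keep accumulating
    } else {
      pc += delta;
      out.push_back(pc);
      delta = 0;
      shift = 0;
    }
  }
  return out;
}
```

Four offsets such as 4, 10, 300 and 305 take 16 bytes as raw 32-bit integers but only 5 bytes in this form, because interesting places in generated code tend to be close together and small deltas fit in a single byte.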

finish line

And now, dear readers, I believe that we have come full circle on this look at V8. We started low, looking at the assembly generated by V8 for a simple loop. We elaborated this low-level view with a look at on-stack replacement, though without understanding how such information was available. So we jumped back to the top, taking a high-level look at V8's compilers. We drilled down to Hydrogen, and here we covered Lithium, register allocation, and code generation.

I hope it's been an entertaining series to read; it has certainly been enlightening to write. Thanks very much to my employer Igalia for supporting my work on this. Comments and corrections are welcome, as always, and happy hacking.

Syndicated 2011-09-05 11:25:05 from wingolog
