Older blog entries for wingo (starting at number 382)

javascript eval considered crazy

Peoples. I was hacking recently on JavaScriptCore, and I came to a realization: JavaScript's eval is absolutely crazy.

I mean, I thought I knew this before. But... words fail me, so I'll have to show a few examples.

eval and introduced bindings

This probably isn't worth mentioning, as you probably know it, but eval can introduce lexical bindings:

 > var foo = 10;
 > (function (){ eval('var foo=20;'); return foo; })()
 20
 > foo
 10

I find this to be pretty insane already, but I knew about it. You would think though that var x = 10; and eval('var x = 10;'); would be the same, though, but they're not:

 > (function (){ var x = 10; return delete x; })()
 false
 > (function (){ eval('var x = 10;'); return delete x; })()
 true

eval-introduced bindings do not have the DontDelete property set on them, according to the post-hoc language semantics, so unlike proper lexical variables, they may be deleted.

when is eval really eval?

Imagine you are trying to analyze some JavaScript code. If you see eval(...), is it really eval?

Not necessarily.

eval pretends to be a regular, mutable binding, so it can be rebound:

 > eval = print
 > eval('foo')
 foo // printed

or, shadowed lexically:

 > function () { var eval = print; eval('foo'); }
 foo // printed

or, shadowed dynamically:

 > with({'eval': print}) { eval('foo'); }
 foo // printed

You would think that if you can somehow freeze the eval binding on the global object, and verify that there are no with forms, and no statements of the form var eval = ..., that you could guarantee that eval is eval, but that is not the case:

 > Object.freeze(this);
 > (function (x){ return [eval(x), eval(x)]; })('var eval = print; 10')
 var eval = print; 10 // printed, only once!
 10,

(Here the first eval created a local binding for eval, and evaluated to 10. The second eval was actually a print.)

Crazy!!!!

an eval by any other name

So eval is an identifier that can be bound to another value. OK. One would expect to be able to bind another identifier to eval, then. Does that work? It seems to work:

 > var foo = eval;
 > foo('foo') === eval;
 true

But not really:

 > (function (){ var quux = 10; return foo('quux'); } )()
 Exception: ReferenceError: Can't find variable: quux

eval by any other name isn't eval. (More specifically, eval by any other name doesn't have access to lexical scope.)

Note, however, the mere presence of a shadowed declaration of eval doesn't mean that eval isn't eval:

 > var foo = 10
 > (function(x){ var eval = x; var foo = 20; return [x('foo'), eval('foo')] })(eval)
 10,20

Crazy!!!!

strict mode restrictions

ECMAScript 5 introduces "strict mode", which prevents eval from being rebound:

 > (function(){ "use strict"; var eval = print; })
 Exception: SyntaxError: Cannot declare a variable named 'eval' in strict mode.
 > (function(){ "use strict"; eval = print; })
 Exception: SyntaxError: 'eval' cannot be modified in strict mode
 > (function(){ "use strict"; eval('eval = print;'); })()
 Exception: SyntaxError: 'eval' cannot be modified in strict mode
 > (function(x){"use strict"; x.eval = print; return eval('eval');})(this)
 Exception: TypeError: Attempted to assign to readonly property.

But, since strict mode is embedded in "classic mode", it's perfectly fine to mutate eval from outside strict mode, and strict mode has to follow suit:

 > eval = print;
 > (function(){"use strict"; return eval('eval');})()
 eval // printed

The same is true of non-strict lexical bindings for eval:

 > (function(){ var eval = print; (function(){"use strict"; return eval('eval');})();})();
 eval // printed
 > with({'eval':print}) { (function(){ "use strict"; return eval('eval');})() }
 eval // printed

An engine still has to check at run-time that eval is really eval. This crazy behavior will be with us as long as classic mode, which is to say, forever.

Strict-mode eval does have the one saving grace that it cannot introduce lexical bindings, so implementors do get a break there, but it's a poor consolation prize.

in summary

What can an engine do when it sees eval?

Not much. It can't even prove that it is actually eval unless eval is not bound lexically, there is no with, there is no intervening non-strict call to any identifier eval (regardless of whether it is eval or not), and the global object's eval property is bound to the blessed eval function, and is configured as DontDelete and ReadOnly (not the default in web browsers).

But the very fact that an engine sees a call to an identifier eval poisons optimization: because eval can introduce variables, the scope of free variables is no longer lexically apparent, in many cases.

I'll say it again: crazy!!!

Syndicated 2012-01-12 16:34:08 from wingolog

webkittens! lexical scoping is in danger!

The GTK+ WebKittens are on the loose here in Coruña. There's folks here from Red Hat, Motorola, Collabora, and of course Igalia. It's good times; beyond the obvious platitudes of "um, the web is important and stuff" it's good to be in a group that is creating the web experience of millions of users.

My part in that is very small, adding support for block-scoped let and const to JavaScriptCore.

I've made some progress, but it could be going more smoothly. I have made the parser do the right thing for const, correctly raising errors for duplicate bindings, including nested var declarations that get hoisted. The parser is fine: it maintains an environment like you would expect. But the AST assumes that all locals get hoisted to function scope, so there's no provision for e.g. two distinct local variables with the same name. So there is still some work to do on the AST, and it's a thicket of templates.

Hopefully I'll end up with a prototype by the end of the hackfest (Sunday). Sooner if I find that sword of omens, which I seem to have misplaced. Sight beyond sight!

Syndicated 2011-12-02 17:36:56 from wingolog

fscons 2011: free software, free society

Good morning, hackersphere! Time and space are moving, in the egocentric coordinate system at least, but before their trace is gone, I would like to say: FSCONS 2011 was fantastic!

FSCONS is a conference unlike any other I know. I mean, where else can you go from a talk about feminism in free software, to talk about the state of the OpenRISC chip design project, passing through a hallway track conversation on the impact of cryptocurrency on the welfare state, approached from an anarchist perspective?

Like many of you, I make software because I like to hack. But I make Free Software in particular because I value all kinds of freedom, as part of the "more beautiful world our hearts know is possible". We make the material conditions of tomorrow's social relations, and I want a world of sharing and mutual aid.

But when we reflect on what our hands are making, we tend do so in a context of how, not why. That's why I enjoyed FSCONS so much, that it created a space for joining the means of production to their ends: a cons of Free Software, Free Society.

As a GNU hacker, I'm especially honored by the appreciation that FSCONS particpants have for GNU. FSCONS has a tithe, in which a portion of the entry fees is donated to some project, and this year GNU was chosen as the recipient. It's especially humbling, given the other excellent projects that were nominated for the tithe.

So thank you very much, FSCONS organizers and participants. I had a great time!

are you bitter about it?

I gave a talk there at FSCONS, GNU Guile: Free Software Means of Production (slides, notes).

Unlike many of my other talks, this one was aimed at folks that didn't necessarily know very much about Guile. It was also different from other talks in that it emphasized Guile as a general programming environment, not as an extension language. Guile is both things, and as the general-purpose side gets a lot less publicity, I wanted to emphasize it in this talk. Hopefully the videos will be up soon.

In the last 20 minutes or so, we did a live-hack. Inspired by a tweet by mattmight, we built Bitter, a one-bit Twitter. I tried to convey what it's like to hack in Guile, with some success I think. Source code for the live-hack, such as it is, is linked to at the end of the page.

For a slightly more extended example of a web application, check out Peeple, originally presented in a talk at FOSDEM, back in February. Peeple has the advantage of being presented as a development of separate git commits. Slides of that talk, Dynamic Hacking with Guile, are also available, though they are not as developed as the ones from FSCONS.

Finally, for the real documentation, see the Guile manual.

Happy hacking, and hopefully see you at FSCONS next year!

Syndicated 2011-11-28 11:38:36 from wingolog

JavaScriptCore, the Webkit JS implementation

My readers will know that I have recently had the pleasure of looking into the V8 JavaScript implementation, from Google. I'm part of a small group in Igalia doing compiler work, and it's clear that in addition to being lots of fun, JavaScript implementations are an important part of the compiler market today.

But V8 is not the only JS implementation in town. Besides Mozilla's SpiderMonkey, which you probably know, there is another major Free Software JS implementation that you might not have even heard of, at least not by its proper name: JavaScriptCore.

jsc: js for webkit

JavaScriptCore (JSC) is the JavaScript implementation of the WebKit project.

In the beginning, JavaScriptCore was a simple tree-based interpreter, as Mozilla's SpiderMonkey was. But then in June of 2008, a few intrepid hackers at Apple wrote a compiler and bytecode interpreter for JSC, threw away the tree-based interpreter, and called the thing SquirrelFish. This was eventually marketed as "Nitro" inside Apple's products[0].

JSC's bytecode interpreter was great, and is still pretty interesting. I'll go into some more details later in this article, because its structure conditions the rest of the implementation.

But let me continue here with this historical sketch by noting that later in 2008, the WebKit folks added inline caches, a regular expression JIT, and a simple method JIT, and then called the thing SquirrelFish Extreme. Marketers called this Nitro Extreme. (Still, the proper name of the engine is JavaScriptCore; Wikipedia currently gets this one wrong.)

One thing to note here is that the JSC folks were doing great, well-factored work. It was so good that SpiderMonkey hackers at Mozilla adopted JSC's regexp JIT compiler and their native-code assembler directly.

As far as I can tell, for JSC, 2009 and 2010 were spent in "consolidation". By that I mean that JSC had a JIT and a bytecode interpreter, and they wanted to maintain them both, and so there was a lot of refactoring and tweaking to make them interoperate. This phase consolidated the SFX gains on x86 architectures, while also adding implementations for ARM and other architectures.

But with the release of V8's Crankshaft in late 2010, the JS performance bar had been lowered again (assuming it is a limbo), and so JSC folks started working on what they call their "DFG JIT" (DFG for "data-flow graph"), which aims be more like Crankshaft, basically.

It's possible to configure a JSC with all three engines: the interpreter, the simple method JIT, and the DFG JIT. In that case there is tiered compilation between the three forms: initial parsing and compilation produces bytecode, that can be optimized with the method JIT, that can be optimized by the DFG JIT. In practice, though, on most platforms the interpreter is not included, so that all code runs through the method JIT. As far as I can tell, the DFG JIT is shipping in Mac OS X Lion's Safari browser, but it is not currently enabled on any platform other than 64-bit Mac. (I am working on getting that fixed.)

a register vm

The interpreter has a number of interesting pieces, but it is important mostly for defining the format of bytecode. Bytecode is effectively the high-level intermediate representation (IR) of JSC.

To put that into perspective, in V8, the high-level intermediate representation is the JS source code itself. When V8 first sees a piece of code, it pre-parses it to raise early syntax errors. Later when it needs to analyze the source code, either for the full-codegen compiler or for Hydrogen, it re-parses it to an AST, and then works on the AST.

In contrast, in JSC, when code is first seen, it is fully parsed to an AST and then that AST is compiled to bytecode. After producing the bytecode, the source text isn't needed any more, and so it is forgotten. The interpreter interprets the bytecode directly. The simple method JIT compiles the bytecode directly. The DFG JIT has to re-parse the bytecode into an SSA-style IR before optimizing and producing native code, which is a bit more expensive but worth it for hot code.

As you can see, bytecode is the common language spoken by all of JSC's engines, so it's important to understand it.

Before really getting into things, I should make an aside about terminology here. Traditionally, at least in my limited experience, a virtual machine was considered to be a piece of software that interprets sequences of virtual instructions. This would be in contrast to a "real" machine, that interprets sequences of "machine" or "native" instructions in hardware.

But these days things are more complicated. A common statement a few years ago would be, "is JavaScript interpreted or compiled?" This question is nonsensical, because "interpreted" or "compiled" are properties of implementations, not languages. Furthermore the implementation can compile to bytecode, but then interpret that bytecode, as JSC used to do.

And in the end, if you compile all the bytecode that you see, where is the "virtual machine"? V8 hackers still call themselves "virtual machine engineers", even as there is no interpreter in the V8 sources (not counting the ARM simulator; and what of a program run under qemu?).

All in all though, it is still fair to say that JavaScriptCore's high-level intermediate language is a sequence of virtual instructions for an abstract register machine, of which the interpreter and the simple method JIT are implementations.

When I say "register machine", I mean that in contrast to a "stack machine". The difference is that in a register machine, all temporary values have names, and are stored in slots in the stack frame, whereas in a stack machine, temporary results are typically pushed on the stack, and most instructions take their operands by popping values off the stack.

(Incidentally, V8's full-codegen compiler operates on the AST in a stack-machine-like way. Accurately modelling the state of the stack when switching from full-codegen to Crankshaft has been a source of many bugs in V8.)

Let me say that for an interpreter, I am totally convinced that register machines are the way to go. I say this as a Guile co-maintainer, which has a stack VM. Here are some reasons.

First, stack machines penalize named temporaries. For example, consider the following code:

(lambda (x)
  (* (+ x 2)
     (+ x 2)))

We could do common-subexpression elimination to optimize this:

(lambda (x)
  (let ((y (+ x 2)))
    (* y y)))

But in a stack machine is this really a win? Consider the sequence of instructions in the first case:

; stack machine, unoptimized
0: local-ref 0      ; x
1: make-int8 2
2: add
3: local-ref 0      ; x
4: make-int8 2
5: add
6: mul
7: return

Contrast this to the instructions for the second case:

; stack machine, optimized
0: local-ref 0      ; push x
1: make-int8 2      ; push 2
2: add              ; pop x and 2, add, and push sum
3: local-set 1      ; pop and set y
4: local-ref 1      ; push y
5: local-ref 1      ; push y
6: mul              ; pop y and y, multiply, and push product
7: return           ; pop and return

In this case we really didn't gain anything, because the storing values to locals and loading them back to the stack take up separate instructions, and in general the time spent in a procedure is linear in the number of instructions executed in the procedure.

In a register machine, on the other hand, things are easier, and CSE is definitely a win:

0: add 1 0 0           ; add x to x and store in y
1: mul 2 1 1           ; multiply y and y and store in z
2: return 2            ; return z

In a register machine, there is no penalty to naming a value. Using a register machine reduces the push/pop noise around the instructions that do the real work.

Also, because they include the names (or rather, locations) of their operands within the instruction, register machines also take fewer instructions to do the job. This reduces dispatch cost.

In addition, with a register VM, you know the size of a call frame before going into it, so you can avoid overflow checks when pushing values in the function. (Some stack machines also have this property, like the JVM.)

But the big advantage of targeting a register machine is that you can take advantage of traditional compiler optimizations like CSE and register allocation. In this particular example, we have used three virtual registers, but in reality we only need one. The resulting code is also closer to what real machines expect, and so is easier to JIT.

On the down side, instructions for a register machine typically occupy more memory than instructions for a stack machine. This is particularly the case for JSC, in which the opcode and each of the operands takes up an entire machine word. This was done to implement "direct threading", in which the opcodes are not indexes into jump tables, but actually are the addresses of the corresponding labels. This might be an acceptable strategy for an implementation of JS that doesn't serialize bytecode out to disk, but for anything else the relocations are likely to make it a lose. In fact I'm not sure that it's a win for JSC even, and perhaps the bloat was enough of a lose that the interpreter was turned off by default.

Stack frames for the interpreter consist of a six-word frame, the arguments to the procedure, and then the locals. Calling a procedure reserves space for a stack frame and then pushes the arguments on the stack -- or rather, sets them to the last n + 6 registers in the stack frame -- then slides up the frame pointer. For some reason in JSC the stack is called the "register file", and the frame pointer is the "register window". Go figure; I suppose the names are as inscrutable as the "activation records" of the stack world.

jit: a jit, a method jit

I mention all these details about the interpreter and the stack (I mean, the register file), because they apply directly to the method JIT. The simple method JIT (which has no name) does the exact same things that the bytecode interpreter does, but it does them via emitted machine instructions instead of interpreting virtual instructions.

There's not much to say here; jitting the code has the result you would expect, reducing dispatching overhead, while at the same time allowing some context-specific compilation, like when you add a constant integer to a variable. This JIT is really quick-and-dirty though, so you don't get a lot of the visibility benefits traditionally associated with method JITs like what HotSpot's C1 or C2 currently have. Granted, the register VM bytecode does allow for some important optimizations to happen, but JSC currently doesn't do very much in the way of optimizing bytecode, as far as I can tell.

Thinking more on the subject, I suspect that for Javascript, CSE isn't even possible unless you know the types, as a valueOf() callback could have side effects.

an interlude of snarky footnotes

Hello, reader! This is a long article, and it's a bit dense. I had some snarky footnotes that I enjoyed writing, but it felt wrong to put them at the end, so I thought it better to liven things up in the middle here. The article continues in the next section.

0. In case you didn't know, compilers are approximately 37% composed of marketing, and rebranding is one of the few things you can do to a compiler, marketing-wise, hence the name train: SquirrelFish, Nitro, SFX, Nitro Extreme...[1] As in, in response to "I heard that Nitro is slow.", one hears, "Oh, man they totally fixed that in SquirrelFish Extreme. It's blazingly fast![2]"

1. I don't mean to pick on JSC folks here. V8 definitely has this too, with their "big reveals". Firefox people continue to do this for some reason (SpiderMonkey, TraceMonkey, JaegerMonkey, IonMonkey). I expect that even they have forgotten the reason at this point. In fact the JSC marketing has lately been more notable in its absence, resulting in a dearth of useful communication. At this point, in response to "Oh, man they totally are doing a great job in JavaScriptCore", you're most likely to hear, "JavaScriptCore? Never heard of it. Kids these days hack the darndest things."

2. This is the other implement in the marketer's toolbox: "blazingly fast". It means, "I know that you don't understand anything I'm saying, but I would like for you to repeat this phrase to your colleagues please." As in, "LLVM does advanced TBAA on the SSA IR, allowing CSE and LICM while propagating copies to enable SIMD loop vectorization. It is blazingly fast."

dfg: a new crankshaft for jsc?

JavaScriptCore's data flow graph (DFG) JIT is work by Gavin Barraclough and Filip Pizlo to enable speculative optimizations for JSC. For example, if you see the following code in JS:

a[i++] = 0.7*x;

then a is probably an array of floating-point numbers, and i is probably an integer. To get great performance, you want to use native array and integer operations, so you speculatively compile a version of your code that makes these assumptions. If the assumptions don't work out, then you bail out and try again with the normal method JIT.

The fact that the interpreter and simple method JIT have a clear semantic model in the form of bytecode execution makes it easy to bail out: you just reconstruct the state of the virtual registers and register window, then jump back into the code. (V8 calls this process "deoptimization"; the DFG calls it "speculation failure".)

You can go the other way as well, switching from the simple JIT to the optimized DFG JIT, using on-stack replacement. The DFG JIT does do OSR. I hear that it's needed if you want to win Kraken, which puts you in lots of tight loops that you need to be able to optimize without relying on being able to switch to optimized code only on function re-entry.

When the DFG JIT is enabled, the interpreter (if present) and the simple method JIT are augmented with profiling information, to record what types flow through the various parts of the code. If a loop is executed a lot (currently more than 1000 times), or a function is called a lot (currently about 70 times), the DFG JIT kicks in. It parses the bytecode of a function into an SSA-like representation, doing inlining and collecting type feedback along the way. This probably sounds very familiar to my readers.

The difference between JSC and Crankshaft here is that Crankshaft parses out type feedback from the inline caches directly, instead of relying on in-code instrumentation. I think Crankshaft's approach is a bit more elegant, but it is prone to lossage when GC blows the caches away, and in any case either way gets the job done.

I mentioned inlining before, but I want to make sure that you noticed it: the DFG JIT does do inlining, and does so at parse-time, like HotSpot does. The type profiling (they call it "value profiling") combined with some cheap static analysis also allows the DFG to unbox int32 and double-precision values.

One thing that the DFG JIT doesn't do, currently, is much in the way of code motion. It does do some dead-code elimination and common-subexpression elimination, and as I mentioned before, you need the DFG's value profiles in order to be able to do this correctly. But it does not, as far as I can tell, do much in the way of code motion, like loop-invariant code motion.

Also, the DFG's register allocator is not as good as Crankshaft's, yet. It is hampered in this regard by the JSC assembler that I praised earlier; while indeed a well-factored, robust piece of code, JSC's assembler has a two-address interface instead of a three-address interface. This means that instead of having methods like add(dest, op1, op2), it has methods like add(op1, op2), where the operation implicitly stores its result in its first operand. Though it does correspond to the x86 instruction set, this sort of interface is not great for systems where you have more registers (like on x86-64), and forces the compiler to shuffle registers around a lot.

The counter-based optimization triggers do require some code to run that isn't strictly necessary for the computation of the results, but this stratey does have the nice property that the DFG performance is fairly predictable, and measurable. Crankshaft, on the other hand, being triggered by a statistical profiler, has statistically variable performance.

And speaking of performance, AWFY on the mac is really where it's at for JSC right now. Since the DFG is only enabled by default on recent Mac OS 64-bit builds, you need to be sure you're benchmarking the right thing.

Looking at the results, I think we can say that JSC's performance on the V8 benchmark is really good. Also it's interesting to see JSC beat V8 on SunSpider. Of course, there are lots of quibbles to be had as to the validity of the various benchmarks, and it's also clear that V8 is the fastest right now once it has time to warm up. But I think we can say that JSC is doing good work right now, and getting better over time.

future

So that's JavaScriptCore. The team -- three people, really -- is mostly focusing on getting the DFG JIT working well right now, and I suspect they'll be on that for a few months. But we should at least get to the point where the DFG JIT is working and enabled by default on free systems within a week or two.

The one other thing that's in the works for JSC is a new generational garbage collector. This is progressing, but slowly. There are stubs in the code for card-marking write barriers, but currently there is no such GC implementation, as far as I can tell. I suspect that someone has a patch they're cooking in private; we'll see. At least JSC does have a Handle API, unlike SpiderMonkey.

conclusion

So, yes, in summary, JavaScriptCore is a fine JS implementation. Besides being a correct implementation for real-world JS -- something that is already saying quite a lot -- it also has good startup speed, is fairly robust, and is working on getting an optimizing compiler. There's work to do on it, as with all JS implementations, but it's doing fine.

Thanks for reading, if you got this far, though I understand if you skipped some parts in the middle. Comments and corrections are most welcome, from those of you that actually read all the way through, of course :). Happy hacking!

Syndicated 2011-10-28 15:51:24 from wingolog

the user in the loop

The videos from this year's GNU Hackers Meeting in Paris are up. All the videos are linked from that page. There were technical problems with a couple of them, and we didn't get BT Templeton's presentation on Emacs Lisp in Guile (using delimited dynamic bindings to implement buffer-local variables!), but all in all it's a good crop.

I gave a talk entitled "The User in the Loop" that argued for the place of extensibility in the GNU project. I also pointed out some ways in which Guile fulfills those needs, but that was not the central point to the talk. I argued that point at more length in a previous article.

Anyway, let's give that newfangled HTML5 video tag a go here. It starts with the cut-off statement, "We all know that code motion is very important to efficiency, so..."

Alternately you can download the video directly (~350MB, 52 minutes). There are notes too, a superset of the slides from the talk.

Syndicated 2011-10-19 14:32:34 from wingolog

what's your C migration plan?

I spend a lot of time in my neighborhood café. It has character. Some of it is generic: like in many other bars, the round tables are marble, the chairs are rickety, the customers and the barmen all shout at each other.

But it has its own thing. The owner inherited it from his father, but he was never sure whether he wanted a record store or a bar, so it's half and half. In practice the CDs in the changer don't change very often, and the rest are all in a side room that is otherwise packed full of junk.

In the morning, everyone gets tallats. Old ladies get theirs with pastries. Old men get theirs amb gotes, with liquor. Morning workers rush in, slam one back, slap coins on the counter and get out. Adéu!

Those with more time, like myself, get a cafè amb llet. Most read the paper. I hack. I spend a couple mornings a week there. It's right pleasant to work there, without internet. My most productive hours of the week are there in the bar.

I do chitchat a bit, though: with designers, mothers, the barmen, retirees, random folk. There's one fellow in particular I like talking with, a designer. Turns out he wants to learn how to program. He told me yesterday that he wanted to learn C.

on c

Now, I spend a lot of time in C. I've written loads of it. I continue to write it at times. I'm fond of it, and it has served me well.

But I have come to believe that writing new programs in C is the hacker equivalent of malpractice. C is just too dangerous to use. The risks are not worth the benefits.

Let's be clear about the benefits of writing in C, before looking at its flaws. I want to be really fair here. C is fast. It has great compilers, everywhere. And they are everywhere: C is ubiquitous. It is very flexible, also. You can do anything in C. It is great for programming drivers. It is possible to write big e-mail clients in it. It has great low-level bit operations (though C++ is getting better at it, with its value types). It is a power tool, and it puts you in control. There are loads of libraries written in it, and for it. It encourages minimalism.

In short, C has been a wild success, and with good reason.

But it's time to stop using it.

The essence of the problem with C is that the behavior of erroneous programs is not specified. This doesn't sound like a big deal, but it is. Let's make an example. What is the meaning of this Python program?

def array_set(a, n, x):
  a[n] = x

I think we would all say that it sets the nth element of a to x. But what about this one in C:

void array_set (void **a, size_t n, void *x) {
  a[n] = x;
}

This one, you really can't tell. You can't use Python's definition, because there are no guarantees about the valid indices to set in the array. In fact there is no array, just a pointer.

In the Python case, if you pass an out-of-bounds array index, you will get an exception. In the C case, you get undefined behavior. Lately what "undefined behavior" means is that the state or organized crime gets to take control of your computer: read your email, log your keystrokes, take pictures of you.

I'm deadly serious. Let's take a better example. Last year Liu Xiaobo won the Nobel Peace Prize for "his long and non-violent struggle for fundamental human rights in China." A couple weeks later, it was found that someone cracked the Nobel Prize website and was serving a Firefox zero-day that took over users' machines.

Apparently, someone was interested in what was on the computers of visitors to the Nobel prize site -- interested enough to use a fresh zero-day, something that probably sells on the black market for some $20,000 or more.

Lately I've been quoting Eben Moglen a lot, especially from the talk he gave at this year's FOSDEM conference. One thing he said there is that "software is the steel of the 21st century". As steel shaped the social relations of the last century, what we hack now forms the material conditions of tomorrow: the next Arab Spring, or the next Green Scare.

Almost all software is connected to the net these days, and so it is all under attack. If you write software in C these days -- software with bugs, like any software -- you are writing software with undefined behavior, and thus, software that enables powerful state and organized crime actors to take advantage of your users.

That sounds a bit exaggerated, no? It is true, though. Look at what is happening with browser vulnerabilities, or what is the same, PDF or Flash vulnerabilities, or PNG or MP4, or what-have-you. The commonality here is that powerful interests exploit unsuspecting users due to flaws in the C and C++ language. Writing more C these days is malpractice.

migration

I still write C. I work on implementations of safe languages -- languages that don't have the same kinds of fundamental vulnerabilities that C and C++ have. Eventually the amount of C in the world will stop growing, and decline as pieces that are now written in C will be written in Python, in JavaScript, in Guile: in short, in languages that don't launch the missiles when you try to write beyond the end of an array.

C has had a great run; we should celebrate it. But its time has passed. What is your migration strategy? How are you going to stop writing C?

Syndicated 2011-10-13 20:44:55 from wingolog

partial evaluation in guile

Friends, something awesome just happened: Guile just got itself a respectable inliner.

I have said before on this blog, quoting commenter Rémi Forax, that "inlining is the mother of all optimizations". It is true that inlining opens up space for code motion, constant folding, dead code elimination, and loop optimizations. However, credit might be better laid at the feet of partial evaluation, the mother of all inlining algorithms.

Partial evaluation is a source-to-source transformation that takes your program and produces a better one: one in which any computation that can be done at compile-time is already made, leaving only those computations that need to done at run-time.

For example, the application

(+ 2 3)

can clearly be evaluated at compile-time. We say that the source expression (+ 2 3) reduces to 5 via constant folding. The result, 5 in this case, is the residual expression.

A more complicated example would look like:

(let ((string->chars
       (lambda (s)
         (define char-at
           (lambda (n) (string-ref s n)))
         (define len
           (lambda () (string-length s)))
         (let loop ((i 0))
           (if (< i (len))
               (cons (char-at i)
                     (loop (1+ i)))
               '())))))
  (string->chars "yo"))
=> (list #\y #\o)

Here when I write =>, you should read it as, "residualizes at compile-time to". In this case our input program residualized, at compile-time, to a simple list construction. The loop was totally unrolled, the string-refs folded, and all leaf procedures were inlined.

Neat, eh?

optimization enables expressiveness

If the partial evaluator does its job right, the residual program will run faster. However this isn't the real reason that I'm so pleased with it; rather, it's that it lets me write different programs.

You see, I hack on Guile's compiler and VM and all that. When I write code, I know what Guile is going to do with it. Unfortunately, this caused my programs to be uglier than necessary, because I knew that Guile wasn't going to inline some important things for me. I wrote at a lower level of abstraction, because I couldn't trust the compiler.

Now, with the partial evaluator, I'm happy to use helper functions, even higher-order helpers, with the knowledge that Guile will mostly do the right thing. This is particularly important in the context of languages that support syntactic abstraction, like Scheme. If you're a Schemer and haven't seen Kent Dybvig's Macro Writers' Bill of Rights talk (slides), do check it out.

Incidentally, there was a sad moment in JSConf.eu a couple weekends ago when Andreas Gal (of all people!) indicated that he had to manually inline some functions in PDF.js in order to get adequate speed. More on JavaScript a little later, though.

about partial evaluation

A partial evaluator looks a lot like a regular meta-circular evaluator. It's a recursive function that takes an expression and an environment and yields a value. Guile's partial evaluator, peval, builds up lexical environments when it sees let and other binding constructs, and tries to propagate copies when it sees lexical references.

Inlining is facilitated by copy-propagation of lambda expressions. Just as the initial value 0 in the example above propagates through the lexical variable i to reach (< i (len)), (lambda () (string-length s)) propagates to len. Application of a lambda expression reduces to the equivalent of a let binding. So for the first iteration of loop above, we have:

(< i (len))
;; copy propagation
=> (< 0 ((lambda () (string-length s))))
;; beta-reduction
=> (< 0 (string-length s))
;; copy-propagation
=> (< 0 (string-length "yo"))
;; constant-folding
=> (< 0 2)
;; constant-folding
=> #t

In this case the condition folded to a constant, so we know at compile-time which branch to take. The second branch is dead, so we eliminate it. The process continues until we finally produce the resulting list.

down the rabbit hole

Up to here things are easy: we have a simple, well-typed example that terminates. But to be part of a real-world compiler, a partial evaluator needs to handle real-world code: accessors for mutable data, access to mutable bindings (lexical and global), indefinite recursion, unbound variables, and poorly-typed programs. In addition, a real-world inliner needs to run quickly and avoid producing bloated residual code.

I should take a moment and note that statically-typed, functional languages can avoid a number of these problems, simply by defining them away. It is no wonder that compiler people tend towards early binding. Scheme does exhibit a fair amount of early binding through its use of lexical scope, but it is not a pure functional language. Working on this peval was the first time that I wished for immutable pairs in Scheme, as promoted by Racket and R6RS.

Anyway, having mutability in your language isn't so bad. You do miss some optimization opportunities, but that is OK. What is not OK in a production peval is spending too much time on an expression.

Guile's solution, following Waddell and Dybvig's excellent Fast and Effective Procedure Inlining, is to simply count the number of times through the inliner. Each inlining attempt gets a fresh counter, and any work performed within an inlining attempt decrements the counter. When the counter reaches zero, the inlining attempt is aborted, and a call is residualized instead. Since the number of call sites in the program is fixed, and there is a maximum amount of work that will be done at each call site, the resulting algorithm is O(N) in the size of the source program.

Guile's partial evaluator also uses the on-demand, online strategy of Waddell and Dybvig, to allow definitions to be processed in their use contexts. For example, (cons 1 2) may be reduced to #t when processed as a test, in a conditional. If, after processing the body of a let, a binding is unreferenced, then it is processed for effect. Et cetera.

With the effort counter in place, Guile simply tries to inline every call site in the program, knowing that it will bail out if things don't work. It sounds a little crazy, but it works, as Waddell and Dybvig show. The effort counter also serves to limit code growth, though it is a bit crude. In any case I got less than a percent of code growth when optimizing the psyntax expander that Guile uses, which is a win in my book.

caveats

Partial evaluation can only propagate bindings whose definitions are known. In the case of Guile, then, that restricts inlining to lexical references and primitive references, and notably excludes global references and module imports, or fields of mutable objects. So this does not yet give us cross-module inlining, beyond the hacks that abuse the macro expander.

This observation has a correlary, in that some languages promote a style of programming that is difficult to analyze. I'm really talking about object-oriented languages here, and the dynamic ones in particular. When you see o.foo() in Java, there is at least the possibility that foo is a final method, so you know you can inline it if you choose to. But in JavaScript if you see o.foo(), you don't know anything: the set of properties of o can and does vary at runtime as people monkey-patch the object o, its prototype, or Object.prototype. You can even change o.__proto__ in most JS implementations. Even if you can see that your o.foo() call is dominated by a o.foo = ... assignment, you still don't know anything in ES5, as o could have a setter for the foo property.

This situation is mitigated in the JavaScript world by a couple of things.

First of all, you doesn't have to program this way: you can use lexical scoping in a more functional style. Coupled with strict mode, this allows a compiler to see that a call to foo can be inlined, as long as foo isn't mutated in the source program. That is a property that is cheap to prove statically.

However, as Andreas Gal found out, this isn't something that the mainstream JS implementations do. It is really a shame, and it has lasting impacts on programmers.

I even heard a couple people say that in JS, you should avoid deep lexical bindings, because the access time depends on the binding depth. While this is true for current implementations, it is a property of the implementations and not of the language. Absent with and eval-introduced bindings, a property that is true in strict-mode code, it is possible to quickly compute the set of free variables for every function expression. When the closure is made, instead of grabbing a handle on some sort of nested scope object, a JS implementation can just copy the values of the free variables, and store them in a vector associated with the function code. (You see, a closure is code with data.) Then any accesses to those variables go through the vector instead of the scope.

For assigned variables -- again, a property that can be proven statically -- you put the variables in a fresh "box", and rewrite accesses to those variables to go through that box. Capturing a free variable copies the box instead of its value.

There is nothing new about this technique; Cardelli and Dybvig (and probably others) discovered it independently in the 80s.

This point about closure implementation is related to partial evaluation: people don't complain much about the poor static inliners of JS, because the generally poor closure implementations penalize lexical abstraction. Truly a shame!

* * *

It seems I have digressed. Sorry about that!

I spoke about closures and lexical scope, properties of the JS language that can enable static inlining. The second (and more important) way that JS implementations can support inlining is dynamically. I trolled about that some months ago. Dynamic inlining is fantastic, when it works, though there are limiting heuristics (scroll down to "inlining", and note that the exact set of heuristics have changed in the intervening months).

So my last point was about something that Guile does well that JS implementations do poorly, and it's fair that this point should be the reverse. I would like to be able to dynamically inline, but this would mean associating the intermediate representation with Scheme functions. As Guile can compile code ahead-of-time, this means we would have to serialize the IR out to disk, in much the same way as GCC's new link-time optimizer (LTO) does. But I would like to put that off until we change the format of compiled Guile code to be ELF. Otherwise we run the risk of bloating our runtime memory size.

try it out

Guile's partial evaluator was joint work between myself and my fellow Guile maintainer Ludovic Courtès, and was inspired by a presentation by William Cook at DSL 2011, along with the Waddell and Dybvig's Fast and Effective Procedure Inlining.

This code is currently only in the development Guile tree, built from git. Barring problems, it will be part of Guile 2.0.3, which should be out in a couple weeks.

You can check out what the optimizer does at the command prompt:

>,optimize (let ((x 13)) (* x x))
$1 = 169
>,optimize (let ((x 13)) (* x foo))
$2 = (* 13 foo)

Have fun, and send bugs to bug-guile@gnu.org.

Syndicated 2011-10-11 10:01:30 from wingolog

a schemer at jsconf.eu

Yow! I am just back from JSConf.eu, a lovely & lively gathering of JavaScript hackers in sunny Berlin. And it was sunny, this foreign land, and foreign indeed: as my long-time readers will know, I am not a JavaScript hacker, but a Schemer with a C problem. JSConf.eu was a pleasant place to visit.

It turns out that I wasn't the only one there in the tourist condition; all of the other implementors seemed to be in a similar situation. The Mozilla folks look at themselves as "platform engineers". V8 hacker Vyacheslav Egorov has a thing for Oberon; I'm sure he'll tell you about it if you ask. And so on. Indeed I think that of the folks that were there, Brendan is the only one implementing JavaScript in JavaScript. Do You Speak It!

great job, organizers!

But besides giving me the opportunity to gossip about garbage collection and type feedback, JSConf.eu was a well-organized, energetic event, and I say that as one who has been to dozens of conferences in the last few years.

There was great sound and video, of course. It looked like they did a pretty good job with the recording, so the videos should be online at some point. In the downstairs room, they had Anna Lena Schiller drawing live "graphic recordings" as the speakers spoke, on large sheets of glossy paper, in color, illustrating the main ideas of the talks. As the weekend progressed, all of these pieces were mounted on the wall for people to see. In the end they will be scanned and uploaded, and the originals mailed to the speakers. It was amazing, and an impressive amount of work.

It's silly, but let me also mention the food: there were fresh-made frozen-yogurt smoothies with brownie toppings. There were fresh-made croissants and bread, and a plate of some dozen or so French cheeses, and made-to-order breakfast, and plates of tarts, and the bar was open all day long, with apfelschorle and club-mate and beer and coffee, and there was an espresso stand, and everything was free. There was lunch and dinner at the venue and free parties before, during, and after. The only caveat was that the vegetarian options were relatively few. Still, though way to raise the bar, JSConf.eu!

hypersensitivity?

I could be overly sensitive about this, but I would like to put one thing out there. There were a number of sexuality-related aspects of this event that I did not feel comfortable with. There were a couple speakers that made live-chat applications with node.js, and demoed them to the room, and a few anonymous penis jokes ended up on them. There was a speaker that made a sexual joke about Allow: GET, HEAD in HTTP headers. Most of all though, they had the (very talented) caberet singer Mandy Lauderdale there, who started with a "Brendan Eich Fanclub" thing, then put on a show the next night at the party.

Please understand me when I say that I have no problem with sexuality in general -- it is fun, subversive, and in any case a part of life. However in the programming world we have a gender-balance problem, and I don't think that incorporating sexuality into our culture helps that. In the particular case of Mandy's (talented and sometimes funny) performance, I get the feeling that we're promoting a "boy's club" atmosphere where it becomes OK to make penis jokes.

Like I said, I'm just putting that out there. Feel free to disagree, politely of course.

talks

The first talk I saw was by Dean McNamee of plask, a non-browser JS environment for making art. Plask helps make graphical art by providing 2D and 3D drawing primitives to JS. It also integrates with other devices via MIDI and OSC, and can be sequenced over MIDI (or OSC, I presume).

Dean's message was that we need to stop caring so much about the "how" and care more about the "what" and "when". There was no code in the presentation, only demos of what Plask has done; and indeed Dean was quite honest that it was the crappiest code he has ever written. But the demos were pretty awesome.

There are obviously some problems with that approach to software, that it is difficult to share or to form a piece of someone else's work. However it is equally true that many of us spend a lot of time working on things that people will never see, and that sometimes we should just take shortcuts and move on.

Another interesting talk was by Peter van der Zee on JavaScript tools. Peter wrote a JS parser in JS, and an in-browser editor to overlay that tool information and utility onto a piece of JS code. It can provide many warnings, do type inference, and similar things. It can do some automated refactorings, like var hoisting, extract-method, and minification. I wonder if some more principled analysis algorithms like kCFA would provide any features above what he is able to do with his fixed-point constraint solver. Still, neat stuff; check it out on github.

David Mandelin and David Anderson of Mozilla gave a nice introduction to what JIT compilers can do these days in JavaScript, at least at a very local level. The presentation reminded me of Mandelin's old pjit project, to figure out what sorts of optimizations would be necessary to yield the smallest assembly code. My only criticism is that some of their assumptions were specific to the nan-boxing approach. They did include a humble nod at the end that there was still a ways for IonMonkey to go in order to reach what Crankshaft does, which was appropriate IMO.

Brendan Eich gave a talk at the end of the day about the coming ES6. Frankly I have no idea how people could understand the things he spoke of without following es-discuss, and most people I asked afterwards were really skeptical about such things as the "monocle-mustache" operator (that's .{ for those of you following along at home). That said, I do like the triangle operator (|>) and the potential block-lambda syntax. Slides are not up yet, unfortunately.

. . .

The next day, I was pleased to find myself sitting with GNOME hackers Henri Bergius and Garrett LeSage. We all agreed that it was great to be in such a dynamic, energetic conference, and that JSConf.eu was much more lively than recent free software conferences we had been to. However it must be said that JS programmers are at an early stage of free software development in many ways, in that there are many more single-person projects than we have in GNOME (for example). It could be said that our use of C in GNOME forces you to collaborate because you can't build very much (hah!), but large, stable projects with many contributors are relatively rare.

Also, I thought there is a general low esteem among the JS folk for copyleft and privacy, and for software freedom in general. There are exceptions of course, but JS is closely tied to environments with lots of venture capital, so it is to be expected that we get divergence of interests between users and developers. I think this is an area in which GNU folk have something important to say: that it's not just about the tech, it's about the freedom too.

On that note, my Igalia colleague Eduardo Lima's talk on JavaScript in GNOME was well-received. Eduardo talked about the pieces that allow a good JavaScript hacking experience to be had in GNOME, and even did some live-coding. Hopefully he'll blog about his talk soon :-)

I really enjoyed Tom Robinson's talk about compilers and JavaScript. It was about compilers that produce or consume JavaScript. I suppose that in some ways compiling DSLs to JavaScript in an in-browser compiler that passes the result to eval is somewhat like macros, though it is hard to compose with other parts of the target language like lexical scope. Interesting stuff though, and I hope he puts up his slides soon.

Finally, and we're getting to the end here -- I did not intend to write so much! -- V8 hacker Erik Corry had a great talk about garbage collection. V8 always had a generational, moving collector, which is great for most purposes, but struggles with large heaps, because collecting the old generation required a stop-the-world mark and sweep. Their new GC can incrementally collect the old generation, leading to reductions in pause times and higher sustained allocation rates for some problems (like the splay benchmark). There's lots of tricky write-barrier stuff going on there, and they're just now working out the details, but it will arrive to a Chrome near you shortly.

As part of Erik's work, he had enhanced the V8 engine to output heap profiling information on a TCP port. He could then run a node.js process to consume this information and output a graphical representation of the heap in a browser. It shows the new space filling up, scavenging, and promoting data into various parts of the old space. It also shows the old space being incrementally marked and swept, and has a visualization of time spent in the collector versus the mutator. This last visualization nicely showed that the new GC allows the splay benchmark to make forward progress even with a large rate of tenuring into the old generation.

I'm sure the Google folks will make their press release about this as soon as things settle down, but for now you can check it out in the bleeding_edge of V8. Have fun!

fin

A gist of all the slides is up on github, and I suppose it will be updated as new slides are uploaded and videos come online.

There were a number of other talks I didn't get to mention, like Andreas Gal on PDF.js, or Alon Zakai on Emscripten. SQLite compiled from C to JS for the browser? What a world. Thanks again to the JSConf.eu folks for such a great conference, and to Igalia for sending me. Happy hacking!

Syndicated 2011-10-04 18:23:09 from wingolog

from ssa to native code: v8's lithium language

So, friends, about a month ago I posted a look at the Hydrogen intermediate language, as part of an ongoing series on the internals of the V8 JavaScript implementation. Hydrogen is the high-level part of V8's Crankshaft optimizing compiler, where the optimizations happen. This article tells the story of Lithium, Crankshaft's low-level, machine-dependent intermediate language.

totally metal

So here's the deal: you parsed your JavaScript functions to AST objects (parse all the functions!). You emitted some simple native code, quickly but not optimized, and you sat back and waited. After a little while, your profiling told you that some function was hot, so you re-parsed it, generated a graph of Hydrogen instructions and optimized that graph. Now you need to generate code. And that's where we are! Generate all the code!

But that's not what V8 does here. Instead of going directly to native code, it translates the HGraph to a chunk of Lithium code, then emits machine code from those Lithium instructions. Unlike Hydrogen, which is an SSA form in which instructions implicitly define values, the Lithium form is closer to three-address code, with labels and gotos, and explicitly names its operands (if any) and results (if any).

These temporary values are initially declared with a number of constraints, such as "must be in a double-precision register". Later, a register allocator considers the constraints and liveness ranges, allocating each value to a specific register. Code generation happens after register allocation.

an example

Let us take a look at an example to see how these things interact. Given that Lithium is target-specific, we'll just choose one of the targets; ia32 and x64 are the most advanced, so we'll just take x64. (That's what they call it.)

LInstruction* LChunkBuilder::DoAdd(HAdd* instr) {
  if (instr->representation().IsInteger32()) {
    ASSERT(instr->left()->representation().IsInteger32());
    ASSERT(instr->right()->representation().IsInteger32());
    LOperand* left =
        UseRegisterAtStart(instr->LeastConstantOperand());
    LOperand* right =
        UseOrConstantAtStart(instr->MostConstantOperand());
    LAddI* add = new LAddI(left, right);
    LInstruction* result = DefineSameAsFirst(add);
    if (instr->CheckFlag(HValue::kCanOverflow)) {
      result = AssignEnvironment(result);
    }
    return result;
  } else if (instr->representation().IsDouble()) {
    return DoArithmeticD(Token::ADD, instr);
  } else {
    ASSERT(instr->representation().IsTagged());
    return DoArithmeticT(Token::ADD, instr);
  }
  return NULL;
}

As an aside, this is fairly typical for Crankshaft code: tight, lots of asserts, custom type predicates instead of RTTI, self-explanatory enumerated values, and much more oriented towards function calls than in-place mutations. The new objects are created in a "zone", under the covers, so that once the compiler is done, it can free the whole zone at once. I reflowed the left and right lines; otherwise it really is a paragraph of text.

Unfortunately there are not a lot of big-picture comments, hence this blog series.

Anyway, here we have three cases. The first two are for when the HValue has been has been given an unboxed representation, as an int32 or as a double. The last is for tagged values, which can be small integers (Smi values) or heap-allocated doubles. In these latter cases, we dispatch to the DoArithmeticD and DoArithmeticT helpers, which create generic LArithmeticD and LArithmeticT instructions. But we see that in the first case, there is a special LAddI instruction.

When there is a special case in Crankshaft, it either means that we are working around some strange corner of JavaScript, or that something is being optimized. In this case, it is an optimization, reflecting the importance of integer addition. The concrete optimizations are:

  1. Immediate operands. If one of the addends is a constant at compile-time, it can be encoded into the instruction stream directly.

  2. Overflow check removal. If the range analysis (see my previous article) showed that the value would not overflow, then no overflow check is emitted. (Otherwise, an overflow check will be inserted, branching to a deoptimization bailout. The bailout will need to know what variables are live, in that case, hence the AssignEnvironment call; more on that later.)

The UseRegisterAtStart helper call indicates that the operand should be in a register, and that it will be live at the start of the instruction but not the end. This latter constraint is needed because x86 instructions typically clobber the first operand. UseRegisterOrConstant only allocates a register if the operand is not constant and so cannot be immediate. Finally DefineSameAsFirst constrains the output register to be the same as the first operand. All of these calls additionally inform the register allocator of the uses and definitions, in SSA parlance, so that the allocator can build accurate liveness ranges.

Just to be complete, the double arithmetic case (not shown) is bit more straightforward, in that there are no overflow checks or immediates. Tagged arithmetic, on the other hand, always dispatches to an inline cache. (I find this to be fascinating. Typically language implementations inline fixnum operations, but V8 does not because their inline caches also perform type feedback, allowing unboxing optimizations where possible. In V8, tagged integer arithmetic is never inlined, by default.)

allocate all the registers?

Why bother with this Lithium thing? Why not just generate code directly? The answer is that having a target-specific Lithium language allows target-specific code generation and temporary allocation, while permitting the register allocator itself to be generic. As an example, see the ARM DoAdd implementation.

Most everyone knows, but the idea of register allocation is to store your temporaries in machine registers instead of in memory, so that things run fast. It's tricky though, because nested procedure calls can trash your registers, so you need to "spill" them to memory in certain cases. You also need to avoid thrashing values between registers and memory. V8 also tries to use registers when calling internal runtime helpers, which can result in a fair amount of register shuffling.

That's all traditional compiler stuff. But in a garbage-collected language, we also have the complication of knowing which registers hold garbage-collected values and which don't. For every point in the code in which GC could run, the collector will need to mark those pointers, and possibly relocate them. This complicates the allocator.

Another complication is that besides the normal entry point, optimized functions often have an additional entry point, due to on-stack replacement. That entry point has to take values from memory and pack them into registers in a way that lets the computation proceed.

Finally, the allocator plays a role in permitting so-called "lazy deoptimization". V8 hacker Søren Gjesse writes, in a rare but pleasantly exegetic message to the list:

Lazy deoptimization is what happens when all code in a context is forced to be deoptimized either due to starting debugging or due to a global assumption on the optimized code that no longer holds. The reason for the term "lazy" is that actual deoptimization will not happen until control returns to the optimized function. The way we do this lazy deoptimization is somewhat fragile and we have had quite a few bugs in that code. It is based on destructively patching the optimized code placing a call just after each call forcing deoptimization. And actually the patching is not right after the return point but after the gap code inserted at the return point. For the deferred stack check the code up to restoring the registers (popad in IA-32) is considered the gap code.

gaps?

The gap code that Søren referred to is the set of moves that shuffles registers back into place after a call. Translation of Hydrogen to Lithium inserts LGap instructions between each LInstruction. When the allocator decides that it needs to spill, restore, or shuffle some registers, it records these moves on an LGap.

A later pass serializes these parallel moves into a specific order, trying to use the minimum number of temporaries. I link to Xavier Leroy's paper above, not because V8 uses Coq, but because of the nice introduction to the problem.

Anyway, the actual algorithm used by the Lithium register allocator appears to be a version of linear-scan, with live range splitting. This part of V8 actually is well-commented, so if you can understand the Sarkar paper, then head on over to ye olde google code and check it out.

generate all the code!

The result of register allocation is that all LOperands have been assigned specific machine registers, and some moves have been recorded in the LGap instructions. Now, finally, we generate code: each Lithium instruction is traversed in order, and they all write assembly into a buffer. See the LAddI code generator, as an example.

All of the JavaScript engines generate code using assemblers written in C++. Some people still assume that compilers must generate assembly, in text, but it's much easier and faster just to call methods on an object that writes bytes into a buffer. This approach also makes it easy to add macro instructions, just by subclassing the assembler. Finally, since modern JS implementations have to muck around with the bits all the time (disassembly, source debugging, backtraces, garbage collection (tracing and moving), deoptimization, OSR, inline caches, type feedback), it's easiest to maintain your invariants if you control the bit generation.

V8's assembler originally came from the Strongtalk project, where some of the V8 hackers came from, long ago. This assembler writes code into memory, going forward, and then when it's finished, it appends "relocation information" to the code, written backwards. Relocation information lets V8 mark interesting places in the generated code: pointers that might need to be relocated (after garbage collection, for example), correspondences between the machine program counter and source locations, etc.

As anyone who's hacked on debuggers knows, the problem with debug information is that it is large. Without clever encodings, the amount of debugging data generated by a compiler can kill runtime performance, simply because of the amount of data that needs to be paged into memory. Of course this concern also applies at compile-time, and especially so for JIT compilers. The Strongtalk assembler's main contribution to V8 (I think) is a complicated bit-encoding that tries to compress this data as far as possible. The assembler also provides iterators for traversing this data at runtime.

finish line

And now, dear readers, I believe that we have come full circle on this look at V8. We started low, looking at the assembly generated by V8 for a simple loop. We elaborated this low-level view with a look at on-stack replacement, though without understanding how such information was available. So we jumped back to the top, taking a high-level look at V8's compilers. We drilled down to Hydrogen, and here we covered Lithium, register allocation, and code generation.

I hope it's been an entertaining series to read; it has certainly been enlightening to write. Thanks very much to my employer Igalia for supporting my work on this. Comments and corrections are welcome, as always, and happy hacking.

Syndicated 2011-09-05 11:25:05 from wingolog

the gnu extension language

I'm sitting on a train waiting to leave Paris for Barcelona, returning from the 2011 GNU Hackers Meeting. It was fantastic, clearly the best we have had yet. Thanks a lot to Ludovic Courtès for organizing it, and to IRILL and Sylvestre Ledru for hosting. I hope to write more about it in the future, but this essay will be long enough :)

I gave a talk entitled "The User in the Loop", which made the perhaps obvious argument that "extensibility is good & stuff". I hope that it did so in an entertaining and illuminating fashion. It also argued that Guile is a great fit for the extensibility needs of the GNU project. The video will be out shortly. Slides are available here, though you probably just want the notes instead.

going meta: goals and strategies

Guile is the GNU extension language. This is the case because Richard Stallman said so, 17 years ago. Beyond being a smart guy, Richard is powerfully eloquent: his "let there be Guile" proclamation was sufficient to activate the existing efforts to give GNU a good extension language. These disparate efforts became a piece of software, a community of hackers and users, and an idea in peoples' heads, on their lips, and at their fingertips.

So, Guile is a great language. But is it still the right extension language for GNU? At this GHM, Jim Blandy brought up the good point that we should revisit our past decisions periodically, to see if they still fit with our goals and with the world. "X is the GNU Y" is still a powerful, generative proclamation, so we should be careful with that power, to make sure that we are generating the right world. In that spirit, I would like to re-make the argument for Guile as the GNU extension language.

goal

Why care about extension languages? My presentation deals with this issue at length, but a nice summary can be found in the Guile manual:

Guile was conceived by the GNU Project following the fantastic success of Emacs Lisp as an extension language within Emacs. Just as Emacs Lisp allowed complete and unanticipated applications to be written within the Emacs environment, the idea was that Guile should do the same for other GNU Project applications. This remains true today.

The idea of extensibility is closely related to the GNU project's primary goal, that of promoting software freedom. Software freedom means that people receiving a software package can modify or enhance it to their own desires, including in ways that may not have occurred at all to the software's original developers. For programs written in a compiled language like C, this freedom covers modifying and rebuilding the C code; but if the program also provides an extension language, that is usually a much friendlier and lower-barrier-of-entry way for the user to start making their own changes.

-- Guile and the GNU Project

Although GNU maintainers are free to do as they wish with their packages, we are working on making an integrated system, and a system that mostly uses one extension language is better than one with many implementations. It is reasonable to promote one language as the default choice, while not forbidding others, of course.

I think that Guile is a great strategy to achieve this goal. Guile is good for the programs it extends, it is good for users, and it is good for GNU as a whole. The rest of this article argues these points in more detail.

languages do matter

An extension language is first and foremost a language: a vehicle for expression and for computation. Guile is an implementation of Scheme, one of the most expressive languages out there.

In his 2002 essay, Revenge of the Nerds, Paul Graham argues (somewhat polemically) that some languages are better than others, and that you should use the better languages. Of course he identifies "better" with "closer to Common Lisp", which is somewhat naïve, but the general point still stands.

At the time when Guile was started, it was quite clear that Scheme was indeed more expressive than Bourne shell, or than Tcl. Since then, many languages have come and gone, and most of those that have stuck around do have many of the features of Scheme: garbage collection, closures, lexical scope, etc.

But still, there are many areas in which Guile Scheme remains more powerful than other languages.

macros: language extensibility

First and foremost, Scheme's macros have no comparison in any other language. Macros let users extend their language. Macros allow a language be adapt to time, to the different needs of the future (unevenly distributed as it is).

I realize that this broad argument will not speak to people that do not use macros, so here is a concrete example.

Some languages have pattern-matching facilities, like Erlang. Pattern-matching goes like this: you have a datum, and a set of expected patterns. You go over the patterns in order, seeing if the datum matches the pattern. If it matches, some matching part of the datum is extracted and bound to local variables, and a consequent expression is evaluated.

With most languages, either you have pattern matching, because Joe Armstrong put it there, or you don't, in which case you are either blissfully ignorant or terminally frustrated. But in languages with macros, like Scheme, you can extend your programming language to give it pattern-matching features.

I'll show an implementation here, to keep things concrete. Here's a definition of a match macro. If you are unfamiliar with Scheme macros, do take a look at the relevant section in Guile's manual.

(define-syntax match
  (syntax-rules (else)
    ;; Use `pat' helper macro to see if a clause matches.
    ((match v (p e0 e ...) cs ...)
     (let ((fk (lambda () (match v cs ...))))
       (pat v p (begin e0 e ...) (fk))))
    ;; If we get to an `else' clause, evaluate its body.
    ((match v (else e0 e ...))
     (begin e0 e ...))))

There are two rules in this macro. The first rule hands off the real work to a helper macro, pat. The second just matches an else clause at the end.

(define-syntax pat
  (syntax-rules (_ unquote)
    ;; () succeeds if the datum is null, and fails
    ;;  otherwise.
    ((pat v () kt kf)
     (if (null? v) kt kf))
    ;; _ matches anything and succeeds.
    ((pat v _ kt kf)
     kt)
    ;; ,VAR binds VAR to the datum and succeeds.
    ((pat v (unquote var) kt kf)
     (let ((var v)) kt))
    ;; If the pattern is a pair, succeed if the datum
    ;; is a pair and its car and cdr match.
    ((pat v (x . y) kt kf)
     (if (pair? v)
         (let ((vx (car v)) (vy (cdr v)))
           (pat vx x (pat vy y kt kf) kf))
         kf))
    ;; Literals in a pattern match themselves.
    ((pat v lit kt kf)
     (if (eq? v (quote lit)) kt kf))))

The pat helper macro is written in so-called continuation-passing style. This means that pat receives the consequent expressions to evaluate as arguments: one if the pattern matches (kt), and the other if it doesn't (kf). I have commented the clauses to help readers that are unfamiliar with syntax-rules.

I hope that you find this implementation to be elegant, but that's not really the important thing. The important thing is that if we put this code in a module and export match, then you have provided pattern matching to a language that didn't have it. You, as a user, have extended your programming language.

This facility is quite useful. Let us consider the case of some HTML, which you have parsed to an SXML representation:

(define html '(a (@ (href "http://gnu.org/")) "GNU"))

We can use the match form to get the URL for this link:

(define (link-target x)
  (match x
    ((a (@ (href ,target)) . _) target)
    (else #f)))

(link-target html) => "http://gnu.org"
(link-target '(p "not a link")) => #f

Sometimes people say that macros make code hard to read, because (they say) everyone makes up their own language. I hope that this example demonstrates that this is not the case: link-target above is eminently more readable, robust, and efficient than rolling your own matcher by hand. Good programmers use macros to bring the language up to the level of the problem they are solving.

Note that you didn't need to build consensus in the ECMAScript committees to add match to your language. Nor do you have to be satisfied with Joe Armstrong's particular implementation of matching, as you do in Erlang. Guile's default matcher has a number of bells and whistles, but you are free to write custom matchers for other formats, extending matchers to protocols and binary records of various types.

Finally, this same argument applies to embedded parsers, domain-specific languages, dynamic binding, and a whole host of other things. You don't have to wait for a language like Python to finally add a with statement; you can add it yourself. To paraphrase Alan Perlis, Guile is an extensible extension language.

delimited continuations

Delimited continuations let a user extend their code with novel, expressive control structures that compose well with existing code.

I'm going to give another extended example. I realize that there is a risk of losing you in the details, dear reader, but the things to keep in mind are: (1) most other programming languages don't let you do what I'm about to show you; and (2) this has seriously interesting practical applications.

So, let's take coroutines as the concrete example. Guile doesn't define a standard coroutine abstraction yet, though it probably should. Here is one implementation. We begin by defining call-with-yield:

(define (call-with-yield proc)
  (let ((tag (make-prompt-tag)))
    (define (handler k . args)
      (define (resume . args)
        (call-with-prompt tag
                          (lambda () (apply k args))
                          handler))
      (apply values resume args))

    (call-with-prompt
     tag
     (lambda ()
       (let ((yield (lambda args
                      (apply abort-to-prompt tag args))))
         (proc yield)))
     handler)))

The guts of this function is the call-with-prompt call, which makes a yield procedure, and passes it to proc. If the yield is called, it will jump back up the stack, passing control to handler.

The handler builds a resume function, and returns it along with the values that were yielded. The user can then call the resume function to continue computation, possibly passing it some values.

If we would like to add some syntactic sugar to make this form easier to read, we can do so:

(define-syntax with-yield
  (syntax-rules ()
    ((_ yield exp exp* ...)
     (call-with-yield
      (lambda (yield) exp exp* ...)))))

As an example, here is a function that turns a traversal operator into a generator:

(define (generate traverse collection)
  (with-yield yield
    (traverse yield collection)))

Let's see a use of it.

> (generate for-each '(three blind mice))
$1 = #<procedure resume args>
$2 = three
> ($1)
$3 = #<procedure resume args>
$4 = blind
> ($3)
$5 = #<procedure resume args>
$6 = mice
> ($1)
$7 = #<procedure resume args>
$8 = blind
> ($5)
> 

This is fascinating stuff. Normally when you call for-each, you lose control of your execution. for-each is going to call your function as many times as it wants, and that's that. But here we have inverted control-flow so that we're back in charge again.

Note that we did this without any cooperation from for-each! In Python, if you want to compose a number of its generators into a coroutine, every procedure in the call stack must cooperate. In fact until recently this wasn't even possible to implement efficiently. But in Guile, delimited continuations allow us to remove the unnecessary restrictions that make yield from seem like a good idea. Our with-yield composes with the other parts of our language.

Also note that these generators are functional: instead of mutating an object, causing the next call to it to proceed, we return the resume procedure as a value. It's not always what you want, and indeed you could make different choices, but this does have the advantage that you can rewind your traversal to different points, as I did above by calling $1 after calling $3. You can use this strategy to implement other advanced control-flow operators, like amb.

One can use this approach in other inversion-of-control contexts. Node.js people are all excited about their asynchronous idioms, but they have no idea how much better it could be if JavaScript actually supported asynchronous idioms on a language level.

For more on delimited continuations, see the readable 1993 paper by Dorai Sitaram, Handling Control.

the next 700 language features

Guile has a number of other language features that other extension languages do not. Here's a short list.

goops

GOOPS is Guile's Object-Oriented Programming System. It implements OOP like Common Lisp's CLOS does.

What this means to most programmers is that GOOPS is a little different. In Smalltalk-style OOP, the focus is on classes. In Self-style OOP, the focus is on prototypes. But in CLOS-style OOP, the focus is on generic functions. A good introduction to this topic can be found in Gregor Kiczales' beautiful book, The Art of the Metaobject Protocol. At OOPSLA 1997, Alan Kay called it "the best book written in computing in ten years".

Anyway, what I mean to say here is Guile does support object-oriented programming, and although it is not the same as (for example) Python's model, it is quite capable and expressive. Some of its features, like class redefinition, are not found in any of the more common OO systems.

parallel computation

Guile supports POSIX threads, with no global interpreter lock. Coordination between threads is managed via the normal primitives like mutexes and conditional variables, though there are higher-level forms like with-mutex.

Pthreads provide concurrency. Higher-level constructs provide structured parallelism, like futures, for fine-grained parallelism, and other forms like par-map.

I'm going to go out on a limb here and say that managing concurrency is the major problem of programming today. Guile's existing facilities are OK, but are not as convincing as those of Clojure or D. Is STM the right answer? Or is it shared-nothing by default, like Erlang? I don't know, but I'm pretty sure that we can build the right answer in Guile, whatever the right answer is.

the numeric tower

Guile implements the full numeric tower: integers (of any precision), real numbers, rationals, complex numbers, and the strange IEEE-754 values like negative zero, not-a-number, and the infinities. Arithmetic works with them all. In addition, there are some plans afoot to add support for optional multi-precision floating-point and complex numbers.

I shouldn't have to say this, but in Guile, (/ 12 10) is not 1. Nor is it a double-precision approximation, for that matter; it is the rational number, 6/5.

data types

Guile comes with a wealth of built-in data types: bytevectors, uniform arrays (for example, a vector of packed unsigned 16-bit integers), multidimensional arrays (possibly of packed values), bitvectors, strings consisting of characters (as opposed to bytes), records, etc. There are also the standard Lisp data types like symbols, pairs, vectors, characters, and so on.

You can define your own data types, using GOOPS, using the record facility, or the lower-level structs. Of course you can build ad-hoc data structures using the other predefined data types; after all, as Alan Perlis says, "it is better to have 100 functions operate on one data structure than 10 functions on 10 data structures."

functional programming

Scheme is a multi-paradigm language, supporting many kinds of programming techniques. It has excellent support for the functional style, with its lexical scope, closures, and proper tail calls. Also, unlike their call/cc cousins, Guile's delimited continuations are practical to use in a functional style, as shown above in the generator example.

Finally, I should mention that all of these features are designed to work well together. Macros compose with modules to preserve lexical scoping. Delimited continuations can transfer control through functional combinators while preserving referential transparency. Dynamic bindings behave sanely when continuations are unwound and reinstated. Scheme is a language that "hangs together": its features are carefully chosen to produce sound semantics when used in combination.

implementation features

So much for the Scheme language, as implemented by Guile. On a related but not identical tack, I would like to argue that Guile's implementation is a good practical fit for GNU.

development experience

Guile has a fantastic developer experience. Two factors contribute to this. One is that it ships with a full-featured, pleasant REPL, with tab-completion, readline, and a debugger. The second factor is Guile's reflective capabilities, coupled with the excellent Geiser Emacs mode.

As Geiser hacker Jao says, when it comes to debugging, you can be a mortician or you can be a physician. Core dumps are corpses. The REPL is the land of the living. If you run Guile with the --listen argument, it will spawn off a REPL thread, listening on a port. Geiser can telnet into that port to work with your program as it is running.

See Using Guile in Emacs, for more on how to hook this up.

multiple languages

It's pretty clear to me that we're going to nail the Elisp / Scheme integration. BT Templeton has been doing some great work on that this summer, and we are closer than ever. Admittedly some folks have doubts about how we will pull this off, but I am convinced. I will have to defer detailed arguments to some other forum, though.

Now, when Richard first promoted Guile as the GNU extension language, I think it was because he loved Lisp, but he did consider that not every user would share this perspective. For that reason he stressed the multi-lingual characteristics of Guile, which until now have not really come to fruition. However it does seem that in addition to Scheme and Emacs, with time we can probably get Lua and JavaScript working well, to the point that users will feel happy making small- and medium-sized programs and extensions in these languages.

It's true that if we decided that our best strategy would be to make JavaScript hackers happy, then we should adopt V8 instead, or something like that. But I don't think that is our best strategy. I will come back to this point in the conclusion.

health

Guile is some 16 years old now, and time has winnowed its interface to something that's mostly clean and orthogonal. There are still some crufty pieces, but we have a good deprecation story, both for Scheme code and for C code linking to libguile.

Despite its maturity, Guile is a very healthy project. Check the Ohloh page for details, but note that the activity statistics are based on the master branch, whereas most commits have been landing on stable-2.0 since 2.0.0 was released in February. We'll move back into a development phase at some point in the next few months.

And on the level of users, things seem to be going well -- it's not the most-used language out there, but it is gaining traction and in respectability. The number of contributors is going up, and the various forums are active.

choosing a strategy to achieve our goals

I have argued, I hope convincingly, that Guile is a good choice for an extension language. Now I would like to put the decision itself in a context. As Noam Chomsky keeps reminding us, we need to judge decisions by their expected consequences, not their proclaimed intentions. Does Guile generate the right future for GNU?

Let me begin by arguing this point from the contrapositive: that choosing another language is not the right thing for GNU. Let us take the example of JavaScript, a very popular language these days. Jim Blandy notes that the real measure of an extension language is that, when you want to implement a feature, that you want to implement it in the extension language rather than the original language (C, usually). So let us assume that JavaScript triumphs in this regard in the future, that the GNU system ends up mostly implemented in JavaScript, perhaps via the V8 implementation. Is this the right future to generate?

For me, the answer is no. First of all, what does it mean to have a GNU system based on a non-GNU technology? What would be left of GNU? It might be lack of imagination on my part, but such a future makes me think of country folk come to the city and losing sight of who they are. GNU is a social organization that makes software, with particular social characteristics. Mixing GNU with a foreign language community sounds dangerous to GNU, to me. See Kent Pitman's Lambda, the Ultimate Political Party for similar musings.

Beyond the cultural issues -- which to me are the ones that really matter -- there are some important technical points to consider. Right now GNU is mostly written in C, and controls its implementation of C, and the libraries that it depends on. Would GNU be able to exercise such control on JavaScript? Can GNU build in new concurrency primitives? Could we add new numeric types? Probably not. JavaScript is a language with a platform, and that platform is the browser. Its needs are not very related to the rest of GNU software. And if we did adapt JavaScript to POSIX-like platforms, GNU would lose some of the advantages of choosing a popular language, by making a second-class, foreign platform.

We should also consider the costs of using hastily designed languages. JavaScript has some crazy bad stuff, like with, var hoisting, a poor numeric model, dynamic this scoping, lack of modularity regarding binding lookup (witness the recent spate of browser bugs regarding prototype lookup on user objects), the strange arguments object, and the lack of tail recursion. Then there are the useful features that JavaScript lacks, like macros, modules, and delimited continuations. Using JavaScript has a cost, and its total cost in GNU would have to be integrated over its lifespan, probably 20 years or so. These limitations might be acceptable now, but not in the context of the larger programs of the future. (This, by the way, was the essence of Richard's argument against Tcl.)

Finally, we have the lifespan issue. If GNU had chosen Tcl because it was popular, we would have a mass of dead code. (You can like Tcl and still admit that we are past its prime.) Python is now, I think, at the height of its adoption curve, and beginning its descent. JavaScript is the next thing, and still on the uptick. But JavaScript will fall out of favor, and there will be a new paradigm, and a next language. The future of computing will not be the same as the present. So how will JavaScript adapt to these changes? We can't tell right now, but given the difficulty in changing simple things like making 010 parse as 10 and not 8 indicates that at some point it will stagnate. But Scheme will still be with us, because its parts are well thought-out, and because it is a language that is inherently more adaptable.

Paul Graham concludes Revenge of the Nerds with the following advice:

"If you start a startup, don't design your product to please VCs or potential acquirers. Design your product to please the users. If you win the users, everything else will follow. And if you don't, no one will care how comfortingly orthodox your technology choices were."

In the case of a GNU extension language, our users are developers, so we do need to work hard to please them. But free software has a different life cycle than a VC-backed startup; instead of racing to the flip, we have to please our users now and in ten years. Viewed from the long now, the future is not simply a first-order extrapolation of the present: it is something that we generate. Guile is still the right choice for the GNU extension language because it generates the right future: a healthy, sustainable, powerful GNU project that adapts to the changing needs of time.

Syndicated 2011-08-30 18:15:29 from wingolog

373 older entries...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!