Older blog entries for wingo (starting at number 418)

elf in guile

Good evening, gentle hackfolk!

Today I'd like to wrap up my three-part series of articles on what's new in Guile 2.2's compiler and runtime. I talked about the virtual machine a couple months ago, and the compiler internals just last Sunday. Today's article is about the object file format.

Sounds boring, right? Well, probably for most humans. But hackers, Guile compiles to ELF. Pretty rad, amirite? I thought so too. Read on, nerdy readers, read on!

object files

So let's consider the problem: Guile compiles to bytecode for a custom virtual machine. In the future we want to do native compilation. In both cases we'll need to write bytes out to disk in some format, and be able to load that code back into Guile. What should be the format for those bytes? For our purposes, a good object file format has a number of characteristics:

  • Above all else, it should be very cheap to load a compiled file.

  • It should be possible to statically allocate constants in the file. For example, a bytevector literal in source code can be emitted directly into the object file.

  • The compiled file should enable maximum code and data sharing between different processes.

  • The compiled file should contain debugging information, such as line numbers, but that information should be separated from the code itself. It should be possible to strip debugging information if space is tight.

These characteristics are not specific to Scheme. So why not just steal from the C and C++ people? They use the flexible ELF object file format, and so does Guile.

Note that although Guile uses ELF on all platforms, we do not use platform support for ELF. Guile implements its own linker and loader. The advantage of using ELF is not sharing code, but sharing ideas. ELF is simply a well-designed object file format.

An ELF file has two meta-tables describing its contents. The first meta-table is for the loader, and is called the program table or sometimes the segment table. The program table divides the file into big chunks that should be treated differently by the loader. Mostly the difference between these segments is their permissions.

Typically all segments of an ELF file are marked as read-only, except that part that represents modifiable static data or static data that needs load-time initialization. Loading an ELF file is as simple as mmapping the thing into memory with read-only permissions, then using the segment table to mark a small sub-region of the file as writable. This writable section is typically added to the root set of the garbage collector as well.

The other meta-table in an ELF file is the section table. Whereas the program table divides an ELF file into big chunks for the loader, the section table specifies small sections for use by introspective tools like debuggers or the like. One segment (program table entry) typically contains many sections. There may be sections outside of any segment, as well.

I know, this is a bit dry. So I made you a picture; or rather, a program that makes pictures. Here's an example of one of the pictures:

What we see here is a page map of assembler.go, a module in Guile's compiler. The upper part of the top shows each section in a different color, in the place they appear in the file. Each horizontal line is one four-kilobyte page.

As you can see, some pages contain multiple sections or parts of sections; that is the case for pages that have the same permissions (read-only versus read-write), and which don't have special alignment requirements. About halfway down after the small red .dynamic section there is a gap, indicating that the next section, .data needs to start on a separate page -- in this case because it is writable.

This page map shows the sections of the ELF file. (There are only three segments; the first read-only part, a small "dynamic" segment holding the .dynamic section, used when the file is loaded; and the final read-write section. You can't see this from the visualization, but actually everything after .data is in no segment at all -- because it's not strictly necessary at run-time. That's the ELF convention.)

(Why is this file so big, you ask? It's a complicated answer. Part of it is because much of the assembler is itself a generated program; it uses Scheme procedural macros to define emit-foo procedures for each kind of instruction, based information extracted from the VM code (link). Guile doesn't do phasing, so these macros are residualized into the object file, and because of datum->syntax, macros have a lot of associated constant literals. That probably explains the huge size of .data, as syntax objects contain vectors and lists and symbols needing run-time relocation. I would have to check, but I think .rtl-text is big mostly because of a number of dynamic type checks in this particular module that the optimizer is (rightly) unable to elide. Alack. Surely both of these can be fixed eventually.)

But enough of problems. Did I mention that I made a program? And how! I think this may be the only blog post you have ever read that has a form in it, but that's how I am rolling today. Select an ELF file on your system, click submit, and you can have your very own page map. It works on any ELF file, not just Guile's files, so you can send your libc.so or whatever.

If you don't see a form above, your blog reader must have stripped it out. Click through to the blog post itself, or go visit the tool directly. And of course the program is written in Guile -- the file upload handling, the ELF parsing, and the pixel munging (using Cairo and a homebrew charting library).

Anyway, here's another example, this time of a small file:

Here we see that this file has the same sections as our earlier bloated example, only they are smaller.

Allow me to point out some details.

Firstly, the .rtl-text section holds the bytecode. Usually this would be called .text, but I didn't want to mess with people's expectations, and the new VM used to be called the "RTL" VM. (No longer, that was a silly name.)

The .data section holds data that needs initialization, or which may be modified at runtime. .rodata holds statically allocated data that needs no run-time initialization, and which therefore can be shared between processes. The initializations themselves, like relocations, are compiled to a procedure in the .rtl-text section, linked to from the dynamic section.

And that's all the sections! Except for the introspection and debugging-related sections, that is. These sections are used by the Guile runtime to be able to answer questions like "what is the name of this function?" or "what is the source position corresponding to bytecode position X?" or "what is the documentation string for this function?" or, well, I think you get the point. None of this is usually needed at runtime, and because it is all allocated at the end of the file, that means that usually none of it is ever paged into memory.

Note that for some of this metadata we use the standard DWARF format. I think we are one of the few dynamic language runtimes that does this nifty thing.

Also, all read-only data in these ELF files is ripe for sharing between processes, paging out to disk if under memory pressure, etc. For example, if I look on the smaps file for one of the two web processes running on this server, I see:

/opt/guile-2.2/lib/guile/2.2/ccache/web/server/http.go
Size:                132 kB
Rss:                 124 kB
Pss:                  62 kB

meaning that for this particular file, almost all of it is not only shareable but being shared. Good times.

Finally, all of this has a positive impact on start-up time. While I can get Guile 2.0 to start up in 11 milliseconds, with Guile 2.2 I am down to 8 milliseconds. Likewise guile -c '(sleep 100)' in Guile 2.0 uses 3144 kB of private dirty memory, compared to 1852 kB with Guile 2.2. There's still improvements to be made, but things are going well.

Well, again I find myself rambling. Check out the little ELF mapper tool I have above and let me know of any curious results. Do send your questions as well; though I've been derelict at responding, who knows, new year, new leaf right? Until next time, happy hacking.

Syndicated 2014-01-19 19:55:14 from wingolog

a continuation-passing style intermediate language for guile

Happy new year's, hackfolk!

A few weeks ago I wrote about the upcoming Guile 2.2 release, and specifically about its new register virtual machine. Today I'd like to burn some electrons on another new part in Guile 2.2, its intermediate language.

To recap, we switched from a stack machine to a register machine because, among other reasons, register machines can consume and produce named intermediate results in fewer instructions than stack machines, and that makes things faster.

To take full advantage of this new capability, it is appropriate to switch at the same time from the direct-style intermediate language (IL) that we had to an IL that names all intermediate values. This lets us effectively reason about each subexpression that goes into a computation, for example in common subexpression elimination.

As far as intermediate languages go, basically there are two choices these days: something SSA-based, or something CPS-based. I wrote an article on SSA, ANF and CPS a few years ago; if you aren't familiar with these or are feeling a little rusty, I suggest you go and take a look.

In Guile I chose a continuation-passing style language. I still don't know if I made the right choice. I'll go ahead and describe Guile's IL and then follow up with some reflections. The description below is abbreviated from a more complete version in Guile's manual.

guile's cps language

Guile's CPS language is composed of terms, expressions, and continuations. It was heavily inspired by Andrew Kennedy's Compiling with Continuations, Continued paper.

A term can either evaluate an expression and pass the resulting values to some continuation, or it can declare local continuations and contain a sub-term in the scope of those continuations.

$continue k src exp
Evaluate the expression exp and pass the resulting values (if any) to the continuation labelled k.
$letk conts body
Bind conts in the scope of the sub-term body. The continuations are mutually recursive.

Additionally, the early stages of CPS allow for a set of mutually recursive functions to be declared as a term via a $letrec term. A later pass will attempt to transform the functions declared in a $letrec into local continuations. Any remaining functions are later lowered to $fun expressions. More on "contification" later.

Here is an inventory of the kinds of expressions in Guile's CPS language. Recall that all expressions are wrapped in a $continue term which specifies their continuation.

$void
Continue with the unspecified value.
$const val
Continue with the constant value val.
$prim name
Continue with the procedure that implements the primitive operation named by name.
$fun src meta free body
Continue with a procedure. body is the $kentry $cont of the procedure entry. free is a list of free variables accessed by the procedure. Early CPS uses an empty list for free; only after closure conversion is it correctly populated.
$call proc args
Call proc with the arguments args, and pass all values to the continuation. proc and the elements of the args list should all be variable names. The continuation identified by the term's k should be a $kreceive or a $ktail instance.
$primcall name args
Perform the primitive operation identified by name, a well-known symbol, passing it the arguments args, and pass all resulting values to the continuation.
$values args
Pass the values named by the list args to the continuation.
$prompt escape? tag handler
Push a prompt on the stack identified by the variable name tag and continue with zero values. If the body aborts to this prompt, control will proceed at the continuation labelled handler, which should be a $kreceive continuation. Prompts are later popped by pop-prompt primcalls.

The remaining element of the CPS language in Guile is the continuation. In CPS, all continuations have unique labels. Since this aspect is common to all continuation types, all continuations are contained in a $cont instance:

$cont k cont
Declare a continuation labelled k. All references to the continuation will use this label.

The most common kind of continuation binds some number of values, and then evaluates a sub-term. $kargs is this kind of simple lambda.

$kargs names syms body
Bind the incoming values to the variables syms, with original names names, and then evaluate the sub-term body.

Variables (e.g., the syms of a $kargs) should be globally unique. To bind the result of an expression a variable and then use that variable, you would continue from the expression to a $kargs that declares one variable. The bound value would then be available for use within the body of the $kargs.

$kif kt kf
Receive one value. If it is a true value, branch to the continuation labelled kt, passing no values; otherwise, branch to kf.

Non-tail function calls should continue to a $kreceive continuation in order to adapt the returned values to their uses in the calling function, if any.

$kreceive arity k
Receive values from a function return. Parse them according to arity, and then proceed with the parsed values to the $kargs continuation labelled k.

$arity is a helper data structure used by $kreceive and also by $kclause, described below.

$arity req opt rest kw allow-other-keys?
A data type declaring an arity. See Guile's manual for details.

Additionally, there are three specific kinds of continuations that can only be declared at function entries.

$kentry self tail clauses
Declare a function entry. self is a variable bound to the procedure being called, and which may be used for self-references. tail declares the $cont wrapping the $ktail for this function, corresponding to the function's tail continuation. clauses is a list of $kclause $cont instances.
$ktail
A tail continuation.
$kclause arity cont
A clause of a function with a given arity. Applications of a function with a compatible set of actual arguments will continue to cont, a $kargs $cont instance representing the clause body.

reflections

Before starting Guile's compiler rewrite, I had no real-world experience with CPS-based systems. I had worked with a few SSA-based systems, and a few more direct-style systems. I had most experience with the previous direct-style system that Guile had, but never had to seriously design another kind of IL, so basically I was ignorant. It shows, I think; but time will tell if it came out OK anyway. At this point I am cautiously optimistic.

As far as fitness for purpose goes, the CPS IL works in the sense that it is part of a self-hosting compiler. I'll say no more on that point other than to mention that it has explicit support for a number of Guile semantic features: multiple-value returns; optional, rest, and keyword arguments; cheap delimited continuations; Guile-native constant literals.

Why not ANF instead? If you recall from my SSA and CPS article, I mentioned that ANF is basically CPS with fewer labels. It tries to eliminate "administrative" continuations, whereas Guile's CPS labels everything. There is no short-hand let form.

ANF proponents tout its parsimony as a strength, but I do not understand this argument. I like having labels for everything. In CPS, I have as many labels as there are expressions, plus a few for continuations that don't contain terms. I use them directly in the bytecode compiler; the compiler never has to generate a fresh label, as they are part of the CPS itself.

More importantly, labelling every control-flow point allows me to reason precisely about control flow. For example, if a function is always called with the same continuation, it can be incorporated in the flow graph of the calling function. This is called "contification". It is not the same thing as inlining, as it works for sets of recursive functions as well, and never increases code size. This is a crucial transformation for a Scheme compiler to make, as it turns function calls into gotos, and self-function calls into loop back-edges.

Guile's previous compiler did a weak form of contification, but because we didn't have names for all control points it was gnarly and I was afraid to make it any better. Now its contifier is optimal. See Fluet and Weeks' Contification using Dominators and Kennedy's CWCC, for more on contification.

One more point in favor of labelling all continuations. Many tranformations can be best cast as a two-phase process, in which you first compute a set of transformations to perform, and then you apply them. Dead-code elimination works this way; first you find the least fixed-point of live expressions, and then you residualize only those expressions. Without names, how are you going to refer to an expression in the first phase? It's nasty and much cleaner with the ubiquitous, through labelling that CPS provides.

So I am happy with CPS, relative to ANF. But what about SSA? In my previous article, I asked SSA proponents to imagine returning a block from a block. Of course it doesn't make any sense; SSA is a first-order language. But modern CPS is also first-order, is the thing! Modern CPS distinguishes "continuations" syntactically from functions, which is exactly the same as SSA's distinction between basic blocks and functions. CPS and SSA really are the same on this level.

The fundamental CPS versus SSA difference is, as Stephen Weeks noted a decade ago, one of data structures: do you group your expressions into basic blocks stored in a vector (SSA), or do you nest them into a scope tree (CPS)? It's not clear that I made the correct choice.

In practice with Guile's CPS you end up building graphs on the side that describe some aspect of your term. For example you can build a reverse-post-ordered control flow analysis that linearizes your continuations, and assigns them numbers. Then you can compute a bitvector for each continuation representing each one's reachable continuations. Then you can use this reachability analysis to determine the extent of a prompt's body, for example.

But this analysis is all on the side and not really facilitated by the structure of CPS itself; the CPS facilities that it uses are the globally unique continuation and value names of the CPS, and the control-flow links. Once the transformation is made, all of the analysis is thrown away.

Although this seems wasteful, the SSA approach of values having "implicit" names by their positions in a vector (instead of explicit ephemeral name-to-index mappings) is terrifying to me. Any graph transformation could renumber things, or leave holes, or cause vectors to expand... dunno. Perhaps I am too shy of the mutation foot-gun. I find comfort in CPS's minimalism.

One surprise I have found is that I haven't needed to do any dominator-based analysis in any of the paltry CPS optimizations I have made so far. I expect to do so once I start optimizing loops; here we see the cultural difference with SSA I guess, loops being the dominant object of study there. On the other hand I have had to solve flow equations on a few occasions, which was somewhat surprising, though enjoyable.

The optimizations I have currently implemented for CPS are fairly basic. Contification was tricky. One thing I did recently was to make all non-tail $call nodes require $kreceive continuations; if, as in the common case, extra values were unused, that was reflected in an unused rest argument. This required a number of optimizations to clean up and remove the extra rest arguments for other kinds of source expressions: dead-code elimination, the typical beta/eta reduction, and some code generation changes. It was worth it though, and now with the optimization passes things are faster than they were before.

Well, I find that I am rambling now. I know this is a lot of detail, but I hope that it helps some future compiler hacker understand more about intermediate language tradeoffs. I have been happy with CPS, but I'll report back if anything changes :) And if you are actually hacking on Guile, check the in-progress manual for all the specifics.

Happy hacking to all, and to all a good hack!

Syndicated 2014-01-12 21:58:00 from wingolog

optimizing let in spidermonkey

Peoples! Firefox now optimizes let-bound variables!

What does this mean, you ask? Well, as you nerdy wingolog readers probably know, the new ECMAScript 6 standard is coming soon. ES6 has new facilities that make JavaScript more expressive. At the same time, though, many ES6 features stress parts of JavaScript engines that are currently ignored.

One of these areas is lexical scope. Until now, with two exceptions, all local variables in JavaScript have been "hoisted" to the function level. (The two exceptions are the exception in a catch clause, and the name of a named function expression.) Therefore JS engines have rightly focused on compiling methods at a time, falling back to slower execution strategies when lexical scopes are present.

The presence of let in ES6 changes this. It was a hip saying a couple years ago that "let is the new var", but in reality no one uses let -- not only because engines didn't yet implement let without feature flags, if they implemented it at all, but because let was slow. Because let-bound variables were on the fallback path, they were unoptimized.

But that is changing. Firefox now optimizes many kinds of let-bound variables, making "let is the new var" much closer to being an actual thing. V8 has made some excellent steps in similar directions as well. I'll focus on the Firefox bit here as that's what I've been working on, then go on to mention the work that the excellent V8 hackers have done.

implementing scope in javascript

JavaScript the language is defined in terms of a "scope chain", with respect to which local variables are looked up. Each link in the chain binds some set of variables. Lookup of a name traverses the chain from tip to tail looking for a binding.

Usually it's possible to know at compile-time where a binding will be in the chain. For example, you can know that the "x" reference in the returned function here:

function () {
  var x = 3;
  return function () { return x; };
}

will be one link up in the scope chain, and will be stored in the first named slot in that link. This is a classic implementation of lexical scope, and it plays very well with the semantics of JavaScript as specified. All engines do something like this.

Another thing to note in this case is that we don't need to store the name for "x" anywhere, as we can see through all of its uses at compile-time. In practice however, as JavaScript functions are typically compiled lazily on first call, we do store the association "x -> slot 0" in the outer function's environment.

(Could you just propagate the constant, you ask? Of course you could. But since JS compilation typically occurs one function at a time, lazily, no engine does this initially. Of course later when the optimizing compiler kicks in, it does get propagated, but that usually doesn't avoid the scope chain node allocation.)

Let's take another case.

function foo() { var x = 3; return x; }

In this case we know that the "x" reference can be found in the top link of the chain, in the first slot. Semantically speaking, that is. No JS engine implements things this way. Instead we take advantage of noticing that "x" cannot be captured by any other function. Therefore we are free to assign "x" to a slot in a stack frame, and refer to it by index directly -- without allocating a scope chain node on the heap. All engines do this.

And of course later if this function is hot, the constant exists only in a register, probably inlined to its caller. For this reason scope hasn't had a lot of work put into it. The unit of optimization -- the function -- is the same as the unit of scope. If the implementation details of scope get costly, we take advantage of run-time optimization to speculatively remove allocations, loads, and stores.

OK. What if you have some variables captured, and some not captured?

function foo() { var x = 3; var y = 4; return function () { return x; } }

Here "x" is captured, but "y" is not. Usually engines will allocate "x" on the scope chain, and "y" as a local. (JavaScriptCore is an interesting exception; it uses a strategy I haven't seen elsewhere called "lazy tear-off". Basically all variables are on the stack for the dynamic extent of the scope, whether they are captured or not. If needed, potentially lazily, an object is pushed on the scope chain that aliases the stack slots. If a scope is pushed on the scope chain, when the scope ends the current values are "torn off" the stack and copied to the heap, and the slots pointer of the scope object updated to point to the heap.)

I digress. "x" on the scope chain, "y" not. (I regress: why allocate "y" at all, isn't it dead? Yes it is dead. Optimizing compilers kill it. Baseline compilers don't bother.) So typically access to "x" goes through the scope chain, and access to "y" does not.

Calculating the set of variables that can be captured is one of the few static analyses done by JS baseline compilers. The other ones are whether we are in strict mode or not, whether nested scopes contain "with" or "eval" (thus forcing all locals to be on the scope chain), and whether the "arguments" object is used or not. There may be more, but that is the lowest common denominator of efficiency.

implementing let

let is part of ES6, and all browsers will eventually implement it.

Let me give you an insight as to the mindset of a browser maker. What a browser maker does when they see a feature is first to try to ignore it -- all features have a cost and not all of them pay for themselves. Next you try to do the minimum correct thing -- the thing that passes all the test suites, but imposes a minimal burden on the rest of the system. Usually this means that the feature probably works, except for the corner cases which users will file bugs for, but it is slow. Finally when there is either an internal use the browser maker cares about (Google, Mozilla to an extent) or you want to heat up a benchmark war (everyone else), you start to optimize.

The state of let was until recently between "ignore" and "slow". "Slow" means different things to different browsers.

The engines that have started to implement let are V8 (Chrome) and SpiderMonkey (Firefox). In Chrome, until recently using let in a function was an easy way of preventing that function from ever being optimized by Crankshaft. Good times? When writing this article I was going to claim this was still the case, but I see that in fact this is happily not true. You can Crankshaft a function that has let bindings. I believe, with a cursory glance, that locals that are not captured are still allocated on a scope chain -- but perhaps I am misreading things.

Firefox, on the other hand, would not Ion-compile any function with a let. You would think that this would not be the case, given that Firefox has had let for many years. If you thought this, you misunderstand how browser development works. Basically let me lay it out for you:

class struggle : marxism :: benchmarks : browsers

Get it? Benchmarks are the motor of browser history. If this bloviation has any effect, please let it be to inspire you to go and make benchmarks!

Anyway, Firefox. No Ion for let. The reason why you would avoid optimizing these things is clear -- breaking the "optimization unit == scope unit" equivalence has a cost. However the time has come.

What I have done is to recognize let scopes that have no captured locals. This class of let scope will now be optimized by Ion. So for example in this case:

function sumto(n) {
  let sum = 0;
  for (let i=0; i<n; i++)
    sum += i;
  return sum;
}

Previously this would allocate a block scope around the "for". (Incidentally, Firefox borks block scoping in this case; each iteration should produce a fresh binding. But this is historical, will be fixed, and I digress.) The operations that created and pushed block scopes were not optimized. Avoiding pushing and popping scopes from the scope chain avoids this limitation, and thus we have awesome speed!

The effect of simply turning on Ion cannot be denied. In this case, the runtime of a sum up to 1e9 goes from 8.8 seconds (!) to 0.8 seconds. Obviously it's a little micro-benchmark but that's how I'm rolling today. The biggest optimization is to stop deoptimization.

At the same time, we are still able to reify a parallel chain of "debug scopes" that correspond to the scope chain as the specification would see it. This was the most challenging part of optimizing block scope in Firefox -- not the actual optimization, which was trivial, but optimization while retaining introspection and debuggability.

future work

Unhappily, scopes with let-bound variables that are captured by nested scopes ("aliased", in spidermonkey parlance) are not yet optimized. So not only do they cause scope chain allocation, but they also don't benefit from Ion. Boo. Bug 942810.

I should also mention that the let supported by Firefox is not the let specified in ES6. In ES6 there is thing called a "temporal dead zone" whereby it is invalid to access the value of a let before its initialization. This is like Scheme's "letrec restriction", and has to be enforced dynamically for similar reasons. V8 does a great job in actually implementing this, and Firefox should do it soon.

Of course, it's not clear to me how let can actually be deployed without breaking the tubes. I think it probably can somehow but I haven't checked the latest TC39 cogitations in that regard.

twisted paths

It's been a super-strange end-of-year. I was sure I would be optimizing SpiderMonkey generators and moving on to other things, but I just got caught up with stuff -- the for-of bits took approximately forever and then the amount of state carried in SpiderMonkey stack frames filled me with despair. So it was that this hack, while assisting me in that goal, was not actually a planned thing.

See, SpiderMonkey used to actually reserve a stack slot for the "block chain". No, they aren't using your browser to mine for Bitcoins, though that would be a neat hack. The "block chain" was a simulation of the spec-mandated scope chain. But of course this might not reflect the actual implemented, optimized behavior -- but again one might want to map them to each other for debugging purposes. It was a mess.

The resulting changeset to fix this ended up so large that it took weeks to land. Well, live and learn, right? I remember Michael Starzinger telling me the same thing about V8 -- you just have to keep your patches small, as small as possible, and always working. Words to the wise indeed.

happy days

But in the end we at least have some juice from this citric fruit. This has been an Igalia joint. Thanks very much to Mozilla's Luke Wagner for suffering through the reviews.

Thanks especially to Bloomberg for making this work possible; you folks are swell. Forward ES6!

Syndicated 2013-12-18 20:00:23 from wingolog

a register vm for guile

Greetings, hacker comrades! Tonight's epistle is gnarly nargery of the best kind. See, we just landed a new virtual machine, compiler, linker, loader, assembler, and debugging infrastructure in Guile, and stories like that don't tell themselves. Oh no. I am a firm believer in Steve Yegge's Big Blog Theory. There are nitties and gritties and they need explication.

a brief brief history

As most of you know, Guile is an implementation of Scheme. It started about 20 years ago as a fork of SCM.

I think this lines-of-code graph pretty much sums up the history:

That's from the Ohloh, in case you were wondering. Anyway the story is that in the beginning it was all C, pretty much: Aubrey Jaffer's SCM, just packaged as a library. And it was C people making it, obviously. But Scheme is a beguiling language, and over time Guile has had a way of turning C hackers into Scheme hackers.

I like to think of this graph as showing my ignorance. I started using Guile about 10 years ago, and hacking on it in 2008 or so. In the beginning I was totally convinced by the "C for speed, Scheme for flexibility" thing -- to the extent that I was willing to write off Scheme as inevitably slow. But that's silly of course, and one needs no more proof than the great performance JavaScript implementations have these days.

In 2009, we merged in a bytecode VM and a compiler written in Scheme itself. All that is pretty nifty stuff. We released that version of Guile as 2.0 in 2011, and that's been good times. But it's time to move onward and upward!

A couple of years ago I wrote an article on JavaScriptCore, and in it I spoke longingly of register machines. I think that's probably when I started to make sketches towards Guile 2.2, after having spent time with JavaScriptCore's bytecode compiler and interpreter.

Well, it took a couple of years, but Guile 2.2 is finally a thing. No, we haven't even made any prereleases yet, but the important bits have landed in master. This is the first article about it.

trashing your code

Before I start trashing Guile 2.0, I think it's important to say what it does well. It has a great inlining pass -- better than any mainstream language, I think. Its startup time is pretty good -- around 13 milliseconds on my machine. Its runs faster than other "scripting language" implementations like Python (CPython) or Ruby (MRI). The debugging experience is delightful. You get native POSIX threads. Plus you get all the features of a proper Scheme, like macros and delimited continuations and all of that!

But the Guile 2.0 VM is a stack machine. That means that its instructions usually take their values from the stack, and produce values (if appropriate) by pushing values onto the stack.

The problem with stack machines is that they penalize named values. If I realize that a computation is happening twice and I factor it out to a variable, that means in practice that I allocate a stack frame slot to the value. So far so good. However, to use the value, I have to emit an instruction to fetch the value for use by some other instruction; and to store it, I likewise have to have another instruction to do that.

For example, in Guile 2.0, check out the bytecode produced for this little function:

scheme@(guile-user)> ,disassemble (lambda (x y)
                                    (let ((z (+ x y)))
                                      (* z z)))

   0    (assert-nargs-ee/locals 10)     ;; 2 args, 1 local
   2    (local-ref 0)                   ;; `x'
   4    (local-ref 1)                   ;; `y'
   6    (add)
   7    (local-set 2)                   ;; `z'
   9    (local-ref 2)                   ;; `z'
  11    (local-ref 2)                   ;; `z'
  13    (mul)
  14    (return)

This is silly. There are seven instructions in the body of this procedure, not counting the prologue and epilogue, and only two of them are needed. The cost of interpreting a bytecode is largely dispatch cost, which is linear in the number of instructions executed, and we see here that we could be some 7/2 = 3.5 times as fast if we could somehow make the operations reference their operands by slot directly.

register vm to the rescue

The solution to this problem is to use a "register machine". I use scare quotes because in fact this is a virtual machine, so unlike a CPU, the number of "registers" is unlimited, and in fact they are just stack slots accessed by index.

So in Guile 2.2, our silly procedure produces the following code:

scheme@(guile-user)> ,disassemble (lambda (x y)
                                    (let ((z (+ x y)))
                                      (* z z)))

   0    (assert-nargs-ee/locals 3 1)    ;; 2 args, 1 local
   1    (add 3 1 2)
   2    (mul 3 3 3)
   3    (return 3)

This is optimal! There are four things that need to happen, and there are four opcodes that do them. Receiving operands and sending values is essentially free -- they are indexed accesses off of a pointer stored in a hardware register, into memory that is in cache.

This is a silly little example, but especially in loops, Guile 2.2 stomps Guile 2.0. A simple count-up-to-a-billion test runs in 9 seconds on Guile 2.2, compared to 24 seconds in Guile 2.0. Let's make a silly graph!

Of course if we compare to V8 for example we find that V8 does a loop-to-a-billion in about 1 second, or 9 times faster. There is some way to go. There are a couple of ways that I could generate better bytecode for this loop, for another 30% speed boost or so, but ultimately we will have to do native compilation. And we will! But that is another post.

gritties

Here's the VM. It's hairy in the prelude, and the whole thing is #included twice in another C file (for a debugging and a non-debugging mode; terrible), but I think it's OK for being in C. (If it were in C++ it could be nicer in various ways.)

The calling convention for this VM is that when a function is called, it receives its arguments on the stack. The stack frame looks like this:

   /------------------\
   | Local N-1        | <- sp
   | ...              |
   | Local 1          |
   | Local 0          | <- fp
   +==================+
   | Return address   |
   | Dynamic link     |
   +==================+
   :                  :

Local 0 holds the procedure being called. Free variables, if any, are stored inline with the (flat) closure. You know how many arguments you get by the difference between the stack pointer (SP) and the frame pointer (FP). There are a number of opcodes to bind optional arguments, keyword arguments, rest arguments, and to skip to other case-lambda clauses.

After deciding that a given clause applies to the actual arguments, a prelude opcode will reset the SP to have enough space to hold all locals. In this way the SP is only manipulated in function prologues and epilogues, and around calls.

Guile's stack is expandable: it is originally only a page or two, and it expands (via mremap if possible) by a factor of two on every overflow, up to a configurable maximum. At expansion you have to rewrite the saved FP chain, but nothing else points in, so it is safe to move the stack.

To call a procedure, you put it and its arguments in contiguous slots, with no live values below them, and two empty slots for the saved instruction pointer (IP) and FP. Getting this right requires some compiler sophistication. Then you reset your SP to hold just the arguments. Then you branch to the procedure's entry, potentially bailing out to a helper if it's not a VM procedure.

To return values, a procedure shuffles the return values down to start from slot 1, resets the stack pointer to point to the last return value, and then restores the saved FP and IP. The calling function knows how many values are returned by looking at the SP. There are convenience instructions for returning and receiving a single value. Multiple values can be returned on the stack easily and efficiently.

Each operation in Guile's VM consists of a number of 32-bit words. The lower 8 bits in the first word indicate the opcode. The width and layout of the operands depends on the word. For example, MOV takes two 12-bit operands. Of course, 4096 locals may not be enough. For that reason there is also LONG-MOV which has two words, and takes two 24-bit operands. In LONG-MOV there are 8 bits of wasted space, but I decided to limit the local frame address space to 24 bits.

In general, most operations cannot address the full 24-bit space. For example, there is ADD, which takes two 8-bit operands and one 8-bit destination. The plan is to have the compiler emit some shuffles in this case, but I haven't hit it yet, and it was too tricky to try to get right in the bootstrapping phase.

JavaScriptCore avoids the address space problem by having all operands be one full pointer wide. This wastes a lot of memory, but they lazily compile and can throw away bytecode and reparse from source as needed, neither of which are true for Guile. We aim to do a good ahead-of-time compilation, to enable self-hosting of the compiler.

JSC's pointer-wide operands do provide the benefit of allowing the "opcode" word to actually hold the address of the label, instead of an index to a table of addresses. This is a great trick, but again it's not applicable to Guile as we don't want to relocate bytecode that we load from disk.

Relative jumps in Guile's VM are 24 bits wide, and are measured in 32-bit units, giving us effectively a 26 bit jump space. Relative references -- references to static data, or other procedures -- are 32 bits wide. I certainly hope that four gigabytes in a compilation unit is enough! By the time it is a problem, hopefully we will be doing native compilation.

Well, those are the basics of Guile's VM. There's more to say, but I already linked to the source, so that should be good enough :) In some future dispatch, we'll talk about the other parts of Guile 2.2. Until then!

Syndicated 2013-11-26 22:07:55 from wingolog

scheme quiz time!

Scheme quiz time!

Consider the following two functions:

(define (test1 get)
  (let ((v (make-vector 2 #f)))
    (vector-set! v 0 (get 0))
    (vector-set! v 1 (get 1))
    v))

(define (test2 get)
  (let* ((a (get 0))
         (b (get 1)))
    (vector a b)))

Assume the usual definitions for all of the free variables like make-vector and so on. These functions both create a vector with two elements. The first element is the result of a call to (get 0), where get is a function the user passes in as an argument. Likewise the second comes from (get 1).

(test1 (lambda (n) n)) => #(0 1)
(test2 (lambda (n) n)) => #(0 1)

So the functions are the same.

Or are they?

Your challenge: write a standard Scheme function discriminate that, when passed either test1 or test2 as an argument, can figure out which one it is.

. . .

Ready? If you know Scheme, you should think on this a little bit before looking at the answer. I'll wait.

. . .

OK!

We know that in both functions, two calls are made to the get function, in the same order, and so really there should be no difference whatsoever.

However there is a difference in the continuations of the get calls. In test1, the continuation includes the identity of the result vector -- because the vector was allocated before the get calls. On the other hand test2 only allocates the result after the calls to get. So the trick is just to muck around with continuations so that you return twice from a call to the test function, and see if both returns are the same or not.

(define (discriminate f)
  (let ((get-zero-cont #t)
        (first-result #f))
    (define (get n)
      (when (zero? n)
        (call/cc (lambda (k)
                   (set! get-zero-cont k))))
      n)
    (let ((result (f get)))
      (cond
       (first-result
        (eq? result first-result))
       (else
        (set! first-result result)
        (get-zero-cont))))))

In the call to f, we capture the continuation of the entry to the (get 0) call. Then later we re-instate that continuation, making the call to f return for a second time. Then we see if both return values are the same object.

(discriminate test1) => #t
(discriminate test2) => #f

If they are the same object, then the continuation captured the identity of the result vector -- and if not, the result was only allocated after the get calls.

so what?

Unhappily, this has practical ramifications. In many compilers it would be advantagous to replace calls to vector with calls to make-vector plus a series of vector-set! operations. Such a transformation lowers the live variable pressure. If you have a macro that generates a bison-like parser whose parse table is built by a call to vector with 400 arguments -- this happens -- you'd rather not have 400 live variables in the function that builds that table. But this isn't a safe transformation to make, unless you can prove that no argument captures the current continuation. Happily, for the parser generator macro this is the case, but it's not something to bet on.

It gets worse, though. Since test1 returns the same object, it is possible to use continuations to mutate previous return values, with nary a vector-set! in sight!

(define (discriminate2 f)
  (let ((get-zero-cont #f)
        (escape #f))
    (define (get n)
      (case n
        ((0) (call/cc (lambda (k)
                        (set! get-zero-cont k)
                        0)))
        ((1) (if escape
                 (escape)
                 1))))
    (let ((result (f get)))
      (call/cc
       (lambda (k)
         (set! escape k)
         (get-zero-cont 42)))
      result)))

(discriminate2 test1) => #(42 1)
(discriminate2 test2) => #(0 1)

This... this is crazy.

story time

Now it's story time. Guile has a bytecode VM, and usually all code is compiled to that VM. But it also has an interpreter, for various purposes, and that interpreter is fairly classic: it's a recursive function that takes a "memoized expression" and an environment as parameters. Only, the environment was silly -- it was just a list of values. Before evaluating, a "memoizer" runs to resolve lexical references to indexes in that list, and entering a new lexical contour conses on that list.

Well of course that makes lexical variable lookup expensive. It usually doesn't matter as everything is compiled, but it's a bit shameful, so I rewrote it recently to use two-dimensional environments. Let me drop some ASCII on you:

   +------+------+------+------+------+
   | prev |slot 0|slot 1| ...  |slot N|
   +------+------+------+------+------+
      \/
   +------+------+------+------+------+
   | prev |slot 0|slot 1| ...  |slot N|
   +------+------+------+------+------+
      \/
     ...
      \/
   toplevel

It's a chain of vectors, linked through their first elements. Resolving a lexical in this environment has two dimensions, the depth and the width.

Vectors.

You see where I'm going with this?

I implemented the "push-new-environment" operation as a sequence of make-vector and an eval-and-vector-set! loop. Here's the actual clause that implements this:

(define (eval exp env)
  (match exp
    ...
    (('let (inits . body))
     (let* ((width (vector-length inits))
            (new-env (make-env width #f env)))
       (let lp ((i 0))
         (when (< i width)
           (env-set! new-env 0 i (eval (vector-ref inits i) env))
           (lp (1+ i))))
       (eval body new-env)))
    ...))

This worked fine. It was fast, and correct. Or so I thought. I used this interpreter to bootstrap a fresh Guile compile and all was good. Until running those damned test suites that use call/cc to return multiple times from let initializers, as in my discriminate2 test. While the identity of env isn't visible to a program as such, the ability of call/cc to peel apart allocation and initialization of the environment vector makes this particular implementation strategy not viable.

In the end I'll inline a few arities, and then have a general case that allocates heap storage for the temporaries:

(case (vector-length env)
  ((0) (vector env))
  ((1) (vector env (eval (vector-ref inits 0) env)))
  ...
  (else
   (list->vector
    (cons env
          (map (lambda (x) (eval x env))
               (vector->list inits))))))

Of course I'd use a macro to generate that. It's terrible, but oh well. Es lo que hay.

Syndicated 2013-11-02 13:42:06 from wingolog

es6 generators and iteration in spidermonkey

It is my great pleasure to announce that SpiderMonkey, the JavaScript engine of the Firefox browser, just got support for ES6 generators and iteration!

the chase

As I have written before, generators are functions that can suspend. SpiderMonkey has had generators since Firefox 2 (really!), but the new thing is that now their generators are standard, and available by default. No need to futz with Firefox-specific "language versions": there is just 1JS now.

My last article extolled the virtues of generators for asynchronous programs, and since then there have been a number of great articles and talks around the web on that topic. Generators are a kind of coroutine, and coroutines work well for asynchronous computation.

But that's not all that coroutines are good for. Another great application of generators is in iteration, where you have a producer that knows how to produce values, and a consumer that knows how to process them. Being able to write the producer as a coroutine is great win.

words, words, words

This is a bit abstract, so let me show some code. Let's say you are implementing an auto-complete widget, so you use a trie. Here is a simple trie:

function Trie() {
  this.kids = Object.create(null);
  this.terminal = false;
}

Trie.prototype.insert = function (str) {
  if (str.length == 0) {
    this.terminal = true;
  } else {
    var node = this.kids[str[0]];
    if (!node)
      this.kids[str[0]] = node = new Trie;
    node.insert(str.slice(1));
  }
}

This kind of data structure is nice because it encodes the minimal amount of information. Unlike some of the examples on Wikipedia, the information from the prefix is not contained in the suffix.

Let's add some words:

var db = new Trie;
['fee', 'fie', 'fo', 'foo'].forEach(db.insert.bind(db));

Now, how do we get the completions?

Trie.prototype.completions = function (prefix) {
  // ?
}

It would be nice to avoid making an array of all of the possible completions, because who knows how many we're actually going to show. One of the whole points of using a trie was to avoid storing all of the strings into memory.

Generators offer a nice solution to this problem:

Trie.prototype.strings = function* () {
  if (this.terminal) yield '';
  for (var k in this.kids)
    for (var suffix of this.kids[k].completions(''))
      yield k + suffix;
}

Trie.prototype.completions = function* (prefix) {
  if (prefix.length) {
    var node = this.kids[prefix[0]];
    if (node !== undefined)
      for (var suffix of node.completions(prefix.slice(1)))
        yield prefix[0] + suffix;
  } else {
    yield* this.strings();
  }
}

Here you see that the information that is part of the prefix is now present implicitly, as part of the control stack. Operations on recursive data structures are naturally expressed with recursive functions.

Note that if I call db.strings(), nothing happens -- not even if there are millions of strings in the trie. Values get produced by the coroutine only as the consumer asks for them.

Speaking of consumers, "for-of" is usually the consumer that you want, as it is the built-in ES6 language construct for iteration. For example:

js> for (var word of db.completions('fo')) print(word)       
fo
foo
js> for (var word of db.completions('f')) print(word)  
fee
fie
fo
foo
js> for (var word of db.completions('q')) print(word)
js> 

This code is not just part of some future standard: it's available now in Firefox nightlies, out of the box. It's also available in Chrome, if you enable "experimental features".

niceties

What is also in Firefox, but not yet in Chrome, is support for custom iterators. Wouldn't it be nice if you could just say:

for (var word of db)
  print(word);

Well you can! But here it gets complicated.

The right-hand side of the for-of statement, in this case db, is the iterable. The iterator is derived from the iterable by calling the iterable's special @@iterator method. The @@ indicates that this method is keyed by a "well-known symbol".

Symbols are a new kind of data type in ES6, and their salient characteristic is that they can be used as unique property keys. You can imagine that in the future there is some piece of the JavaScript engine that, on startup, does something like this:

function Symbol() {}
Symbol.iterator = new Symbol;

and then uses that Symbol.iterator value as a property key.

However, Firefox does not implement symbols yet. In the meantime Firefox uses a special string as a stand-in for @@iterator. This name is not publicised anywhere, but you can get the value of @@iterator using this future-proof code:

const iterator = (function() {
  try {
    for (var _ of Proxy.create({get: function(_, name) { throw name; } }))
      break;
  } catch (name) {
    return name;
  }
  throw 'wat';
})();

However2! V8 implements for-of but not the @@iterator protocol, so if you target V8 also, you might find this gist to be useful.

Given iterator, we can now make Trie iterable directly:

Trie.prototype[iterator] = Trie.prototype.strings;

js> for (var word of db)
      print(word);
fee
fie
fo
foo

how did we get here?

That's the status: Firefox does ES6 generators and iterators. Excellent!

So, I have to toot my toot some bells and ring some horns here. I implemented ES6 generators and iteration in SpiderMonkey as part of my work with Igalia, your friendly neighborhood browser consultancy. I did the same in V8 a few months ago.

It seems odd, on the face of it, to hack on core components of two major browser engines, while not being employed by either associated company. We think of Google, Mozilla, Apple, and Microsoft as the ones that are really "driving the web", either because it's strategic for them, or because they are shamed by their competitors into doing so. But that's not the totality of the browser ecosystem.

To develop the web platform, there is a simple condition: you have to ship a browser, or components of a browser (e.g. a JS engine). If you do that, you can usefully invest in improving your tools, to make them better for you in some way. The magical part is that when you improve a browser, you improve the web.

Since three of the four main browser engines are Free Software, it's possible for third parties that ship browsers to meaningfully contribute to them -- and happily, this improves not only their products but also the web as a whole.

So for ES6 generators and iteration, big ups to Bloomberg, who sponsored this work. More expressive language features helps them better develop their products, both on the server and client side. Because they deploy V8 and SpiderMonkey components, they were in this magical position of being able to usefully invest in the web platform, making it better for all of us. Thanks!

Thanks also to the great Mozilla folks, who were very kind, helpful, and forgiving of my mozilla-rookie mistakes. It was also a privilege to be able to get such a close insight into the workings of Firefox development. Thanks especially to Jason Orendorff and Jeff Walden for their detailed patch reviews, and to Till Schneidereit for enduring many stupid questions over IRC :) Y'all are great!

next

I have a richly variegated set of minor bugs to fix up with this work, so while I've still got this all paged in, do be sure to give a try to the Firefox nightlies and see if this works for you. File bugs in Bugzilla and copy me (:wingo) on them. Thanks in advance for submitting bugs there and not in the blog comments :) Happy hacking!

Syndicated 2013-10-07 15:17:56 from wingolog

time for money

Good morning!

This is the third in my series of articles on working for/in/on/at a worker's cooperative. The previous one is here; eventually I will update the first to link to them all.

a message of pottage

Today's article is about compensation! You know, pay, salary, that kind of thing. This is such a strange topic that perhaps you will permit me to wax philosophical for a moment.

Most of us in our salaried lives are accustomed to see the checks coming in, as if it were the natural, normal relationship between human beings: organizations depositing money in people's bank accounts at regular intervals.

However, salary is an epiphenomenon. At the most basic level a business has to get sales and deals coming in, and this is a much more chunky endeavor: a big contract here for 8 people for 6 months, a two-week engagement for one person there, and so on. Whereas we can try to convince ourselves of the reality of "payroll" as a kind of constant flow of expenses, income is more often dealt with particle-by-particle, not as a fluid.

Salary, then, is a kind of damping function between the uncertainties of income and most people's desire for regularity and certainty. It also serves as a way for an owner to hide income from workers. The point of employment is for a business to pay someone to provide value, and that the value be greater than the pay; if most workers realize this, they will (and should!) negotiate up or change jobs.

So the question to ask in a worker-owned, self-managed cooperative is, "should we even have salaries?" If workers are responsible for securing income, and ultimately receive all of the income (minus fixed expenses), could it be that creating the illusion of certainty and predictability via salaries might actually be bad for the business? It could be that salary as a flow abstraction over chunky income would isolate a worker from the needs of the business.

We don't talk about these things much, because we're used to the current situation, which I'll explain below. But I wanted to bring up this point because salary like other aspects of the business isn't really decided a priori; it's all up for collective agreement. Granted, it is hard to get people interested in a topic on which they have already had long arguments, even if that was before you joined the group; perhaps rightfully so. But it is all open.

equity

The theory in Igalia is that we pay everyone the same. Everyone should be working with approximately the same effort, and everyone should be working on something that is valuable to the business (whether in the short-, medium-, or long-term), and so everyone should share in the income to an equal extent.

The reality, as always, is a bit more complicated.

I live in the Geneva area: actually in France, but 10 minutes' walk to the Swiss border. I moved here for my partner, who came here for professional reasons, and though I was very uncomfortable in the beginning, it is growing on me. One of the good things about living here is that I'll never be shocked by any price anywhere ever again. Things are expensive here. Anyone who moves here from elsewhere learns a bit of economy, but that only goes so far; the simple fact is that for a similar quality of life, you pay much more in Geneva than you do in Barcelona.

Finding an equitable solution to this problem is an ongoing topic, though we think we have something workable currently. The basic idea is start from a per-location "base salary", and all aim for a common "target salary".

A base salary corresponds to the amount that a person in (say) San Francisco needs to live, below which they would probably have to find another job. The current way to calculate that is partly a function of rent prices, and partly uses some cost-of-living indexes from numbeo.com. We all agree on the target salary as a number that we'd all be happy with, that could let someone live well anywhere.

Depending on how our income is going, we pay someone their base salary plus a fraction of the difference between the global target and their specific base salary. That fraction (the "bonus percentage") is the same for all workers. In that way, in times of plenty we're all getting the same good amount, corresponding to our equal value to the company, and in times of tight cashflow we all have what we need.

As I mentioned in my last article, Spanish workers are salaried, whereas folks in other places are freelance. Freelancers get an additional 30% or so in their income, corresponding to the amount that Igalia pays in social security and other tax for its Spanish employees.

Amusingly, we can't actually codify the base-and-target regime into the legal salary structure for Spanish workers. In Spain the various industrial sectors have collective bargaining between workers and employers to set minimums on the standard contracts, and the base salary for the IT sector is above a Spanish base salary as currently calculated. So although the usual total salary for a Spanish worker is much higher than the negotiated minimums, we can't actually formulate the contracts in that way. As I understand it, this is an example of the kind of restrictions imposed by being a legal corporation rather than a legal cooperative. The funny thing is that Igalia is not a legal cooperative precisely to allow its workers to live anywhere in the world -- a cooperative has to be 85% owned by its employees -- but naturally Spanish employment law doesn't do anything to help foreign contractors, and being a corporation is inconvenient for an organization that is practically a cooperative.

effort and value

Oddly enough, I don't remember anyone ever saying why it is that we pay everyone the same. "Paying everyone the same" is a strategy, not a principle, and as you see it's not even the strategy that we have, given the cost-of-living adjustments described above.

However, if I have to back-justify a principle -- and I do think it is necessary -- I think it is that we pay according to effort, assuming the work is valuable.

Effort is measured in hours, as recorded in a time tracker. Although the time tracker is useful for "management" if that word can be used in a non-pejorative sense -- "is this person able to spend enough time on commercially useful things, or are other things taking up their time?" -- the primary use of the time tracker is to document the time that someone spends at work, so they can get paid for it, and so that we know that everyone is showing similar effort.

It can happen that a person works too much, and in that case we sum up hours at the end of a year and either arrange for that person to have some paid time off, or pay them an extra month or two. This is a very tricky thing, because 40 hours a week is already quite a lot -- many of us would like to reduce this -- and you don't want someone burning out due to overwork, or spending their time on unproductive things. At the same time, it does happen that people do overtime, so we have to deal with it.

The caveat that "assuming the work is valuable" is important, and it touches on the topic of value. It is the collective responsibility of the business to ensure that people are working on valuable things. There are many kinds of value, and not all of them are measured in euros. For instance, working on and around Free Software is one of our values. Doing technically interesting work is also a value, one that compensates us not only as craftspeople but also as a business, through good marketing.

However, not all technically interesting, Free Software projects are equally valuable to the business. Some have customers that pay more than others, and it's the responsibility of the assembly to ensure that people are generally working on things that make economic sense. That doesn't mean there must be a short-term return; for instance, I've heard of a number of different people working on really advanced projects whose budget came out of the marketing department. Nonetheless, in the end, we have to get people to work on technology that sells.

We don't talk very much about it, but not everyone produces the same value to Igalia, even exhibiting the same effort in valuable work areas, and factoring in the intangible value of some projects. I think the current perspective is that this is a problem that we address through training. This has worked out well in the past, for example building out a formidable WebKit team out of people that were specialized in other technological areas. It is an ongoing experiment, though.

Some people suggest that differing compensations are important in a cooperative as a feedback mechanism and to ensure that the worker is actually producing surplus value for the company. For example, Valve's new employee survival guide describes their strategy:

Unlike peer reviews, which generate information for each individual, stack ranking is done in order to gain insight into who’s providing the most value at the company and to thereby adjust each person’s compensation to be commensurate with his or her actual value.

Valve does not win if you’re paid less than the value you create. And people who work here ultimately don’t win if they get paid more than the value they create. So Valve’s goal is to get your compensation to be “correct.”

While I can appreciate this on a business level, when viewed as a social experiment in post-revolutionary life, this is somewhat unsatisfying. Sure, some kind of signalling around effort is useful to have; for me, all socially valuable effort that my colleagues do is acceptable, from 10h/week to 50h/week, though my personal target is in between. Peer review can be an effective input to an assessment of effort and thus compensation. But I am uncomfortable correlating compensation with value, especially market value. I have more sympathy with the "democratic planning" side of things rather than "market socialism"; see this article by Robin Hahnel for some more cogent thoughts in this regard.

bottom line

It's easy to forget when in the thicket of debates over how best to pay people what a pleasure it is to be able to discuss these things without class antagonisms -- without trying to take advantage of management, because we are the management, or exploit the workers, because they are we too. Good riddance to that taboo about not discussing salaries among co-workers. I don't believe that a perfect compensation plan exists, but I am very happy with our ability to argue about it at any time, without fear of reprisals.

I have a couple more topics to write about in the next post or two, but if there is anything you want me to cover, let me know in the comments. Cheers.

Syndicated 2013-06-25 08:19:02 from wingolog

but that would be anarchy!

Good morning, internets!

This is the second of a series of articles on what it's like to work for/in/with a cooperative; the first one is here. Eventually I'll update the first one to link to the whole series, whereever it goes.

I work for a worker's cooperative, Igalia. This article series is about moving beyond theory to practice: to report on our experience with collective self-determination, for the curious and for the interested. It's a kind of exercise in marketing the revolution :) I hope I can be free from accusations of commercial interest, however; for better or for worse, our customers couldn't care less how we are organized internally.

the ties that bind

The essence of a worker's cooperative is to enable people to make decisions about their work to the degree to which they are affected by those decisions. For decisions affecting the whole group, you need collective deliberation. You could think of it as workplace democracy, if you understand democracy to go beyond simple voting and referenda.

Collective decision-making works, but you need some preconditions -- see for example conditions for consensus, from this excellent article on consensus. More so than any other enterprise, a cooperative needs to have a set of goals or principles that binds all of its members together. In Igalia, we have a rather wordy document internally, but it's basically a commitment to finding a way to let hackers work on interesting things, centered around free software, in a self-organized way. Everything else is just strategy.

structure

There are two more-or-less parallel structures in Igalia: one for decision-making and one for work. There is also a legal structure that is largely vestigial; more on that at the end.

The most important structure in Igalia is the "assembly". It's composed of all workers that have been in Igalia for a certain amount of time (currently 1 year, though there are discussions to shorten that period).

The assembly is the ultimate decision-making body, with power over all things: bank accounts, salary levels, budgets, strategies, team assignments, etc. All company information is open to all assembly members, which currently comprises all of the people in Igalia.

The assembly meets in person every two months at the head office in A Coruña, with the option for people to dial in. Since many of us work in remote locations and just communicate over mail and jabber, it's good to see people in person to renew the kind of personal connections that help make agreement easier. Incidentally, this requirement for group face-to-face meetings influenced the architecture of our head office; there is a dividable room there that can be expanded into a kind of "round table" with microphones at all the posts. You can see a picture of it on the about page.

The in-person assemblies are usually a bit abstracted over day-to-day operations and focus more on proposals that people have brought up or on strategy. Sometimes they are easy and quick and sometimes they go late into the night. The ones associated with longer-term planning like the yearly budgeting assemblies are the longest.

How well does this work, you ask? I would summarize by saying that it works OK. There are so many advantages to collective decision-making that I now take for granted that it is difficult to imagine other ways. However, making decisions is hard on a personal level: it's a challenge to hold all of the ideas in your head at one time, and to feel the right level of responsibility for the success of the business. I'll write another article on this point, I think, because it is also part of the cooperative working experience.

"The assembly" is both the bimonthly meeting and also the body of people. We're all on a mailing list and a jabber channel, which is where a lot of the other day-to-day business decisions get made, like "do we accept this contract", "should we hire people soon", "should we hire X person in particular", etc. However with some 40 people it's tricky to establish an active consensus on a mailing list, so it's usually the people that lead with proposals and actively round up people to help them implement that get things done.

work groups

So that's the power structure in Igalia. However on a day-to-day level, unless a thread is really active on the assembly mailing list, folks just do their jobs. Sounds silly but it has to happen. We're organized into teams, like in a normal company, but without managers -- we also do consensus on a smaller, more informal level within the teams.

Since we're a consulting shop, most people write code all day, but there are also people that primarily focus on other things: sales, team coordination (who's available for what work when?), administrative work, finance and cash-flow monitoring, etc. This broader team allocation is also a result of consensus. (You can see the theme here.) Ideally we rotate around the "coordinator"-type jobs so everyone stays fresh, hacking-wise, and we don't entrench these informal power structures. We've done some of that but could do more.

I've heard people say that "if you don't have a boss, the customer is your boss", but that's simply not true for us in any of the ways that matter. Our working conditions, pay, holidays, hours -- all of this is up to us. Yes, we have to do good work, but that's already an expectation we have of ourselves as hackers. It's a much healthier perspective to consider your customer to be your partner: both providing value for the other. If this isn't the case, we should find other partners, and happily in our particular industry this is a possibility. (More on cooperatives, capitalism, and crisis in a future post.)

legalities

As I said in the beginning, the important thing is that a group share principles, and agree periodically on the strategy to implement them. In that light, the particular legal structure of Igalia is an afterthought, though an important one.

Although Spanish law explicitly provides for cooperatives as a kind of legal entity, Igalia is organized as a limited-liability corporation. The reasons are not entirely clear to me, and periodically come up for debate. One of the issues, though, is that in a cooperative, 85% of your partners have to be Spanish residents, and we did not want that restriction.

Spanish workers are employees of Igalia, and folks outside of Spain are freelancers. However once you've been in the assembly for an amount of time, currently 2 years (way too long IMO), you have the option to become a legal partner in the business, purchasing an equal share of the business at a fixed price. I say "option" but it's also an expectation; the idea is that being a partner is a logical and natural outcome of working at/with/on/in Igalia. We have partners in a number of countries now.

You see my confusion with prepositions, and it's because you have to fit all of these ideas in your head at the same time: working for Igalia as an employee, on it as a project, with it as a partner, at it as a social experiment, etc.

Partners are the legal owners of Igalia, and there are about 30 of them now. They meet every few months, mostly to assess the progression of "prepartners" (those in the assembly but not yet partners, like myself). Ideally they don't discuss other things, and I think that works out in practice. There is a small power differential there between the partners and the assembly. However all the really important things get done in the assembly.

Finally, since Igalia is an S.L. (like an LLC), there is a legal administrator as well -- someone who's theoretically responsible for the whole thing and has to sign certain legal papers. In fact we have three of them, and the positions rotate around every three years. If we were a legal cooperative we could remove this need, which would be convenient. But that's how it is right now.

domination

I want a society without hierarchical power: no state, no military, no boss. But that would be anarchy, right? Well yes, of course that's what it would be! "Anarchy" doesn't equate to a lack of structure, though. It's true that Igalia is embedded in capitalism, but I think it and other cooperatives are a kind of practical anarchy, or at least a step along the path.

I'll close this epistle with a quote, in Chomsky's halting but endearing style. The whole interview is well worth a read.

Anarchism is quite different from that. It calls for an elimination to tyranny, all kinds of tyranny. Including the kind of tyranny that's internal to private power concentrations. So why should we prefer it? Well I think because freedom is better than subordination. It's better to be free than to be a slave. Its' better to be able to make your own decisions than to have someone else make decisions and force you to observe them. I mean, I don't think you really need an argument for that. It seems like ... transparent.

The thing you need an argument for, and should give an argument for, is, How can we best proceed in that direction? And there are lots of ways within the current society. One way, incidentally, is through use of the state, to the extent that it is democratically controlled. I mean in the long run, anarchists would like to see the state eliminated. But it exists, alongside of private power, and the state is, at least to a certain extent, under public influence and control -- could be much more so. And it provides devices to constrain the much more dangerous forces of private power. Rules for safety and health in the workplace for example. Or insuring that people have decent health care, let's say. Many other things like that. They're not going to come about through private power. Quite the contrary. But they can come about through the use of the state system under limited democratic control ... to carry forward reformist measures. I think those are fine things to do. they should be looking forward to something much more, much beyond, -- namely actual, much larger-scale democratization. And that's possible to not only think about, but to work on. So one of the leading anarchist thinkers, Bakunin in the 19th cent, pointed out that it's quite possible to build the institutions of a future society within the present one. And he was thinking about far more autocratic societies than ours. And that's being done. So for example, worker- and community- controlled enterprises are germs of a future society within the present one. And those not only can be developed, but are being developed.

The Kind of Anarchism I Believe In, Noam Chomsky, 28 May 2013

Syndicated 2013-06-13 07:55:24 from wingolog

ecmascript generators from a performance perspective

It's been really gratifying to see the recent interest in ES6 generators in V8. I'm still a Schemer at heart, but one thing all language communities should learn from JS is its enthusiasm and sense of joyful play. One can't spend much time around folks like James Long or Domenic Denicola without feeling the urge to go out and build something fun.

perfie perf perf perf

But this is a blog about nargery! A lot of people have been speculating about the performance impact of using generators, and I've been meaning to write about it, so perhaps this article will tickle your neuron.

A generator is a function that can suspend. So the first perf question to ask yourself is, does the performance of generators matter at all? If your generator spends more time suspended than active, then probably not. Though it's not the only application of generators, if you are using generators for asynchronous programming, then probably generator performance matters not at all.

But that's not very satisfying, right? Even though it probably doesn't matter, maybe it does, and then you need to make sure your mental model of what is fast and what is slow corresponds to reality.

crankshaft

So, the skinny: as implemented in V8, the primary perf impact of generators is that they do not get optimized by Crankshaft.

As you probably know, Crankshaft is V8's optimizing compiler. There are two ways a function can be optimized: on entry, because the function is called many times, or within a loop, because the loop is taking too long. In the first case it's fairly easy: you kick off a parallel task to recompile the function, reading the type feedback collected by the inline caches from the baseline compiler. Once you've made the optimized function, you swap the function's code pointer, and the next time you call the function you get the optimized version instead of the generic version.

The other way is if you are in a function, in a hot loop, and you decide to optimize the function while you are in it. This is called on-stack replacement (OSR), and is important for functions that are only called once but do have hot loops. (Cough KRAKEN cough.) It's trickier to do parallel compilation with OSR, as the function might have left the loop by the time recompilation finishes, but oh well.

So in summary, when V8 goes to optimize a function it will be in one of two states: inactive, because it hasn't started to run yet, or active, because it's stuck in a tight loop.

Generators introduce a new state for functions: suspended. You can know whether a function is active or inactive, but you can't know how many suspended generator activations are out in the JS heap. So you have multiple potential entry points to the function, which complicates the optimizer as it has to work on graphs with irreducible loops. Perhaps it's still possible to optimize, as OSR is a form of irreducible loops; I suspect though that you would end up compiling as many optimized functions as there are yield points, plus one for the function entry (and possibly another for OSR). Anyway it's a bit complicated.

Also, deoptimization (bailout) becomes more difficult if you can have suspended activations, and as a technical detail, it's tricky for V8 to efficiently capture the state of an activation in a way that the garbage collector can understand.

The upshot is that tight loops in generators won't run fast. If you need speed in a loop in a generator, you will need to call out to some other function (which itself would get optimized).

baseline

On the other hand, I should note that generators are optimized for quick suspend and resume. In a generator function, V8 stores local variables on the heap instead of on the stack. You would think this would make things slow, but that is not the case. Access to locals has the same speed characteristics: both stack and heap locals are accessed by index from a register (the stack pointer or the context pointer).

There is an allocation overhead when you enter a generator to create the heap storage, but it is almost never the case that you need to copy or allocate when suspending a generator. Since the locals are on the heap already, there is nothing to do! You just write the current instruction pointer (something you know at compile-time) into a slot in the generator object and return the result.

Similarly, when resuming you have to push on a few words to rebuild the stack frame, but after that's done you just jump back to where you were and all is good.

The exception to this case is when you yield and there are temporaries on the stack. Recall in my article on V8's baseline compiler that the full-codegen is a stack machine. It allocates slots to named locals, but temporary values go on the stack at run-time, and unfortunately V8's baseline compiler is too naive even to know what the stack height is at compile-time. So each suspension checks the stack height, and if it is nonzero, it calls out to a runtime helper to store the operands.

This run-time save would be avoidable in register machines like JavaScriptCore's baseline compiler, which avoids pushing and popping on the stack. Perhaps this note might spur the competitive urges of some fine JSC hacker, to show up V8's generators performance. I would accept JSC having faster generators if that's what it took to get generators into WebKit, at least for a while anyway :)

abstract musings

Many of you have read about the Futamura projections, in which one compiles a program via partial application of an interpreter with respect to a given input program. Though in the real world this does not appear to be a good strategy for implementing compilers, as so much of compilation is not simply about program structure but also about representation, there is a kernel of deep truth in this observation: the essence of compilation is partial evaluation of a strategy with respect to a particular program.

This observation most often helps me in the form of its converse. Whenever I find myself implementing some sort of generic evaluation strategy, there's a sign in my head that start blinking the word "interpreter". (Kinda like the "on-air" signs in studios.) That means I'm doing something that's probably not as efficient as it could be. I find myself wondering how I can make the strategy specific to a particular program.

In this case, we can think of suspending and resuming not as copying a stack frame out and in, but instead compiling specific stubs to copy out only the values that are needed, in the representations that we know that they have. Instead of looping from the bottom to the top of a frame, you emit code for each element. In this way you get a specific program, and if you manage to get here, you have a program that can be inlined too: the polymorphic "next" reduces to a monomorphic call of a particular continuation.

Anyway, Amdahl's law limits the juice we can squeeze from this orange. For a tight for-of loop with a couple of optimizations that should land soon in V8, I see generator performance that is only 2 times slower that the corresponding hand-written iterators that use stateful closures or objects: 25 nanoseconds per suspend/resume versus 12 for a hand-written iterator.

25 nanoseconds, people. You're using jQuery and asking for data from a server in the next country. It is highly unlikely that generators will be a bottleneck in your program :)

Syndicated 2013-06-11 13:48:00 from wingolog

no master

It's difficult for anyone with an open heart and a fresh mind to look at the world without shuddering at its injustices: police brutality, obscene gaps between rich and poor, war and bombs and land mines, people without houses and houses without people. So much wrong! It is right and natural not only to feel revulsion, but to revolt as well. Fight the man! Down with the state! No god, no master!

So to those working against foreclosures and evictions of families in your neighborhoods, more power to you. Fight the good fight.

However, revolt is unfortunately not sufficient. We must also think beyond the now and imagine a better world: life after the revolution.

I'm sure I've lost some of you now, and that's OK. We are all at different points of political consciousness, and you can't speak to all at the same time. Some of the words like "revolution" probably bother some people. It sounds strident, right? But if you agree (and perhaps you do not) that there are fundamental problems with the way the world works, and that symptoms like banks kicking families out of houses using riot police are ultimately caused by those problems, then any real change must also be fundamental: at the foundation, where the roots are. Radical problems need radical solutions. Revolution is no more (and no less!) than a radically different world.

For me, one of these roots to dig up is hierarchy. I take as a principle that people should have power over and responsibility for decisions to the extent that they are affected, and hierarchy is all about some people having more power than others. Even if you do away with capitalism, as in the early Soviet Union, if you don't directly address the issue of hierarchy, classes will emerge again. A world in which some people are coordinators and other people are workers will result in a coordinator class that gradually accretes more and more power. Hierarchy is in essence tyrannical, though any specific instance may be more or less so.

On the other side of things, hierarchy deadens those at the bottom: the listless students, the mechanical workers, the passive and cynical voters. The revolution has to come about in a way that leaves people more alive and curious and energetic, and that seems to correspond not only with greater freedom but also with personal responsibility. I think that a world without hierarchy would be a world whose people would be more self-actualized and self-aware, as they would have more responsibility over the situations that affect their lives and those around them.

Well. I don't want to wax too theoretical here, in what I meant to be an introduction. I've been having these thoughts for a while and finally a couple years ago I decided to join in a collective experiment: to practice my chosen craft of computer programming within a cooperative context. I haven't had a boss since then, and I haven't been the boss of anyone. I'm not even my own boss. There are no masters!

I've gotten so used to this way of working that sometimes I forget how unusual it is to work in a cooperative. Also, we're usually too busy hacking to write about it ;) So in the next few articles I'll take a look at the internal structure of our company, Igalia, and try to give some insight into what it's like working in a cooperative.

Syndicated 2013-06-05 13:35:31 from wingolog

409 older entries...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!