Older blog entries for wingo (starting at number 400)

corps: bespoke text codecs

Happy 12/12/12, peoples! In honor of repeated subsequences, today I'm happy to release a new set of compression tools, Corps.

Corps is a toolkit for generating custom text codecs, specialized to particular bodies of text. You give it an example corpus to analyze, and Corps can generate codecs based on what it finds.

For example, if you want to develop a compression algorithm that operates on JavaScript source text, you probably want to use a special code to represent the multi-character sequence function.

I say "probably", because in reality you don't know what substrings are most common. Corps uses the Re-Pair algorithm to build up an optimal token set. This algorithm treats all characters in the input as tokens, and recursively creates composite tokens from the most common pair of adjacent tokens, repeating the process until there are no more repeated pairs of tokens. For full details, see "Off-Line Dictionary-Based Compression" by Larsson and Moffat.

Corps is mostly useful for creating special-purpose fixed codecs based on a dictionary generated by its offline analysis. Although pre-generated codecs often do not compress as well as adaptive codecs like the one used by gzip, fixed codecs are typically faster as they can use the fixed code table to generate optimized encoders and decoders. Corps currently generates encoders in C and Scheme, and decoders in C, Scheme, and JavaScript.

Special-purpose codecs can also provide some interesting properties, such as the ability to decompress a substring of the original text.

get source

Corps is written for Guile version 2.0. See http://gnu.org/s/guile for more information on Guile and how to install it on your system.

To build Corps from git, do:

git clone git://gitorious.org/corps/corps.git
cd corps
./autogen.sh && ./configure && make && make check

You can install using make install, but it's often more convenient to just run corps uninstalled, using the env script.

stinky cheese

Corps includes a simple command-line tool called corps. For full documentation, run corps help. You can run it uninstalled by prefixing the ./env from the build tree.

As an example, say you want to build a database of wikipedia pages on cheese. Let's fetch a page:

$ curl -s 'http://en.wikipedia.org/wiki/Maroilles_(cheese)' > cheese
$ ls -l cheese
-rw-r--r-- 1 wingo wingo 43123 Dec 12 15:12 cheese

Now we analyze it to determine common substrings:

./env corps extract-all cheese > cheese-tokens

This generates a list of (string,frequency) pairs. An extract-all token set is usually quite large. We can pare it down to something manageable, the 500 most common substrings:

./env corps extract-subset cheese-tokens 500 cheese > 500-tokens

With this dictionary, we can huffman-code the page:

$ ./env corps encode -t 500-tokens -m huffman cheese cheese.huff
$ ls -l cheese.huff
-rw-r--r-- 1 wingo wingo 18060 Dec 12 16:09 cheese.huff

Well that's pretty cool -- it's less than half the size of the source text. We can also pare down this set of tokens to an appropriate number of tokens for a bytecode, and try doing a byte-encode of the file:

$ ./env corps extract-subset 500-tokens 254 cheese > 254-tokens
$ ./env corps encode -t 254-tokens cheese > cheese.bc
$ ls -l cheese.bc
-rw-r--r-- 1 wingo wingo 22260 Dec 12 16:19 cheese.bc

It's larger than the huffman-code, not only because of the smaller dictionary, but also because a bytecode is less dense. In practice though a bytecode is good enough while being very fast, so we'll continue with the bytecode.

Now let's generate a C encoder and decoder for this token set.

$ ./env corps generate-c-byte-encoder 254-tokens > encoder.inc.c
$ ./env corps generate-c-byte-decoder 254-tokens > decoder.inc.c
$ cp ~/src/corps/corps/decoder.c .
$ cp ~/src/corps/corps/encoder.c .
$ gcc -o decoder -O3 decoder.c
$ gcc -o encoder -O3 encoder.c
$ ls -l encoder decoder
-rwxr-xr-x 1 wingo wingo 13192 Dec 12 16:23 decoder
-rwxr-xr-x 1 wingo wingo 31048 Dec 12 16:23 encoder

Nice! We could use the corps tool to decode cheese.bc, but to vary things up we can use our zippy C decoder:

$ ./decoder < cheese.bc > cheese.out
$ cmp cheese.out cheese && echo 'excellent!'
excellent!

The performance of the C encoder is pretty good:

$ for ((x=0;x<1000;x++)) do cat cheese >> megacheese; done
$ time gzip -c < megacheese > megacheese.gz
real	0m1.418s
user	0m1.396s
sys	0m0.016s
$ time ./encoder < megacheese > megacheese.bc
real	0m0.523s
user	0m0.480s
sys	0m0.044s
$ ls -l megacheese*
-rw-r--r-- 1 wingo wingo 43123000 Dec 12 17:03 megacheese
-rw-r--r-- 1 wingo wingo 22370002 Dec 12 17:04 megacheese.bc
-rw-r--r-- 1 wingo wingo 11519311 Dec 12 17:04 megacheese.gz

Gzip gets better compression results for many reasons, but our quick-and-dirty bytecode compressor does well enough and is quite fast. The decoder is also quite good:

$ time ./decoder < megacheese.bc > /dev/null
real	0m0.179s
user	0m0.160s
sys	0m0.016s
$ time gunzip -c < megacheese.gz > /dev/null
real	0m0.294s
user	0m0.284s
sys	0m0.008s

Amusingly, for this text, gzipping the bytecoded file has quite an impact:

$ gzip -c < megacheese.bc > megacheese.bc.gz
$ ls -l megacheese*
-rw-r--r-- 1 wingo wingo 43123000 Dec 12 17:03 megacheese
-rw-r--r-- 1 wingo wingo 22370002 Dec 12 17:04 megacheese.bc
-rw-r--r-- 1 wingo wingo   175246 Dec 12 17:12 megacheese.bc.gz
-rw-r--r-- 1 wingo wingo 11519311 Dec 12 17:04 megacheese.gz

It so happens that byte-compressing the original text allows it to fit within the default gzip "window size" of 32 KB, letting gzip detect the thousandfold duplication of the source text. As a codec that works on bytes, gzip tends to work quite well on bytecoded files, and poorly on bit-coding schemes like huffman codes. A gzipped bytecoded file is usually smaller than a gzipped raw file and smaller than a gzipped huffman-coded file.

hack

I'll close with a link to the 254 most common substrings in one corpus of JavaScript code that I analyzed: here. I have a set of patches to integrate a codec optimized for JavaScript source into V8, to try to reduce the memory footprint of JS source code. More on that quixotic endeavor in some future post. Until then, happy scheming!

Syndicated 2012-12-12 16:57:56 from wingolog

geneva

silence on the wire

It's been pretty quiet in this electro-rag, and for a simple but profound reason. I don't know about all of Maslow's hierarchy of needs, but for me there is definitely a hierarchy between housing and anything else.

Thing is, my partner and I drove up to Geneva almost three months ago, but we only moved in somewhere permanent last week. Everything was up in the air! Sublets and couches and moving vans and storage units. All the things in all the places.

In a way it should have been liberating, but I reckon I'm just a homebody: someone who likes to have a spritual and physical base to come back to. It's the classic introverted/extroverted "where do you get your energy" thing -- in private, or in society? Of course it's both, but for me the private side is important and necessary.

society, nations, states

Incidentally, October made ten years since I left the US. I lived in Namibia for a couple of years, moved to Spain in 2005, and as you know, left for Switzerland in September. In that time, a third of my life, I have not been able to vote for my local government. Representative democracy isn't very democratic, of course, but not even having that nominal connection to collective decision-making is a bit unmooring.

In Spain I have what is called "EU long-term residency". In 2003, the EU standardized visas for "third-country nationals": folks like me that do not have citizenship in a EU country. After 5 years, you automatically get country-specific "long-term residency", and can optionally apply for "EU long-term residency" (as I just did), which has the additional benefit of being transferable to other EU countries.

The idea of EU long-term residency was to make third-country nationals have the same rights as EU citizens, but several restrictions placed on the agreement by individual nations make it not very attractive compared to full nationality. EU citizens have the right to move around the EU, whereas third-country nationals have to apply to do so.

And in many countries, you can get nationality just as easy as long-term residency. In France for example, you only have to wait 5 years to be able to apply for nationality. In Spain, if you come from a Latin American country, you only have to wait 2 years (in theory; I have heard of unreasonable delays). As that is not my particular case, I would have to wait 10 years to get Spanish nationality, making this "EU long-term residency" attractive. However if I move away from Spain officially, I would have to "start over".

Man. The EU. What a bizarre institution. I think few normal citizens can describe the structure of the the EU -- you know, what's the deal with the parliament and the commisioners and the president and the central bank? The operations of the EU have very little to do with democracy. The tenuous accountability of the Members of European Parlaiment is mostly to political parties rather than to the people. Finance in rich countries basically runs the bank. The commission seems to run itself, with limited accountability to the governments of member states. To the extent that the EU is a democracy, it is its weakest form: vast, physically removed from its constituency, composed of representatives of representatives.

and yet

It's been interesting to see the contrast in Switzerland. Switzerland has its own currency, protectionist trade policies, and a smattering of EU-à-la-carte. So, yes, it is relatively easy for EU citizens to move to Switzerland, but on the other hand it's quite difficult if you are a "third-country national" like myself.

Switzerland is also very local. Most legislation involves the citizens directly, via referendum. The other day, voters here just approved a new constitution, by simple majority, for Geneva. Contrast this to the EU, which centralizes its power more every year.

There are good things and bad things about this. I suppose that specifically, it's mostly good for the Swiss, neutral for EU nationals, and worse for third-country nationals. Policies that enable local agricultural and industrial production are great for local workers, businesses, consumers (with Swiss salary), and the environment. These policies are quite difficult to have in the EU. Even the existence of the Swiss franc is great for local decision-making power, although its policies are mostly controlled by finance.

On the other hand, the immigration policy is quite reactionary, often bordering on xenophobia. In the EU, the institutions were able to achieve some limited forms of freedom of movement of persons, including third-country nationals, in exchange for the freedom of movement of capital. No such arrangement has been made in Switzerland. (They'll happily take your capital, of course.)

As you probably know, I mostly identify as an anarchist, a kind of libertarian socialism that is suspicious of hierarchical power structures. So Switzerland has its attractions in that sense. But I've had discussions with folks arguing that the EU has actually been the emancipatory institution, in contrast to reactionary governments, and there is something there. For all its decentralized, democratic principles, women did not get the right to vote in Swiss federal elections until 1971, and the last canton (local government) held out until 1990, when it was forced to allow women to vote by a federal court.

capital

There's no way around it: Geneva stinks of money.

An illustration, if you don't mind. When we first came here to visit in August, we got out of the car, went to a cash machine, and tried to get out 120 francs (approximately 120 dollars, or 100 euros). No. The minimum amount was 100 francs, and the next was 200. Well, OK. We got 200. It came out in two bills. The machine did not dispense any lower denomination. There was nothing wrong with that machine; it was marked as only dispensing 100-franc bills and higher.

On the other hand, unlike Spain, all shopkeepers carry around fat bankrolls and are quite happy to break a 100, even on a 50 cent purchase. You can do that only if you can find something that costs that little, of course.

If you walk around the center of town, it's designer tailors, caviar bars (not kidding), and private banks. It is truly disgusting. It makes me think of a butcher's counter: bleach covering a faint smell of blood, but without the redeeming earthiness. Perhaps this is just a personal analogy though.

That's the macro-level. Of course there are normal folks here too, but that's tempered by the cost: things are in general really expensive here. Starbucks drip coffee for 5 dollars. You can easily spend 70 dollars a person at a normal restaurant. A normal one-hour tram ticket is 5 dollars. Etc.

From what I can tell, the cost of things is not a problem if you have a Swiss salary. I'm in something of an irregular state: an American with Spanish residency, working for a Spanish company, living in Geneva (but actually France). Administrativia aside, it has been quite a shock coming from Spain to here. In Igalia, the cooperative I work at, we try to pay everyone the same when everything is going well, but when the budget is a little tighter salaries get scaled down to something approaching "the cost of living" (whatever that is).

We had to open a whole new round of discussions about how to determine compensation after I moved here. Geneva just didn't fit in anyone's conception about what was reasonable! It raises all kinds of issues: what does it mean to work in a cooperative with people in all different kinds of places? What does fair and equal compensation actually mean in San Francisco and A Coruña and Madrid and Geneva, and how do you calculate it? It looks like we've come to an interesting and probably unique solution there, so perhaps more on that as discussions progress.

borders

Of course it's strange to come to this capital of, well, capital with these values, but here I am. You end up talking about money a lot. Of course you need a place to put your money and take it out. For an anarchist I've got a lot of bank accounts: Spanish, Swiss, French, and US somewhere... For all the new EU regulations, cross-border ATM fees still make it attractive to have an account in the place where you need to withdraw money.

Same thing with mobile phone companies. As I said, we ended up moving to France. The rent is a lot cheaper, it's more compatible with our residency permits, and it's still only a 25 minute bike or tram into the center of Geneva so it's not totally in the boon-docks. But this makes me carry around two phones because I cross the border all the time, and then of course there's the old Spanish SIM I need to do something with. I've done the PIN-to-PUK dance multiple times, because I can't remember so many numbers.

It's a pretty strange identity to have: to have a house in France, but feel attached to Switzerland. It's tough to catch the dominant social story, of who is the "us" in this place. It's problem with Geneva in general, transient city that it is.

greenery

Well, this post grows long, and it's mostly a rant, right? And I needed to rant and be done with it. But I don't want to be too negative. We have an actual house, with a garden, and we're going to plant things and compost and such. It's warm and cozy inside and there are snowy mountains about, and there's a cosmopolitan city within a close if brisk bike ride. I have a French grocery store five minute's walk away. It's a dark season, but there is cheese and wine enough to carry me through to springtime :) So things are good. I'll still rant, but things are all right.

Bon. Catch you internauts later. Next time, with code!

Syndicated 2012-12-05 23:16:13 from wingolog

quasiconf 2012: lisp @ froscon

Are you a Lisper, in the big-tent sense of the term? You within a day's transport of St. Augustin in Germany? Well shucks, have I got a thing for you: Quasiconf, a lisp sub-conference of FrOSCon, held next weekend (25-26 August 2012).

The full program isn't quite out yet, but they invited yours truly to talk there, and talk I will. In fact I'll talk thrice: they gave me 90 minutes, so I'll do three little presentations.

In the first, I'll finally give that talk I've been threatening to do for years, about why delimited continuations are the bee's knees, how they can be implemented, and what kind of lovely code they enable. The example will be a small memcached server and client.

Next I'll talk about things that implementing JavaScript has taught me about Scheme implementations. They were very surprising lessons to me, so I hope they will entertain the attendees. No spoilers, though!

Finally, now that Guile 2.0 has been out for a year and a half or so, it's a good time to look back at what worked well and what didn't. If you are even remotely interested about what it takes to maintain and enhance a mature language implementation, this will be an interesting talk for you. I'll also take the opportunity to summarize the progress on the development branch, which will probably be Guile 2.2.

So that's the thing. It's somewhat of a late announcement, but hey, if you can go, it's pretty much free and should be an interesting get-together. See you there!

Syndicated 2012-08-15 21:34:25 from wingolog

lakewards

Time passes! And it takes us along with it: now a lazy float, now the running rapids, now an eerie calm.

And now, for me, a rising white noise of waterfall. In a month my partner and I move to Geneva. It's exciting and terrifying and anxiety-producing, but the nice thing about time is that I know it will carry me over the stress of, you know, learning French and such things.

I have loved my time in Barcelona, since moving here in 2005. The city has been pretty good to me. While I am indeed ready to go and try new things, I won't be leaving it without regret.

As a practical matter, I'll be (quite happily) staying on with Igalia, in their compilers group. Practically speaking, my move doesn't change much, work-wise; I've always met with customers over the net or on-site, never in Barcelona itself.

There are loads of practicalities to sort out, but if you have any knowledge about the town, I'm all digital ears. Also, if you happen to need some responsible caretakers for your alpine villa, do let me know. It doesn't even have to have a hot tub. I'm not particular. Just sayin'.

Syndicated 2012-08-15 21:10:07 from wingolog

inside javascriptcore's low-level interpreter

Good day, hackers! And hello to the rest of you, too, though I fear that this article isn't for you. In the vertical inches that follow, we're going to nerd out with JavaScriptCore's new low-level interpreter. So for those of you that are still with me, refill that coffee cup, and get ready for a look into a lovely hack!

hot corn, cold corn

Earlier this year, JavaScriptCore got a new interpreter, the LLInt. ("Low-level interpreter", you see.). As you probably know, JavaScriptCore is the JavaScript implementation of the WebKit project. It's used in web browsers like Safari and Epiphany, and also offers a stable API for embedding in non-web applications.

In this article, we'll walk through some pieces of the LLInt. But first, we need to describe the problem that the LLInt solves.

So, what is the fundamental problem of implementing JavaScript? Of course there are lots of things you could say here, but I'm going to claim that in the context of web browsers, it's the tension between the need to optimize small amounts of hot code, while minimizing overhead on large amounts of cold code.

Web browsers are like little operating systems unto themselves, into which you as a user install and uninstall hundreds of programs a day. The vast majority of code that a web browser sees is only executed once or twice, so for cold code, the name of the game is to do as little work as possible.

For cold code, a good bytecode interpreter can beat even a very fast native compiler. Bytecode can be more compact than executable code, and quicker to produce.

All of this is a long introduction into what the LLInt actually is: it's a better bytecode interpreter for JSC.

context

Before the introduction of the LLInt, the so-called "classic interpreter" was just a C++ function with a bunch of labels. Every kind of bytecode instruction had a corresponding label in the function.

As a hack, in the classic interpreter, bytecode instructions are actually wide enough to hold a pointer. The opcode itself takes up an entire word, and the arguments also take up entire words. It is strange to have such bloated bytecode, but there is one advantage, in that the opcode word can actually store the address of the label that implements the opcode. This is called direct threading, and presumably its effects on the branch prediction table are sufficiently good so as to be a win.

Anyway, it means that Interpreter.cpp has a method that looks like this:

// vPC is pronounced "virtual program counter".
JSValue Interpreter::execute(void **vPC)
{
  goto *vPC;
//...
  // add dst op1 op2: Add two values together.
op_add:
  {
    int dst = (int)vPC[1];
    int op1 = (int)vPC[2];
    int op2 = (int)vPC[3];

    fp[dst] = add(fp[op1], fp[op2]);

    vPC += 4;
    goto *vPC;
  }

  // jmp offset: Unconditional branch.
op_jmp:
  {
    ptrdiff_t offset = (ptrdiff_t)vPC[1];
    vPC += offset;
    goto *vPC;
  }
//...
}

It's a little different in practice, but the essence is there.

OK, so what's the problem? Well, readers who have been with me for a while might recall my article on static single assignment form, used as an internal representation by compilers like GCC and LLVM. One conclusion that I had was that SSA is a great first-order IR, but that its utility for optimizing higher-order languages is less clear. However, this computed-goto thing that I showed above (goto *vPC) is a form of higher-order programming!

Indeed, as the GCC internals manual notes:

Computed jumps contain edges to all labels in the function referenced from the code. All those edges have EDGE_ABNORMAL flag set. The edges used to represent computed jumps often cause compile time performance problems, since functions consisting of many taken labels and many computed jumps may have very dense flow graphs, so these edges need to be handled with special care.

Basically, GCC doesn't do very well at compiling interpreter loops. At -O3, it does get everything into registers on my machine, but it residualizes 54 KB of assembly, whereas I only have 64 KB of L1 instruction cache. Many other machines just have 32 KB of L1 icache, so for those machines, this interpreter is a lose. If I compile with -Os, I get it down to 32 KB, but that's still a tight fit.

Besides that, implementing an interpreter from C++ means that you can't use the native stack frame to track the state of a computation. Instead, you have two stacks: one for the interpreter, and one for the program being interpreted. Keeping them in sync has an overhead. Consider what GCC does for op_jmp at -O3:

op_jmp:
    mov    %rbp,%rax
    ; int currentOffset = vPC - bytecode + 1
    sub    0x58(%rbx),%rax
    sar    $0x3,%rax
    add    $0x1,%eax
    ; callFrame[CurrentOffset] = currentOffset
    mov    %eax,-0x2c(%r11)

    ; ptrdiff_t offset = (ptrdiff_t)vPC[1]
    movslq 0x8(%rbp),%rax

    ; vPC += offset
    lea    0x0(%rbp,%rax,8),%rbp

    ; goto *vPC
    mov    0x0(%rbp),%rcx
    jmpq   *%rcx

First there is this strange prelude, which effectively stores the current vPC into an address on the stack. To be fair to GCC, this prelude is part of the DEFINE_OPCODE macro, and is not an artifact of compilation. Its purpose is to let other parts of the runtime see where a computation is. I tried, but I can't convince myself that it is necessary to do this at the beginning of every instruction, so maybe this is just some badness that should be fixed, if the classic interpreter is worth keeping.

The rest of the opcode is similar to the version that the LLInt produces, as we will see, but it is less compact.

the compiler to make the code you want

The goal of compilation is to produce good code. It can often be a good technique to start from the code you would like to have, and then go backwards and see how to produce it. It's also a useful explanatory technique: we can take a look at the machine code of the LLInt, and use it to put the LLInt's source in context.

In that light, here is the code corresponding to op_jmp, as part of the LLInt:

llint_op_jmp:
    ; offset += bytecode[offset + 1]
    add    0x8(%r10,%rsi,8),%esi
    ; jmp *bytecode[offset]
    jmpq   *(%r10,%rsi,8)

That's it! Just 9 bytes. There is the slight difference that the LLInt moves around an offset instead of a pointer into the bytecode. The whole LLInt is some 14 KB, which fits quite confortably into icache, even on mobile devices.

This assembly code is part of the LLInt as compiled for my platform, x86-64. The source of the LLInt is written in a custom low-level assembly language. Here is the source for the jmp opcode:

_llint_op_jmp:
    traceExecution()
    dispatchInt(8[PB, PC, 8])

You can define macros in this domain-specific language. Here's the definition of dispatchInt:

macro dispatchInt(advance)
    addi advance, PC
    jmp [PB, PC, 8]
end

At this point it's pretty clear how the source corresponds to the assembly. The WebKit build system will produce a version of the LLInt customized for the target platform, and link that interpreter into the JavaScriptCore library. There are backend code generators for ARMv7, X86-32, and X86-64. These interpreters mostly share the same source, modulo the representation differences for JSValue on 32-bit and 64-bit systems.

Incidentally, the "offlineasm" compiler that actually produces the native code is written in Ruby. It's clear proof that Ruby can be fast, as long as you don't load it at runtime ;-)

but wait, there's more

We take for granted that low-level programs should be able to determine how their data is represented. C and C++ are low-level languages, to the extent that they offer this kind of control. But what is not often remarked is that for a virtual machine, its code is also data that needs to be examined at runtime.

I have written before about JSC's optimizing DFG compiler. In order to be able to optimize a computation that is in progress, or to abort an optimization because a speculation failed -- in short, to be able to use OSR to tier up or down -- you need to be able to replace the current stack frame. Well, with the classic interpreter written in C++, you can't do that, because you have no idea what's in your stack frame! In contrast, with the LLInt, JSC has precise control over the stack layout, allowing it to tier up and down directly from the interpreter.

This can obviate the need for the baseline JIT. Currently, WebKitGTK+ and Apple builds still include the baseline, quick-and-dirty JIT, as an intermediate tier between the LLInt and the DFG JIT. I don't have sure details about about long-term plans, but I would speculate that recent work on the DFG JIT by Filip Pizlo and others has had the goal of obsoleting the baseline JIT.

Note that in order to tier directly to the optimizing compiler, you need type information. Building the LLInt with the DFG optimizer enabled causes the interpreter to be instrumented to record value profiles. These profiles record the types of values seen by instructions that load and store values from memory. Unlike V8, which stores this information in executable code as part of the inline caches, in the LLInt these value profiles are in non-executable memory.

Spiritually speaking, I am all for run-time code generation. But it must be admitted that web browsers are a juicy target, and while it is unfair to make safe languages pay for bugs in C++ programs, writable text is an attack surface. Indeed in some environments, JIT compilation is prohibited by the operating system -- and in that sort of environment, a fast interpreter like the LLInt is a big benefit.

There are a few more LLInt advantages, like being able to use the CPU's overflow flags, doing better register allocation, and retaining less garbage (JSC scans the stack conservatively). But the last point I want to mention is memory: an interpreter doesn't have to generate executable code just to display a web page. Obviously, you need to complement the LLInt with an optimizing compiler for hot code, but laziness up front can improve real-world page load times.

in your webs, rendering your kittens

So that's the LLInt: a faster bytecode interpreter for JSC.

I wrote this post because I finally got around to turning on the LLInt in the WebKitGTK+ port a couple weeks ago. That means that in September, when GNOME 3.6 comes out, not only will you get an Epiphany with the one-process-per-tab WebKit2 backend, you also get cheaper and faster JavaScript with the LLInt. Those of you that want the advance experience can try it out with last Monday's WebKitGTK+ 1.9.4 release.

Syndicated 2012-06-27 11:25:30 from wingolog

dltool mines dwarf

This is going to sound like quite a yak-shave, but here goes: I was spending some research time here at Igalia working on a new virtual machine for Guile when I got interested by DWARF, the debugging format used in many UNIX systems (GNU, the BSDs, Mac OS). For various reasons, we needed a new debugging format, and DWARF seemed suitable.

I was going to need tools that would both produce and consume DWARF information, so I started with the parser. Of course there is a ready source of DWARF information on a GNU system, in all of the shared libraries. So I wrote a tool to snarf information out of those libraries. It is suprising how much there is!

dltool

Enough of the strange introduction! Let's try it out:

$ dltool print-one libc wmemchr
(subprogram
  (name "wmemchr")
  (prototyped #t)
  (type (pointer-type
          (byte-size 8)
          (type (named-type-reference (typedef "wchar_t")))))
  (formal-parameter
    (name "s")
    (type (pointer-type
            (byte-size 8)
            (type (const-type
                    (type (named-type-reference (typedef "wchar_t"))))))))
  (formal-parameter
    (name "c")
    (type (named-type-reference (typedef "wchar_t"))))
  (formal-parameter
    (name "n")
    (type (named-type-reference (typedef "size_t")))))

Nifty, no? It even prints the names of the formal parameters. The structure of the printout mirrors the structure of the DWARF information quite closely, so for a detailed reference you should refer to the DWARF standard. Note though that the "type" child of a "subprogram" (procedure, function, etc) denotes its return type. If it's missing, the function does not return a value.

In this case we see that wmemchr returns a pointer to a wchar_t, and we see that the pointer is 8 bytes wide. We don't see the definition of wchar_t though. You can pass a --depth=N to see more information, recursively, but it's most useful to simply have the tool print out a tree of needed declarations, this time with the print-decls command instead of print-one:

$ dltool print-decls libc wmemchr
(typedef
  (name "wchar_t")
  (type (base-type
          (byte-size 4)
          (encoding signed)
          (name "int"))))
(typedef
  (name "size_t")
  (type (base-type
          (byte-size 8)
          (encoding unsigned)
          (name "long unsigned int"))))
(subprogram
  (name "wmemchr")
  ...)

Here the body of the subprogram is the same as before. If I leave off the wmemchr argument, I get all the declarations that are publically exported by libc, and that have debugging information.

locating debug information

In the examples above, I'm just passing the basename of the library. dltool is smart enough to look in the library paths if needed, via parsing /etc/ld.so.conf and $LD_LIBRARY_PATH.

These days it is not so common for a stock GNU distribution to have debugging symbols installed for its libraries. Instead the debugging information is packaged separately, and a .gnu_debuglink section is left in the main binary (library, or executable, or both -- have you ran /lib/ld-linux.so.2 lately?). dltool can deal with that.

The only thing it doesn't handle is compressed debug info (.zdebug_info sections). Hopefully these sections go away at some point though: they prevent debuggers from mapping the debugging information directly, they don't save any distribution cost (packages being zipped already), and have no other runtime impact, as they debuginfo isn't normally loaded.

c++?!?!

Oddly enough, dltool works with C++ too. In this case, I point it at a debugging version of the JavaScriptCore library, looking for a definition of the JSC::JSCell class.

$ dltool print-one --grovel .libs/libjavascriptcoregtk-3.0.so JSC::JSCell
(class-type
  (specification
    (named-type-reference
      (structure-type "JSCell")
      (namespace "JSC")))
  (byte-size 16)
  (member
    (name "TypedArrayStorageType")
    (type (const-type
            (type (named-type-reference
                    (enumeration-type "TypedArrayType")
                    (namespace "JSC")))))
    (declaration #t)
    (const-value 0))
  (member
    (name "m_classInfo")
    (type (pointer-type
            (byte-size 8)
            (type (const-type
                    (type (named-type-reference
                            (structure-type "ClassInfo")
                            (namespace "JSC")))))))
    (data-member-location ((plus-uconst 0)))
    (accessibility private))
 ...)

In this case I used the --grovel option to indicate that I wanted to traverse all the debugging entries to find the one I wanted, instead of looking in the symbol table. That was necessary in this case because type definitions aren't present in a symbol table. The actual result is some 1500 lines long, with member variables and functions, and of course it's not complete as it's just a print-one output.

If the data-member-declaration entry looks a little odd to you, that's because it is odd: it's a sequence of opcodes for a virtual machine that computes locations of values. DWARF specifies a couple of little machines like this. For class members, there's usually only the one plus-uconst opcode, but with bitfields, different opcode sequences are possible.

challenges

Making dltool was quite an interesting hack. Oddly, writing the DWARF parser was the easy part. I had an ELF parser and linker already, and although quite flexible, DWARF is regular and well-specified.

One tricky part was making dltool fast, while not consuming too much memory. There can be a lot of data in a debugging object. For example, the debugging version of libwebkitgtk-3.0.so is bigger than 1GB. In a number of cases, we have to trade off the cost of re-parsing versus the cost of storing parsed data in memory. dltool can indeed handle WebKit, which is something I'm proud of, though printing all the exported declarations takes 2 GB of memory and some three or four minutes. More modest tasks like describing libcairo only take a second or two.

Another tricky part was determining which types are actually the same. In C and C++, typically you compile each file on its own, and then link them together. Each time you compile a file, the compiler loads the headers that the file needs and creates instances of those types, internal to the compiler. The debugging information that the compiler writes can contain as many (or more) definitions of (for example) size_t as there are compilation units that reference it. dltool needs to unify those types that are the same, not only to prevent multiple type declarations, but also to prevent divergence when processing recursive types.

C++'s nested namespaces also presented a problem. DWARF information is organized in trees, though they can also be reference each other by offset. However, there are no up-pointers in the tree nodes. It seems that you need to pre-traverse the node trees to determine the ranges of the namespaces so you can determine the right context for a node at any given offset. C does not have this issue, partly because it is less expressive, and partly due to some DWARF design decisions.

dltool doesn't currently demangle C++ names, for what that's worth.

Oddly, C++ templates were not very much of a problem. They simply create debugging entries with really crazy names, that's all.

"automatic" ffi?

I think dltool is pretty cool. Given an opaque information-ful object, I always want some tool to spread information on a screen in a structured way. So it's an interesting tool for binary spelunkers.

There are some obvious applications, though. One is to use the information that dltool finds to automatically generate bindings from C (or even C++) libraries to your language of choice. You can use a dltool run to generate the bindings at "compile-time", or save it to a file and generate bindings at run-time with node-ffi or ctypes or whatever the kids use these days.

Of course in Guile, besides importing the dlhacks module and creating bindings at runtime, we can get the best situation by running the dltool groveller from within a macro. That way we pull the DWARF groveller into our language, without having the module depend on dlhacks or on the presence of debugging information at runtime.

There are some limitations, though. DWARF is a flexible and extensible format, but it doesn't currently get all of the information you would want in a binding. For example there's no way to tell the C compiler to serialize an association between "struct ClutterActor" and a GType named "ClutterActor". You can't tell it that the return value of wcsdup is actually owned by the caller. You can't annotate pointer arguments to indicate that they are "out" or "in/out" or whatever. But this is not a failing of DWARF: it is a current limitation of our toolchain.

Actually it's also a limitation of C -- I mean, everything you would want to say about a variable or a function or whatever is, loosely speaking, the type of that object. We can't really change C's type system, but we can improve our toolchain by allowing for GCC to recognize and plumb through more relevant __attribute__ annotations.

all right, all ready

Anyway, that's the thing! The tool itself requires Guile 2.0. If you're running Debian, just install guile-2.0, then:

git clone git://gitorious.org/guile-dlhacks/guile-dlhacks.git
cd guile-dlhacks
autoreconf -vif && ./configure && make
./env dltool help

Visit the gitorious page for more, and find me on #guile for more.

Happy spelunking!

Syndicated 2012-06-19 15:55:52 from wingolog

rococo, and then rubble

Why is it that we in GNOME came to view GLib as the lowest level of our stack?

Forget, for the moment, about such small barbarisms as gint32, and even gint. Forget things like gnome-keyring versus GnuPG. Let us even pass over GObject as a whole. Today I had a thought that was new to me. Let's talk about DWARF.

DWARF describes ABIs. It is ubiquitous: it is on GNU systems as well as Mac and BSD, and could be on Windows if we put it there. It's extensible. It is mostly controlled by like-minded free software people.

So why is it that we invented GIR?

Discuss.

Syndicated 2012-06-08 14:31:38 from wingolog

10 years of wingolog

Greetings, friends!

This piece is even more navel-gazing than most, so if that's not your bag, surf on, dear surfer, surf on.

Tomorrow ten years ago I wrote:

What up ya'll. I've been looking at advogato recently, and seeing my co-hackers thomasvs, Uraeus, and hadess posting here all the time made me jealous, so here I am too.

summertime... and the living is easy.

Ten years! There are few things that I have done consistently for ten years, but typing at this end of the intertube appears to be one of them.

Although there is some juice to squeeze out of the history of this thing, I don't think it's interesting enough. So instead in the rest of this essay I'll just say whatever comes to my mind.

on identity

As you might have gathered, I don't like defining myself. I prefer to let what I do give you enough material so that you can conclude what you want. But in the absence of auto-definition, over time I feel that you are defining me.

Let me be concrete. You are mostly technical people, probably computer programmers. You respect what I write to some degree, and that is very gratifying. But let's say I start writing more about politics and social justice. For me it was much easier to write something political five or ten years ago than it is now. Now, I have to think about my readers, and say something that is both well-reasoned and not offensive, because I have the kind of audience for whom technically correct is the best kind of correct.

I don't know what I think about this. Politically I self-identify as an anarchist. (It's both more and less crazy than it sounds.) Politics are more important than what I do in code, but I'm more hesitant to write about such things, because it's not always the case that I can express it adequately. This may be an advantage to us all, but it is stifling as well.

I told you it was going to be navel-gazing ;-)

on you

Here I am, dropping the second-person plural like it's nothing. Au contraire! I do enjoy writing for an audience. Thank you all for reading: those who started with me at Advogato, those more recent subscribers, and those that just click through occasionally.

Thanks especially to people on Planet GNOME for putting up with me over these last few years. The things I do now are only tangentially related to GNOME, but it has been a great help to my writing to know that there were thousands of people reading. If you ever find it too off-topic, file a bug asking the PGO maintainers to restrict the feed to a particular tag or three.

Thanks also to my commenters. I have learned lots from yall. I especially appreciate the pleasant tone that you have. I apologize for not responding to all of the many good points that you make; sometimes the answer is too long, sometimes I have no excuse. In any case, I put my effort into the articles more than the comment threads. At least that way we mostly avoid back-and-forths; there are better media for that.

For the record, although I don't currently moderate comments, I do delete offensive or spammy comments whenever they appear. This has worked well for me.

on advice

I've done OK with this electrorag. If you write a blog, and aspire to such modest success, I have only this advice: write about interesting things. Write conversationally, and succinctly if possible. Write socially, addressing a community of readers. That's all!

on macro, on micro

This is just as good a time as any to note that I opened a Twitter account recently. It goes completely against my instincts. My blog is self-hosted. I wrote the software that runs the blog. I wrote the compiler that compiles the software that runs the blog, and the compiler is self-hosted!

I feel I have to justify myself here. In the end it is pretty simple: I need fresh chatter, without having an inbox. I feel like my news sources are an echo chamber, and I need to break out of it somehow. Twitter seems appropriate for grabbing links and zeitgeist. RSS seems too much like an inbox. I'll end up trying to self-host and do the identi.ca dance at some point, but for now, Twitter it is. If following is your game, @andywingo is my name.

I still have no idea what to type at that ridiculous 140-character box. When it comes to blogging, I don't know how to do micro :)

onwards

I have no idea what the future holds. Maybe it's a bunch of nerdy compiler articles. Maybe it's cat pictures. We'll see. Thanks for accompanying me this far!

summertime... and the living is easy.

Syndicated 2012-05-30 21:45:45 from wingolog

inline cache applications in scheme

The inline cache is a dynamic language implementation technique that originated in Smalltalk 80 and Self, and made well-known by JavaScript implementations. It is fundamental for getting good JavaScript performance.

a cure for acute dynamic dispatch

A short summary of the way inline caches work is that when you see an operation, like x + y, you don't compile in a procedure call to a generic addition subroutine. Instead, you compile a call to a procedure stub: the inline cache (IC). When the IC is first called, it will generate a new procedure specialized to the particular types that flow through that particular call site. On the next call, if the types are the same, control flows directly to the previously computed implementation. Otherwise the process repeats, potentially resulting in a polymorphic inline cache (one with entries for more than one set of types).

An inline cache is called "inline" because it is specific to a particular call site, not to the operation. Also, adaptive optimization can later inline the stub in place of the call site, if that is considered worthwhile.

Inline caches are a win wherever you have dynamic dispatch: named field access in JavaScript, virtual method dispatch in Java, or generic arithmetic -- and here we get to Scheme.

the skeptical schemer

What is the applicability of inline caches to Scheme? The only places you have dynamic dispatch in Scheme are in arithmetic and in ports.

Let's take arithmetic first. Arithmetic operations in Scheme can operate on number of a wide array of types: fixnums, bignums, single-, double-, or multi-precision floating point numbers, complex numbers, rational numbers, etc. Scheme systems are typically compiled ahead-of-time, so in the absence of type information, you always want to inline the fixnum case and call out [of line] for other cases. (Which line is this? The line of flow control: the path traced by a program counter.) But if you end up doing a lot of floating-point math, this decision can cost you. So inline caches can be useful here.

Similarly, port operations like read-char and write can operate on any kind of port. If you are always writing UTF-8 data to a file port, you might want to be able to inline write for UTF-8 strings and file ports, possibly inlining directly to a syscall. It's probably a very small win in most cases, but a win nonetheless.

These little wins did not convince me that it was worthwhile to use ICs in a Scheme implementation, though. In the context of Guile, they're even less applicable than usual, because Guile is a bytecode-interpreted implementation with a self-hosted compiler. ICs work best when implemented as runtime-generated native code. Although it probably will by the end of the year, Guile doesn't generate native code yet. So I was skeptical.

occam's elf

Somehow, through all of this JavaScript implementation work, I managed to forget the biggest use of inline caches in GNU systems. Can you guess?

The PLT!

You may have heard how this works, but if you haven't, you're in for a treat. When you compile a shared library that has a reference to printf, from the C library, the compiler doesn't know where printf will be at runtime. So even in C, that most static of languages, we have a form of dynamic dispatch: a call to an unknown callee.

When the dynamic linker loads a library at runtime, it could resolve all the dynamic references, but instead of doing that, it does something more clever: it doesn't. Instead, the compiler and linker collude to make the call to printf call a stub -- an inline cache. The first time that stub is called, it will resolve the dynamic reference to printf, and replace the stub with an indirect call to the procedure. In this way we trade off a faster loading time for dynamic libraries at the cost of one indirection per call site, for the inline cache. This stub, this inline cache, is sometimes called the PLT entry. You might have seen it in a debugger or a disassembler or something.

I found this when I was writing an ELF linker for Guile's new virtual machine. More on that at some point in the future. ELF is interesting: I find that if I can't generate good code in the ELF format, I'm generating the wrong kind of code. Its idiosyncrasies remind me of what happens at runtime.

lambda: the ultimate inline cache

So, back to Scheme. Good Scheme implementations are careful to have only one way of calling a procedure. Since the only kind of callable object in the Scheme language is generated by the lambda abstraction, Scheme implementations typically produce uniform code for procedure application: load the procedure, prepare the arguments, and go to the procedure's entry point.

However, if you're already eating the cost of dynamic linking -- perhaps via separately compiled Scheme modules -- you might as well join the operations of "load a dynamically-linked procedure" and "go to the procedure's entry point" into a call to an inline cache, as in C shared libraries. In the cold case, the inline cache resolves the dynamic reference, updates the cache, and proceeds with the call. In the hot case, the cache directly dispatches to the call.

One benefit of this approach is that it now becomes cheap to support other kinds of applicable objects. One can make hash tables applicable, if that makes sense. (Clojure folk seem to think that it does.) Another example would be to more efficiently support dynamic programming idioms, like generic functions. Inline caches in Scheme would allow generic functions to have per-call-site caches instead of per-operation caches, which could be a big win.

It seems to me that this dynamic language implementation technique could allow Guile programmers to write different kinds of programs. The code to generate an inline cache could even itself be controlled by a meta-object protocol, so that the user could precisely control application of her objects. The mind boggles, but pleasantly so!

Thanks to Erik Corry for provoking this thought, via a conversation at JSConf EU last year. All blame to me, of course.

as PLT_HULK would say

NOW THAT'S AN APPLICATION OF AN INLINE CACHE! HA! HA HA!

Syndicated 2012-05-29 08:07:39 from wingolog

list of ian lance taylor's linker articles

I was working on an ELF linker in Scheme recently, and wanted to re-read some of Ian Lance Taylor's 20-part series on ELF linkers and linking. In a brief search of the tubes, a list of the articles didn't come up, so perhaps this short and sweet list will make some future web searcher happy.

The list: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

They articles are well-written and fascinating, so don't click unless you have a few hours to burn.

Syndicated 2012-05-23 19:25:35 from wingolog

391 older entries...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!