Older blog entries for wingo (starting at number 322)

hacker culture, permaculture

callings

Here's an idea: of everyone out on the ether brushed by these bytes, there has to be a good number of us that are in "it" not just for the game, but for life: in the sense that our actions can be life-affirming, that our interactions can help bring about a more beautiful world.

So, with that realization in mind, I call "book club". Let's read a book together!

What book, you ask? Here's mine for now: Permaculture: Principles and Pathways Beyond Sustainability. It's by David Holmgren, one of the originators of the permaculture design system.

I think that Holmgren has as much to say about how we live life as Alexander. He's also a builder, in a way. It really seems to me that Holmgren's work fosters Alexander's quality-without-a-name. All that is by way of introduction, to say that people that like Alexander, of whom there are many in the hacker world, might well enjoy Holmgren.

But why this book now? The answer is that Lyn Gerry, the host of the radio show, Unwelcome Guests, is reading it to us: an hour every week. It's nice to hear a book. It's also nice to hear it like this, over time, so every part has a chance to seep in.

So. For the first installment I'll give a direct link to the MP3: here. It starts with a poem and a tune, then the reading. I'm not sure where the reading starts in second installment, from this week's episode, because I just downloaded it. Usually it's at the start of the second hour -- the hours are separated in the direct downloads -- but I always listen to the whole thing anyway, so I download them both. There's a podcast link too on the archives page.

I would like for our conversations about the book to be open, in the sense that radio is open, for people to tune in and listen to if they want. By that I mean to say let's not have a mailing list -- what do people think about seeing if we can have a cross-blog-and-comments, cross-identica/twitter discussion? That could fail of course, but it sounds like a nice thing to try.

Anyway, let me know if you want to join. I know that if you're interested, that will make at least two of us.

cold, cold part of the world

I'm off to Sweden tomorrow for the 2009 GNU Hackers Meeting, co-located with FSCONS, the Free Society Conference and Nordic Summit. I'll talk at the GHM about recent developments in Guile, and Guile's place within the GNU universe. If you're going to be at FSCONS, let's meet up!

Syndicated 2009-11-10 22:53:15 from wingolog

class redefinition in guile

Yes hello hello!

Long-time readers will perhaps recall this diagram:


figure zero: things as they were.

It comes from an article describing how Guile represents its objects in memory, with particular concern for Guile-GNOME.

I was hacking on this code recently, and realized that this representation was not as good as it could be. Our switch to the BDW garbage collector lets us be more flexible with the size of our allocations, and so we can actually allocate the "slots" of an object inline with the object itself:


figure one: things how maybe they could have been.

Alas, during the hack, I discovered a stumbling block: that this representation doesn't allow for classes to be redefined.

redefine a data type, what?

Yes, Guile's object-oriented programming system (GOOPS) allows you to redefine the types of your data. It's OK! CLOS lets you do this too; it's an old tradition. Redefining a class at runtime allows you to develop by incremental changes, without restarting your program.

Of course once you change a class, its instances probably need to change too, probably reallocating their slots. So we have to reintroduce the indirection -- but allowing for locality in the normal, non-redefined case. Like this:


figure two: things how they are, almost.

So updating an instance is as simple as swapping a pointer!

Almost.

Well, not really. This is really something that's unique to Lisp, as far as I can tell, and not very widely-known in the programming community, and hey, I didn't completely understand it -- so man, do I have a topic for a blog or what.

step one: make a hole in the box

The way redefinition works is that first you make a new class, then you magically swap the new for the old, then instances lazily update -- as they are accessed, they check that their class is still valid, and if not update themselves. It's involved, yo, so I made a bunch of pictures.


figure three: class redefinition begins with defining a new class.

So yeah, figure three shows the new class, lying in wait beside the old one. Then comes the magic:


figure four: same identity, different state.

What just happened here? Well we just swapped the vtable and data pointers in the old and new classes. For all practical purposes, the old class is the new class, and vice versa. All purposes except one, that is: eq?. The old class maintains its identity, so that any code that references it, in a hash table for example, will see the same object, but with new state.

The class' identity is the same, but its state has changed. That's the key thing to notice here.

Now we mark the old class's data as being out of date, and the next time its instances check their class... what? Here we reach another stumbling block. The old class has already has new state, so it is already fresh -- meaning that the instance will think nothing is wrong. It could be that the instance was allocated when its class declared two slots, but now the class says that instances have three slots. Badness, this; badness.

So what really needs to happen is for instances to point not to the identity of their classes, but to the state of their classes. In practice this means pointing directly to their slots. This is actually an efficiency win, as it removes an indirection for most use cases. Comme ça:


figure five: instances actually point to class state, not class identity.

As we see in the figure, a well-known slot in the class holds the redefinition information -- normally unset, but if the class is invalidated, it will allow the instance to know exactly which version of the class it is changing from and to.


figure six: new equilibrium.

And finally, figure six shows the new state of affairs -- in which slot access has been redirected for all redefined classes, and all of their instances, transitively.

efficiency

All in all, this is quite OK efficiency-wise. Instance data is normally local, and class data is one indirection away. A redefined instance will have nonlocal data, but hey, not much you can do otherwise, without a copying collector.

There is one efficiency hack worth mentioning. Accessors, discussed in an earlier article, don't need to check and see if their class is up to date or not. This is because they are removed from the old class and re-added to the new one as part of the redefinition machinery.

summary

Redefinition is complicated, but pretty neat.

really, that's the summary?

Yes.

Syndicated 2009-11-09 14:16:27 from wingolog

optionals, keywords, oh my!

OK! Where were we?

In my last dispatch, I talked about case-lambda in guile. The gist of it is, let procedures parse their own arguments, and they can do neat stuff like multiple-arity dispatch.

Also, neat stuff like optional and keyword arguments! Consider our my-write example from last time:

(define my-write
  (case-lambda
    ((obj port) (write obj port))
    ((obj)      (my-write obj (current-output-port)))))

It's a little silly, to write it this way. It's not essentially one procedure with two different bodies, it's one procedure with one required argument, and one optional argument. The optional argument defaults to (current-output-port).

So, as you would imagine, there is a better way to express this "design pattern": lambda*, and its sugary friend, define*.

In this case, we would simply define my-write like so:

(define* (my-write obj #:optional (port (current-output-port)))
  (write obj port))

So nice, so clear. Default values are only evaluated if the argument is missing. (It's a rare Python programmer that's not surprised about Python's behavior in this regard; but I digress.)

keyword args too

Optional arguments are good at allowing for concision and extensibility, but code that uses them can be confusing to read. Actually this is a problem with positionally-bound arguments in general.

I like how Carl Worth puts it: that nice prototypes can result in inscrutable code. His solution in C is to have function names encode their arities, but we can do better in Scheme, with keyword arguments.

So let's say we want to add a "detailed" argument to my-write. We can add a keyword argument:

(define* (my-write obj
                   #:optional (port (current-output-port))
                   #:key (detailed? #f))
  (if detailed?
      (format port "Object ~s of type ~s" obj (class-of obj))
      (write obj port)))

Invocations are really nice to read:

(my-write 'foo #:detailed? #t)
=| Object foo of type #<<class> <symbol> 8c4fca8>

(my-write 'foo (open-output-file "foo.log") #:detailed? #t)
; writes the same thing to foo.log

The second example gives an explicit port; and indeed, I am left wondering what it is, when I read it. Keyword arguments make for more readable code.

Keyword arguments also allow for better extensibility. But don't take it from me, take it from P. Griddy:

Most of the operators in Rtml were designed to take keyword parameters, and what a help that turned out to be. If I wanted to add another dimension to the behavior of one of the operators, I could just add a new keyword parameter, and everyone’s existing templates would continue to work. A few of the Rtml operators didn’t take keyword parameters, because I didn’t think I’d ever need to change them, and almost every one I ended up kicking myself about later. If I could go back and start over from scratch, one of the things I’d change would be that I’d make every Rtml operator take keyword parameters.

-- Paul Graham, from a talk he gave back when he didn't talk about startups so durn much

And, there's one more thing, which applies both to optional and keyword arguments: the default values are evaluated in the lexical context of their preceding arguments. So you can have a later argument referring to an earlier one. For example, Guile's compile is defined like this:

(define* (compile x #:key
                  (from (current-language))
                  (to 'value)
                  (env (default-environment from))
                  (opts '()))
  ;; wizardly things here
  ...)

See how env's default value references from? Awesome, yes? I thought so.

newness

So what's new about all this? Not much, semantically. Guile has supported lambda* and define* for more than 10 years. But now they are available in the default environment, and they are fast fast fast -- for the same reasons that case-lambda is faster now. There are special opcodes to process stack arguments into optionals, and to shuffle and bind keyword arguments, all without consing a single cell.

Also, now the toolchain knows about optional and keyword arguments, so that backtraces and printouts show them nicely. For example, my-write prints like this:

#<program my-write (obj #:optional port #:key detailed?)>

Ah, there is case-lambda*; though it is of dubious utility, given that it can only reasonably dispatch on the required and optional arity, and not on keyword args. But there it is.

In any case, I look forward to using lambda* more in the future, without speed trepidations. Just say no to rest arguments masquerading as optionals!

Syndicated 2009-11-08 12:33:29 from wingolog

case-lambda in guile

Oh man, does the hack proceed apace. I really haven't had time to write about it all, but stories don't tell themselves, so it's back behind the megaphone for me.

Guile is doing well, with the monthly release train still on the roll. Check the latest news entries for the particulars of the past; but here I'd like to write about a couple aspects of the present.

First, case-lambda. The dilly here is that sometimes you want a procedure that can take N or M arguments. For example, Scheme's write can be invoked as:

(write "Hi." (current-output-port))
=| "Hi."

(=| means "prints", in the same way that => means "yields".)

But actually you can omit the second argument, because it defaults to the current output port anyway, and just do:

(write "Hi.")
=| "Hi."

Well hello. So the question: how can one procedure take two different numbers of arguments -- how can it have two different arities?

The standard answer in Scheme is the "rest argument", as in "this procedure has two arguments, and put the rest in the third." The syntax for it is not very elegant, because it introduces improper lists into the code:

(define (foo a b . c)
  (format #t "~a ~a ~a\n" a b c))
(foo 1 2 3 4)
=| 1 2 (3 4)

You see that 1 and 2 are apart, but that 3 and 4 have been consed into a list. Rest args are great when your procedure really does take any number of arguments, but if the true situation is that your procedure simply takes 1 or 2 arguments, you end up with code like this:

(define my-write
  (lambda (obj . rest)
    (let ((port (if (pair? rest)
                    (car rest)
                    (current-output-port))))
      (write obj port))))

It's ugly, and it's not expressive. What's more, there's a bug in the code above -- that you can give it 3 arguments and it does not complain. And even more than that, it actually has to allocate memory to store the rest argument, on every function call. (Whole-program analysis can recover this, but that is an entirely different kettle of fish.)

The solution to this is case-lambda, which allows you to have one procedure with many different arities.

(define my-write
  (case-lambda
    ((obj port) (write obj port))
    ((obj)      (my-write obj (current-output-port)))))

implementation

You can implement case-lambda in terms of rest arguments, with macros. Guile did so for many years. But you don't get the efficiency benefits that way, and all of your tools still assume functions only have one arity.

Probably the first time you make a VM, you encode the arity of a procedure into the procedure itself, in some kind of header. Then the opcodes that do calls or tail-calls or what-have-you check the procedure header against the number of arguments, to make sure that everything is right before transferring control to the new procedure.

Well with case-lambda that's not a good idea. Actually if you think a bit, there are all kinds of things that procedures might want to do with their arguments -- optional and keyword arguments, for example. (I'll discuss those shortly.) Or when you are implementing Elisp, and you have a rest argument, you should make a nil-terminated list instead of a null-terminated list. Et cetera. Many variations, and yet the base case should be fast.

The answer is to make calling a procedure very simple -- just a jump to the new location. Then let the procedure that's being called handle its arguments. If it's a simple procedure, then it's a simple check, or if it's a case-lambda, then you have some dispatch. Indeed in Guile's VM now there are opcodes to branch based on the number of arguments.

So much for the VM; what about the compiler and the toolchain? For the compiler it's got its ups and downs. Instead of a <lambda> that just has its arguments and body, it now has no arguments, and a <lambda-case> as its body. Each lambda-case has an "alternate", the next one in the series. More complicated.

Then you have the debugging information about the arities. The deal here is that there are parts of a procedure that have arities, probably contiguous parts, and there are parts that have no arity at all. For example, program counter 0 in most procedures has no arity -- no bindings have been made from the arguments to local variables -- because the number of arguments hasn't been checked yet. And if that check fails, you'll want to show those arguments on your stacktrace. Complication there too.

And the introspection procedures, like procedure-arguments and such, all need to be updated. On the plus side, and this is a big plus, now there is much more debugging information available. Argument names for the different case-lambda clauses, and whether they are required or rest arguments -- and also optional and keyword arguments. This is nice. So for example my-write prints like this:

#<program my-write (obj port) | (obj)>

So yeah, Guile does efficient multiple-arity dispatch now, and has the toolchain to back it up.

Next up, efficient optional and keyword arguments. Tata for now!

Syndicated 2009-11-07 11:52:25 from wingolog

digital interfaces

I made some tomato and red pepper soup for lunch yesterday. Before I had a chance to eat it, the universe decided it still needed more red, and that I should try something stupid with a pocketknife. I sliced up my left forefinger and thumb pretty good.

This blog seems to be specializing in thoughts just before blood, so here it is: ah fuck, going to have to get stitches. I knew that in the first second.

Thankfully, there's a CAP (Centre d'Atenció Primària) in most neighborhoods, so after laying down on the couch to make sure I wouldn't faint, and grabbing chocolate from the cupboard, considering I hadn't yet eaten the soup, I walked the 10 minutes to the CAP, my hastily bandaged hand held high. People looked at me funny.

An hour and a half later, well, four stitches in the index finger, three on the thumb. But I'm ok.

* * *

I hear that some family of mine is going to these "tea party" protests. If you're not plugged into the States political scene, the deal is this: the Republican brand is broke, and everyone knows it. But there is so much anger at their base. So voila Republican anger without the Republican state trappings, a catchment of the neofascist tendencies in all of us, whipped up around a symbol: the idea that Obama is a foreign element, an outsider, not of us, coming to enslave us all.

One of my family writes, referring to the return of another from these "tea party" protests:

When you are released and your tracking anklet has been removed. . . what do you say we move to Montana and prepare for the movie Red Dawn? We don't have to paint Wolverine on the side of every Afghan or Obamian tank that we destroy, but we can live off the land, sleep under the stars and pee in overheated radiators. The enemy will be obvious out there. I remember Sesame Street. . . . which one does not look like the others. . . . got it. Always look for the red dot and the table cloth on the head.

This makes me so sad. And the thing that's really binding them together, even the less racist, is hatred of "Obamacare" -- the idea that one should be able to walk into a clinic, get treated well and kindly, and walk out, regardless of your employment status, without signing for anything, without paying anything, as I did yesterday, in this foreign land.

Syndicated 2009-11-07 10:08:48 from wingolog

rememberings

Incredible. I got a copy of "The Boy With the Arab Strap" this evening, and I smell things from 10+ years ago -- early fall flowers and grass, thousands of miles away, even as it rained and rained here today, all grey day long.

I used to be "that guy". You know, the guy you copied tapes from.

In Namibia, when I was in the Peace Corps, we traded tapes a lot. In the fall of 2003 a new group arrived, and one of the girls in it was really cute. We might have been a thing. I engaged in mating rituals -- that is, "I'll send you a mix tape". And then she laughed!

Apparently somewhere in 2003 things changed. Indeed when I got back in winter 2004, it was all about the Ipod contraption, mp3s, and some mix CDs.

But everyone who knew it, knows it: a mix CD does not a mix tape make.

Syndicated 2009-10-22 20:38:00 from wingolog

read what i wrote on my shirt

The light of summer has died, but not the heat. The sky failed to rain today. I'm sure that if cloud seeding actually worked, Pfizer would arrange to dispense rain by means of little purple pills.

I finally got around to uploading photos recently, for the first time since May. Photos are the new inbox.

It's a good thing a few of these turned out well, because I burned my glasses taking them -- on the inside of the lens. No wonder the EU wants to ban fireworks in the street, but what a pity. Up next: lifejacket beach sunbathing.

states that start with p

Two places I offer to you, both in the new-to-me state of Pennsylvania.

One: Eastern State Penitentiary, a "model prison", back from the days of utopia. From the outside, Eastern State imposes, with high, silent walls -- as if it had its back to the city, all around. And inside... well, it was built under the idea that 23 hours of silent isolation + 1 hour of solitary exercise would be good for the soul, somehow. So there are two sides of the experience: one space in which to see everything, linear wards radiating from the one-point center, and the other in-cell space, in which you see no one. Ever.

And two, Wharton Esherick's house/museum/autobiography. Beautiful and inventive woodwork, this.

pollack and the cia

I went to New York's Museum of Modern Art for the first time, which was lovely -- to see so many icons of the last century, there, just on the wall.

Walking through the Rockefeller rooms of 1950s American art, I couldn't help but think of Who Paid the Piper?, or, as it was titled in the US, The Cultural Cold War. The deal, as I recall the book's argument, is that in the aftermath of World War II, Europe was really up for grabs. Of course we know how it played out politically, but art was also at stake -- that for the elites of Europe to be on board with the US politically, they had to believe.

So, the US did have whiz-bang technology, but man cannot live on atomic energy alone, as they say. Humanity needs dreams and interpretation: a function of art. The CIA wanted the dreamers and interpreters to (a) not be allied to communists, at all; and (b) to promote an "American" art, something in keeping with modernity and transgression. So Jackson Pollock was funded to be a cowboy-artist, on purpose.

Yes, I am struck, standing in front of his works; they have that something. I like it. But it is complicit.

Apparently, enough time has passed that even the CIA itself can talk about it -- https-only link because, er, apparently art is that important.

Syndicated 2009-10-08 20:55:05 from wingolog

thigh fives

The intertron seems to be misled as to the nature of thigh fives.

As a citizen it is my duty to assist in correcting this oversight.

Syndicated 2009-10-06 09:27:37 from wingolog

wikipedia & guile

Good evening, internet!

Oftener then not, when posting dispatches to this ship's log, it's with an air of surprise: of the turned head over the shoulder, smiling at the unexpected appearance of a friend. Well hello world, then, it's just been like that, punctuated ensimismamiento.

* * *

This particular evening finds me in Los Angeles, here for work. I have loads of stories and photos, hopefully soon forthcoming, but the present impetus for text is more mundane. And nerdy, may I mention. For the laity, imagine a pendulous pocketwatch, you are getting sleepy, yes, yes -- you need not read the rest -- you might remember suddenly that you wanted to make some tea, and that upon your return to the machine will remember having read all of this log entry, and that its writer was a genteel and amiable fellow.

You should probably stop reading now, your tea water is aboil!

* * *

For the rest of you, please forgive the verbal excess, I've been reading too much Jane Austen recently.

My friend and colleague B. Tregre pointed me to Guile's wikipedia page today, basically asking, "is it true?"?

Because if you read that page, it's pretty clear that Guile is a bad thing. I mean, in deference to Norvig, let's condense that page:

  1. Guile

    1. Implementation of Scheme

    2. Since 1993

    3. Embeddable as a C library

    4. But let's get to the dirt

  2. Scheme compliance

    1. Still bitter about Guile following the 1998 Scheme standard but not the 1991 one

    2. For some reason we're going to complain about Guile's call/cc implementation

    3. Also we're going to demonize conservative collectors some more.

  3. Salt, wounds

    1. Etc.

meta

Perhaps actually I should start with a meta-meta: I hack on Guile for a number of reasons. I like it, first of all. It's good software, widely deployed, with some neat stuff in it. I like the folks that hack on it -- the people and what they do. I like the project's stated goals of freedom, and I like the hack. &c.

But what of that neat stuff shows up in the Wikipedia article? Practically nothing. Lack of enthusiasm in an encyclopedia is to be expected; but lack of information is something else.

The Guile wikipedia entry simply doesn't tell you much about Guile itself. So that's my first meta-complaint.

My second meta-complaint is perhaps more to the point -- that the real message, the message if you read between the lines, is "Guile is a bad implementation of Scheme". This is editorialization through tone and fact selection. That most of the facts are strictly true does not mean that the message is true.

So to me, while the facts of the article are mostly (though not entirely) right, the article itself is wrong. The selection of facts is not representative of Guile, the object under consideration.

non-meta

I guess it's worthwhile to discuss the principal argument against Guile expounded more explicitly in the article: namely, conservative garbage collection.

Conservative collection is a way to scan all memory, looking for pointers. If it finds something that looks like a pointer, it considers the pointed-to address to be in use. At the end of "marking" the memory, the collector "sweeps" all non-marked locations.

If you're confused but interested, I discussed this topic last year in some detail. Wikipedia itself has a good discussion, too.

Guile now uses the Boehm-Demers-Weiser collector, instead of having an in-Guile implementation; Boehm himself discusses the topic as well.

Basically, there is a potential problem with conservative GC: one might see an integer somewhere in memory, but mistakenly interpret it as a pointer, thus keeping around data unnecessarily. If this happened often enough, or on a big enough data structure, you could leak lots of memory.

In practice, this rarely happens.

In the very very unlikely event that you keep memory around that should be freed, there are tools to work around the condition; but in a language implementation, where you really control what's in addressable memory (both heap and stack), you might be able to eliminate such misidentifications altogether. Finally in a 64-bit system such a collision is highly unlikely. I've never had a problem with Guile leaking memory.

But, there are some programmers who see this hack (and it is a hack) as a Pavlovian bell for vitriol. I really don't get it. It's not like you have any options in plain C. And if your language implementation exposes itself broadly to C, your implementation doesn't have many options either.

PLT Scheme was able to switch away from conservative collection because they were able to lessen their exposure to C, via development of a good in-Scheme foreign function interface (FFI). Guile will have this possibility in the future.

culpability

I've thought about these issues fairly hard. One of my goals is to get Guile into Emacs within the next 12-18 months. But what if pointers were consistently and persistently misidentified, making it impossible to keep Emacs sessions open for months or years? Then the snarky Wikipedians would add a paragraph explaining how it was Guile that broke Emacs.

So the question to consider is, what bit pattern is identified by libGC as a pointer, and what non-pointer words on your system have a value that matches that pattern? Pointers to heap-allocated objects are aligned on 8-byte boundaries, so their lower three bits will be 0. In addition, the pointer value must lie within an allocated heap segment, which typically are large values, at least a million or so. So integers that you typically see are most unlikely to collide.

Furthermore, while it is typically said that "certain integer values" can be mistakenly identified, those are integer values in C. Guile's representation of integers (at least, those less than 2^30 or 2^62) has the lower two bits of the integer to be "10" -- so a Guile integer will never be confused with a pointer.

What can be confused are native (C) integers. Native C integers can appear ephemerally on the stack, or in packed memory blocks, i.e. in a packed uint32 array. In the latter case, libGC would allocate that block as a "pointerless" segment, so the integers would not be scanned, leaving us only with the stack case.

For the stack, you have two cases -- code introduced by your runtime, e.g. the VM code, and code introduced by Scheme. I've already argued that Scheme values cannot be misidentified. The VM code is so small it is auditable -- and extremely unlikely to persistently store a large integer on the stack, because such integers are not used in the VM.

Even these stack unknowns will be eliminated when Guile does native compilation (within 12-24 months), so we completely control all bits on the stack, too.

That leaves the C runtime, which if there are problems in Emacs, they can be worked around with the libGC tools. But in the end, removing most GC annotations from Guile's runtime (and eventually, remove the GCPRO mess from Emacs) has hugely positive maintenance benefits -- and, I am convinced, no downsides in practice.

stories

I could conclude that "you shouldn't let others write your wikipedia pages" -- given a sufficiently broad conception of self-as-project -- but that makes Wikipedia look the victim here.

The ultimate meta-conclusion is more about message and transmission -- in short, about story. That is, to not let others define who or what you are. Make sure they are telling your story.

And if they aren't telling your story, to motivate your own story-tellers and retellers to get the word out. I'm hoping to get others to fix that article over the next few months, by getting guile-users subscribers to garden it a bit.

Until next time, intertube. Happy October!

Syndicated 2009-10-01 15:49:33 from wingolog

goto 1965

travels through space

Time passes, and space with it. I don't like airplanes overmuch, so I caught a train up to Paris last weekend, and from there Kate & I took a plane to Dublin to visit Jan Jaime Jingle Hemmett Schmidt.

Dublin was lovely, lovely -- performing for us in weather and in song, in light and dark -- the dark being the fine stout, of course. I couldn't pick a favorite part, though the one that keeps popping into mind is Newgrange, a 5000-year-old neolithic passage tomb, forty-some meters in diameter, aligned so that a shaft of light would pass into the chamber on the winter solstice.

That image aligns nicely with my Mumford readings of late. Before the construction of such artifacts, either as sticks planted in the ground or as massive monuments, time did not exist -- at least, not as we know it now, as an outside thing ticking on discretely without us -- as opposed to the spurting yet continuous flow experienced within.

I spent the rest of last week hanging out in deserted August Paris, the city-time of open-air cinema and ambles in unpeopled streets. But I won't lie: I spent a fair piece of it indoors, hack on the mind.

travels through time

At my last dispatch, I was in 1987, implementing flat closures, described in Dybvig's dissertation.

The recent spatial travel to Dublin, unaccompanied by the laptop, left my mind unusually clear. So spatially back in Paris on Wednesday I left again for 2002, implementing Fixing Letrec. The important contribution of that paper was a systematic, direct way to translate mutually recursive lambda expressions to a generalized form of the fixed-point operator, also known as the Y combinator. Quoth the prophets Waddell, Sarkar & Dybvig:

The transformation converts letrec expressions into an equivalent mix of let, set!, and fix expressions. A fix expression is a variant of letrec that binds only unassigned variables to lambda expressions. It represents a subset of letrec expressions that can be handled easily by later passes of a compiler.

In particular, no assignments through external variables are necessary to implement mutually recursive procedures bound by fix. Instead, the closures produced by a fix expression can be block allocated and “wired” directly together. This leaves the fix-bound variables unassigned, thus simplifying optimizations such as inlining and loop recognition.

fix is identical to the labels construct handled by Steele’s Rabbit compiler and the Y operator of Kranz’s Orbit compiler and Rozas’ Liar compiler.

For me, this passage was tantalizing, but I didn't quite understand the optimizations that a transformation to fix allowed. Yes, I had read Steele's dissertation a number of times, but Guile's compiler doesn't do a CPS rewrite, so it was unclear how some of the optimizations applied.

But my recent "flat closure" work had pointed out one optimization, the "block allocation" mentioned in the above paragraph. Consider two mutually recursive functions:

  (define (analyze n)
    (define (even? n)
      (or (= n 0)
          (odd? (- n 1))))
    (define (odd? n)
      (or (= n 1)
          (even? (- n 1))))

    (cond ((< n 0)   'negative-number)
          ((even? n) 'even-number)
          (else 'odd-number)))

Ignore for the moment the dubious "analysis" that this function performs. (Can you spot the bug?) What I want to focus on are the internal definitions, even? and odd?. Using define in a nested context like this is simply sugar around letrec, so this is equivalent to:

  (define (analyze n)
    (letrec ((even? (lambda (n)
                      (or (= n 0)
                          (odd? (- n 1)))))
             (odd?  (lambda (n)
                      (or (= n 1)
                          (even? (- n 1))))))
      (cond ((< n 0)   'negative-number)
            ((even? n) 'even-number)
            ((odd? n)  'odd-number)))

Here we see the mutual recursion expressed more clearly. Looking more closely at the even? and odd? bindings, we see that each has one free variable:

  (lambda (n)
    (or (= n 0)
        (odd? (- n 1))))
          ^ odd? is free in even?

  (lambda (n)
    (or (= n 1)
        (even? (- n 1))))
          ^ even? is free in odd?

If even? and odd? are compiled as mutually recursive closures (a point to which I will return shortly), they need to capture each other's bindings -- kindof a chicken-and-egg problem, no?

In the lambda calculus, this problem is solved via the Y combinator, which basically involves passing the functions of interest to another function, which in turn invokes the functions of interest, passing themselves as their arguments. It's pretty neat. Thomas Holubar and Jos Koot have been discussing this very topic at FLIBUG, incidentally (see Thomas's slides on the need for Y for a brief introduction).

origins of letrec

To continue the digression, I understand that the letrec construct was originally defined by Peter Landin in a 1965 paper, A Correspondence between ALGOL 60 and Church's Lambda-Notation. He uses a very schemely intermediate language to model ALGOL 60's semantics with sugar on top of a slightly extended lambda calculus.

In the following discussion, from the paper, Landin describes how mutually recursive definitions and labels (goto targets) can be expressed using letrec. Take φn to mean an arbitrary expression; italics are mine.

Definitions can also arise from the block-body -- their definees being the labels that are local to the block. These are defined in terms of locals, including each other, and they may be referred to by procedure and switch declarations. Hence labels must be grouped with switches and procedures as a single simultaneously recursive definition. The overall treatment of a block is therefore as follows.

[On the left we have ALGOL 60, on the right Landin's "Applicative Expression" intermediate language.]

begin                     let a=φ1
    real a;                   and A=φ2
    array A φ2;             let rec P=φ3
    procedure P φ3;                 and S=φ4
    switch S φ4;                    and L=φ6
    φ5;                             and M=φ7
 L: φ6;                       φ5            
 M: φ7;
end

[Here is the translation into the lambda calculus. Note the Y!]

{λ(a,A).{λ(P,S,L,M).φ5}
        [Yλ(P,S,L,M).(φ3,φ4,φ6,φ7)]}
[φ1,φ2]

Applying Y to mutually recursive procedures/labels binds them together, allowing them to see themselves. So you see, Y was in letrec from the very beginning.

Note that labels, denoting basic blocks, are modelled as procedures of 0 arguments. In the light of Steele's 1977 contribution, which I will discuss shortly, Landin's 1965 paper was particularly prescient. Landin says:

The treatment of jumps springs from the observation that the symbol 'go to' in ALGOL 60 is redundant, and could be ignored by a preprocessor. That is to say, there is a considerable similarity between labels and the identifiers of parameterless nontype procedures. It is possible to use the same "calling mechanism" for both, leaving any difference to be made by the thing that is "called"...

It might therefore be supposed that labels can be eliminated formally by considering each labelled segment of program as a parameterless procedure declaration (and hence as a definition whose defiens is a λ()-expression).

Steele turns this isomorphism around, saying instead:

In general, procedure calls may be usefully thought of as GOTO statements which also pass parameters, and can be uniformly encoded as JUMP instructions. This is a simple, universal technique... Our approach results in all tail-recursive procedure calls being compiled as iterative code, almost without trying, for it is more a matter of the code generation strategy than of a specific attempt to remove recursions.

Landin passed away in June of this year.

mischief managed

Anyway, back to 2002. Scheme's letrec construct is more general than Landin's let rec; it allows arbitrary expressions on the right-hand side, not just procedures. So the first trick Waddell shows is how to transform letrec into "an equivalent mix of let, set!, and fix". Once he has the fix, which is the same as Y, we need to compile it -- and that's where we were before longjmping back to 1965.

In the lambda calculus, a very minimal language, often there are more efficient ways to implement higher-order constructs, such as fix.

So fix-bound procedures need to capture an environment including themselves, fine. The standard way (in Scheme now, not the lambda calculus) is to bind e.g. even? and odd? (as in our example above) to empty "boxes", then set! the contents of the boxes to the procedures. The fix-bound lambda-expressions have bound the right variables, and after the boxes have been filled in via set!, those expressions hold each other's value: mischief managed. Like this:

(let ((even? #f) (odd? #f))
  (set! even? (lambda (n) (or (= n 0) (odd? (- n 1)))))
  (set! odd? (lambda (n) (or (= n 1) (even? (- n 1)))))
  ...)

But this is not ideal, because it forces the fix-bound lambda expressions to be allocated on the heap, in boxes, whereas in fact they are never set! in the source text.

We can do better. Since the lambda expressions don't actually run until after the bindings have been made, and thus don't reference their free bindings until then, we can allocate them on the stack, then fix up their free variables, mutating the closures in place.

Thus we don't introduce extraneous set! constructs into the code. This simplifies inlining and loop detection, as Waddell notes, and also allows the fix-bound variables to be allocated on the stack instead of in boxes, removing an indirection at runtime.

even better

But we can do even better than that. In this example, we still allocate a closure for even? and for odd? -- but we can avoid that too. Notice that even? and odd? are always called in tail position with respect to the letrec construct. Landin tells us that labels, the targets of go to, may be viewed as procedures of 0 arguments, with respect to the lambda calculus; Steele goes the other way, in his 1977 paper, Debunking the 'Expensive Procedure Call' Myth, or, Procedure Call Implementations Considered Harmful, or, Lambda: The Ultimate GOTO. (If you program, and want just one article to read out of all of these, read this paper.)

So, for a tail-call to a fix-bound procedure, we can simply render the bindings of the procedures inline to the parent procedure, jump over them, then enter the body of the fix. Any tail-call to a fix-bound procedure compiles a goto -- after setting its arguments. Since the procedure is lexically bound, and never set!, we know exactly where those arguments are going to be, so we set them directly -- no need to shuffle them around.

So an empty loop:

(letrec ((loop (lambda () (loop))))
  (loop))

Looks like this:

   ;; jump over definition of `loop'
   0    (br :L124)                      ;; -> 16
   ;; definition of `loop'
   8    (br :L125)                      ;; -> 8
   ;; letrec body
  16    (br :L125)                      ;; -> 8

(You get this by typing ,x (lambda () (let loop () (loop))) into the Guile REPL.)

A loop counting down from 100 looks like this:

scheme@(guile-user)> ,x (lambda ()
                          (let lp ((n 100))
                            (or (zero? n)
                                (lp (1- n)))))
Disassembly of #<program b742e6d0 at <unknown port>:51:3 ()>:

   0    (br :L141)                      ;; -> 32
   8    (local-ref 0)                   ;; `n'
  10    (make-int8:0)                   ;; 0
  11    (ee?)                           
  12    (local-set 1)                   ;; `t'
  14    (local-ref 1)                   ;; `t'
  16    (br-if-not :L142)               ;; -> 24
  19    (local-ref 1)                   ;; `t'
  21    (return)                        
  24    (local-ref 0)                   ;; `n'
  26    (sub1)
  27    (local-set 0)                   ;; `n'
  29    (br :L143)                      ;; -> 8
  32    (make-int8 100)                 ;; 100
  34    (local-set 0)                   
  36    (br :L143)                      ;; -> 8

I mean, that's pretty damn good. We could do better if we reordered the blocks a bit, and compiled the conditional branch br-if-not into something more specific, but still -- pretty tight for compiling scheme to bytecode.

(The `t', if you are curious, is a result of the expansion of or.)

conclusions

Guile has applied Waddell's "Fixing Letrec" strategy to transform Scheme's general letrec into more primitive constructs, including fix. fix-bound lambda expressions that need to be allocated as closures are now faster and cons less, given that they aren't allocated in boxes, and an important subset of fix expressions is now rendered as inline code and wired together using goto, a pleasant illustration that indeed, lambda is the ultimate goto.

I didn't think I would get to this kind of bytecode this fast, but once the simplification to fix was made, the rest of the optimizations were obvious. But the hack continues, there's always more to do. Expect to see this out in Guile's 1.9.2 release that's coming on Saturday. Happy hacking!

Syndicated 2009-08-10 07:45:43 from wingolog

313 older entries...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!