2 Nov 2013 wingo   » (Master)

scheme quiz time!

Scheme quiz time!

Consider the following two functions:

(define (test1 get)
  (let ((v (make-vector 2 #f)))
    (vector-set! v 0 (get 0))
    (vector-set! v 1 (get 1))
    v))

(define (test2 get)
  (let* ((a (get 0))
         (b (get 1)))
    (vector a b)))

Assume the usual definitions for all of the free variables like make-vector and so on. These functions both create a vector with two elements. The first element is the result of a call to (get 0), where get is a function the user passes in as an argument. Likewise the second comes from (get 1).

(test1 (lambda (n) n)) => #(0 1)
(test2 (lambda (n) n)) => #(0 1)

So the functions are the same.

Or are they?

Your challenge: write a standard Scheme function discriminate that, when passed either test1 or test2 as an argument, can figure out which one it is.

. . .

Ready? If you know Scheme, you should think on this a little bit before looking at the answer. I'll wait.

. . .

OK!

We know that in both functions, two calls are made to the get function, in the same order, and so really there should be no difference whatsoever.

However there is a difference in the continuations of the get calls. In test1, the continuation includes the identity of the result vector -- because the vector was allocated before the get calls. On the other hand test2 only allocates the result after the calls to get. So the trick is just to muck around with continuations so that you return twice from a call to the test function, and see if both returns are the same or not.

(define (discriminate f)
  (let ((get-zero-cont #t)
        (first-result #f))
    (define (get n)
      (when (zero? n)
        (call/cc (lambda (k)
                   (set! get-zero-cont k))))
      n)
    (let ((result (f get)))
      (cond
       (first-result
        (eq? result first-result))
       (else
        (set! first-result result)
        (get-zero-cont))))))

In the call to f, we capture the continuation of the entry to the (get 0) call. Then later we re-instate that continuation, making the call to f return for a second time. Then we see if both return values are the same object.

(discriminate test1) => #t
(discriminate test2) => #f

If they are the same object, then the continuation captured the identity of the result vector -- and if not, the result was only allocated after the get calls.

so what?

Unhappily, this has practical ramifications. In many compilers it would be advantagous to replace calls to vector with calls to make-vector plus a series of vector-set! operations. Such a transformation lowers the live variable pressure. If you have a macro that generates a bison-like parser whose parse table is built by a call to vector with 400 arguments -- this happens -- you'd rather not have 400 live variables in the function that builds that table. But this isn't a safe transformation to make, unless you can prove that no argument captures the current continuation. Happily, for the parser generator macro this is the case, but it's not something to bet on.

It gets worse, though. Since test1 returns the same object, it is possible to use continuations to mutate previous return values, with nary a vector-set! in sight!

(define (discriminate2 f)
  (let ((get-zero-cont #f)
        (escape #f))
    (define (get n)
      (case n
        ((0) (call/cc (lambda (k)
                        (set! get-zero-cont k)
                        0)))
        ((1) (if escape
                 (escape)
                 1))))
    (let ((result (f get)))
      (call/cc
       (lambda (k)
         (set! escape k)
         (get-zero-cont 42)))
      result)))

(discriminate2 test1) => #(42 1)
(discriminate2 test2) => #(0 1)

This... this is crazy.

story time

Now it's story time. Guile has a bytecode VM, and usually all code is compiled to that VM. But it also has an interpreter, for various purposes, and that interpreter is fairly classic: it's a recursive function that takes a "memoized expression" and an environment as parameters. Only, the environment was silly -- it was just a list of values. Before evaluating, a "memoizer" runs to resolve lexical references to indexes in that list, and entering a new lexical contour conses on that list.

Well of course that makes lexical variable lookup expensive. It usually doesn't matter as everything is compiled, but it's a bit shameful, so I rewrote it recently to use two-dimensional environments. Let me drop some ASCII on you:

   +------+------+------+------+------+
   | prev |slot 0|slot 1| ...  |slot N|
   +------+------+------+------+------+
      \/
   +------+------+------+------+------+
   | prev |slot 0|slot 1| ...  |slot N|
   +------+------+------+------+------+
      \/
     ...
      \/
   toplevel

It's a chain of vectors, linked through their first elements. Resolving a lexical in this environment has two dimensions, the depth and the width.

Vectors.

You see where I'm going with this?

I implemented the "push-new-environment" operation as a sequence of make-vector and an eval-and-vector-set! loop. Here's the actual clause that implements this:

(define (eval exp env)
  (match exp
    ...
    (('let (inits . body))
     (let* ((width (vector-length inits))
            (new-env (make-env width #f env)))
       (let lp ((i 0))
         (when (< i width)
           (env-set! new-env 0 i (eval (vector-ref inits i) env))
           (lp (1+ i))))
       (eval body new-env)))
    ...))

This worked fine. It was fast, and correct. Or so I thought. I used this interpreter to bootstrap a fresh Guile compile and all was good. Until running those damned test suites that use call/cc to return multiple times from let initializers, as in my discriminate2 test. While the identity of env isn't visible to a program as such, the ability of call/cc to peel apart allocation and initialization of the environment vector makes this particular implementation strategy not viable.

In the end I'll inline a few arities, and then have a general case that allocates heap storage for the temporaries:

(case (vector-length env)
  ((0) (vector env))
  ((1) (vector env (eval (vector-ref inits 0) env)))
  ...
  (else
   (list->vector
    (cons env
          (map (lambda (x) (eval x env))
               (vector->list inits))))))

Of course I'd use a macro to generate that. It's terrible, but oh well. Es lo que hay.

Syndicated 2013-11-02 13:42:06 from wingolog

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!