crhodes is currently certified at Master level.

Name: Christophe Rhodes
Member since: 2001-05-03 06:41:31
Last Login: 2014-04-25 07:18:49

FOAF RDF Share This

Homepage: http://www.doc.gold.ac.uk/~mas01cr/

Notes:

Notes here and here.

Oh, that probably isn't what "Notes" means. Oh well.

Physicist, Musician, Common Lisp programmer. Move along, there's nothing to see.

Projects

Recent blog entries by crhodes

Syndication: RSS 2.0

still working on reproducible builds

It’s been nearly fifteen years, and SBCL still can’t be reliably built by other Lisp compilers.

Of course, other peoples’ definition of “reliably” might differ. We did achieve successful building under unrelated Lisp compilers twelve years ago; there were a couple of nasty bugs along the way, found both before and after that triumphant announcement, but at least with a set of compilers whose interpretation of the standard was sufficiently similar to SBCL’s own, and with certain non-mandated but expected features (such as the type (array (unsigned-byte 8) (*)) being distinct from simple-vector, and single-float being distinct from double-float), SBCL achieved its aim of being buildable on a system without an SBCL binary installed (indeed, using CLISP or XCL as a build host, SBCL could in theory be bootstrapped starting with only gcc).

For true “reliability”, though, we should not be depending on any particular implementation-defined features other than ones we actually require – or if we are, then the presence or absence of them should not cause a visible difference in the resulting SBCL. The most common kind of leak from the host lisp to the SBCL binary was the host’s value of most-positive-fixnum influencing the target, causing problems from documentation errors all the way up to type errors in the assembler. Those leaks were mostly plugged a while ago, though they do recur every so often; there are other problems, and over the last week I spent some time tracking down three of them.

The first: if you’ve ever done (apropos "PRINT") or something similar at the SBCL prompt, you might wonder at the existence of functions named something like SB-VM::|CACHED-FUN--PINSRB[(EXT-2BYTE-XMM-REG/MEM ((PREFIX (QUOTE (102))) (OP1 (QUOTE (58))) (OP2 (QUOTE (32))) (IMM NIL TYPE (QUOTE IMM-BYTE))) (QUOTE (NAME TAB REG , REG/MEM ...)))]-EXT-2BYTE-XMM-REG/MEM-PRINTER|.

What is going on there? Well, these functions are a part of the disassembler machinery; they are responsible for taking a certain amount of the machine code stream and generating a printed representation of the corresponding assembly: in this case, for the PINSRB instruction. Ah, but (in most instruction sets) related instructions share a fair amount of structure, and decoding and printing a PINSRD instruction is basically the same as for PINSRB, with just one #x20 changed to a #x22 – in both cases we want the name of the instruction, then a tab, then the destination register, a comma, the source, another comma, and the offset in the destination register. So SBCL arranges to reuse the PINSRB instruction printer for PINSRD; it maintains a cache of printer functions, looked up by printer specification, and reuses them when appropriate. So far, so normal; the ugly name above is the generated name for such a function, constructed by interning a printed, string representation of some useful information.

Hm, but wait. See those (QUOTE (58)) fragments inside the name? They result from printing the list (quote (58)). Is there a consensus on how to print that list? Note that *print-pretty* is bound to nil for this printing; prior experience has shown that there are strong divergences between implementations, as well as long-standing individual bugs, in pretty-printer support. So, what happens if I do (write-to-string '(quote foo) :pretty nil)?

  • SBCL: "(QUOTE FOO)", unconditionally
  • CCL: "'FOO" by default; "(QUOTE FOO)" if ccl:*print-abbreviate-quote* is set to nil
  • CLISP: "'FOO", unconditionally (I read the .d code with comments in half-German to establish this)

So, if SBCL was compiled using CLISP, the name of the same function in the final image would be SB-VM::|CACHED-FUN--PINSRB[(EXT-2BYTE-XMM-REG/MEM ((PREFIX '(102)) (OP1 '(58)) (OP2 '(32)) (IMM NIL TYPE 'IMM-BYTE)) '(NAME TAB REG , REG/MEM ...))]-EXT-2BYTE-XMM-REG/MEM-PRINTER|. Which is shorter, and maybe marginally easier to read, but importantly for my purposes is not bitwise-identical.

Thus, here we have a difference between host Common Lisp compilers which leaks over into the final image, and it must be eliminated. Fortunately, this was fairly straightforward to eliminate; those names are never in fact used to find the function object, so generating a unique name for functions based on a counter makes the generated object file bitwise identical, no matter how the implementation prints two-element lists beginning with quote.

The second host leak is also related to quote, and to our old friend backquote – though not related in any way to the new implementation. Consider this apparently innocuous fragment, which is a simplified version of some code to implement the :type option to defstruct:

  (macrolet ((def (name type n)
             `(progn
                (declaim (inline ,name (setf ,name)))
                (defun ,name (thing)
                  (declare (type simple-vector thing))
                  (the ,type (elt thing ,n)))
                (defun (setf ,name) (value thing)
                  (declare (type simple-vector thing))
                  (declare (type ,type value))
                  (setf (elt thing ,n) value)))))
  (def foo fixnum 0)
  (def bar string 1))

What’s the problem here? Well, the functions are declaimed to be inline, so SBCL records their source code. Their source code is generated by a macroexpander, and so is made up of conses that are generated programmatically (as opposed to freshly consed by the reader). That source code is then stored as a literal object in an object file, which means in practice that instructions for reconstructing a similar object are dumped, to be executed when the object file is processed by load.

Backquote is a reader macro that expands into code that, when evaluated, generates list structure with appropriate evaluation and splicing of unquoted fragments. What does this mean in practice? Well, one reasonable implementation of reading `(type ,type value) might be:

  (cons 'type (cons type '(value)))

and indeed you might (no guarantees) see something like that if you do

  (macroexpand '`(type ,type value))

in the implementation of your choice. Similarly, reading `(setf (elt thing ,n) value) will eventually generate code like

  (cons 'setf (cons (cons 'elt (list 'thing n)) '(value)))

Now, what is “similar”? In this context, it has a technical definition: it relates two objects in possibly-unrelated Lisp images, such that they can be considered to be equivalent despite the fact that they can’t be compared:

similar adj. (of two objects) defined to be equivalent under the similarity relationship.

similarity n. a two-place conceptual equivalence predicate, which is independent of the Lisp image so that two objects in different Lisp images can be understood to be equivalent under this predicate. See Section 3.2.4 (Literal Objects in Compiled Files).

Following that link, we discover that similarity for conses is defined in the obvious way:

Two conses, S and C, are similar if the car of S is similar to the car of C, and the cdr of S is similar to the cdr of C.

and also that implementations have some obligations:

Objects containing circular references can be externalizable objects. The file compiler is required to preserve eqlness of substructures within a file.

and some freedom:

With the exception of symbols and packages, any two literal objects in code being processed by the file compiler may be coalesced if and only if they are similar [...]

Put this all together, and what do we have? That def macro above generates code with similar literal objects: there are two instances of '(value) in it. A host compiler may, or may not, choose to coalesce those two literal '(value)s into a single literal object; if it does, the inline expansion of foo (and bar) will have a circular reference, which must be preserved, showing up as a difference in the object files produced during the SBCL build. The fix? It’s ugly, but portable: since we can’t stop an aggressive compiler from coalescing constants which are similar but not identical, we must make sure that any similar substructure is in fact identical:

  (macrolet ((def (name type n)
             (let ((value '(value)))
               `(progn
                  (declaim (inline ,name (setf ,name)))
                  (defun ,name (thing)
                    (declare (type simple-vector thing))
                    (the ,type (elt thing ,n)))
                  (defun (setf ,name) (value thing)
                    (declare (type simple-vector thing))
                    (declare (type ,type . ,value))
                    (setf (elt thing ,n) . ,value)))))
  (def foo fixnum 0)
  (def bar string 1))

Having dealt with a problem with quote, and a problem with backquote, what might the Universe serve up for my third problem? Naturally, it would be a problem with a code walker. This code walker is somewhat naïve, assuming as it does that its body is made up of forms or tags; it is the assemble macro, which is used implicitly in the definition of VOPs (reusable assembly units); for example, like

  (assemble ()
  (move ptr object)
  (zeroize count)
  (inst cmp ptr nil-value)
  (inst jmp :e DONE)
 LOOP
  (loadw ptr ptr cons-cdr-slot list-pointer-lowtag)
  (inst add count (fixnumize 1))
  (inst cmp ptr nil-value)
  (inst jmp :e DONE)
  (%test-lowtag ptr LOOP nil list-pointer-lowtag)
  (error-call vop 'object-not-list-error ptr)
 DONE))

which generates code to compute the length of a list. The expander for assemble scans its body for any atoms, and generates binding forms for those atoms to labels:

  (let ((new-labels (append labels
                          (set-difference visible-labels inherited-labels))))
  ...
  `(let (,@(mapcar (lambda (name) `(,name (gen-label))) new-labels))
     ...))

The problem with this, from a reproducibility point of view, is that set-difference (and the other set-related functions: union, intersection, set-exclusive-or and their n-destructive variants) do not return the sets with a specified order – which is fine when the objects are truly treated as sets, but in this case the LOOP and DONE label objects ended up in different stack locations depending on the order of their binding. Consequently the machine code for the function emitting code for computing a list’s length – though not the machine code emitted by that function – would vary depending on the host’s implementation of set-difference. The fix here was to sort the result of the set operations, knowing that all the labels would be symbols and that they could be treated as string designators.

And after all this is? We’re still not quite there: there are three to four files (out of 330 or so) which are not bitwise-identical for differing host compilers. I hope to be able to rectify this situation in time for SBCL’s 15th birthday...

Syndicated 2014-10-14 06:51:19 from notes

interesting pretty-printer bug

One of SBCL’s Google Summer of Code students, Krzysztof Drewniak (no relation) just got to merge in his development efforts, giving SBCL a far more complete set of Unicode operations.

Given that this was the merge of three months’ out-of-tree work, it’s not entirely surprising that there were some hiccups, and indeed we spent some time diagnosing and fixing a 1000-fold slowdown in char-downcase. Touch wood, all seems mostly well, except that Jan Moringen reported that, when building without the :sb-unicode feature (and hence having a Lisp with 8-bit characters) one of the printer consistency tests was resulting in an error.

Tracking this down was fun; it in fact had nothing in particular to do with the commit that first showed the symptom, but had been lying latent for a while and had simply never shown up in automated testing. I’ve expressed my admiration for the Common Lisp standard before, and I’ll do it again: both as a user of the language and as an implementor, I think the Common Lisp standard is a well-executed document. But that doesn’t stop it from having problems, and this is a neat one:

When a line break is inserted by any type of conditional newline, any blanks that immediately precede the conditional newline are omitted from the output and indentation is introduced at the beginning of the next line.

(from pprint-newline)

For the graphic standard characters, the character itself is always used for printing in #\ notation---even if the character also has a name[5].

(from CLHS 22.1.3.2)

Space is defined to be graphic.

(from CLHS glossary entry for ‘graphic’)

What do these three requirements together imply? Imagine printing the list (#\a #\b #\c #\Space #\d #\e #\f) with a right-margin of 17:

  (write-to-string '(#\a #\b #\c #\Space #\d #\e #\f) :pretty t :right-margin 17)
; => "(#\\a #\\b #\\c #\\
; #\\d #\\e #\\f)"

The #\Space character is defined to be graphic; therefore, it must print as #\ rather than #\Space; if it happens to be printed just before a conditional newline (such as, for example, generated by using pprint-fill to print a list), the pretty-printer will helpfully remove the space character that has just been printed before inserting the newline. This means that a #\Space character, printed at or near the right margin, will be read back as a #\Newline character.

It’s interesting to see what other implementations do. CLISP 2.49 in its default mode always prints #\Space; in -ansi mode it prints #\ but preserves the space even before a conditional newline. CCL 1.10 similarly preserves the space; there’s an explicit check in output-line-and-setup-for-next for an “escaped” space (and a comment that acknowledges that this is a heuristic that can be wrong in the other direction). I’m not sure what the best fix for this is; it’s fairly clear that the requirements on the printer aren’t totally consistent. For SBCL, I have merged a one-line change that makes the printer print using character names even for graphic characters, if the *print-readably* printer control variable is true; it may not be ideal that print/read round-tripping was broken in the normal case, but in the case where it’s explicitly been asked for it is clearly wrong.

Syndicated 2014-10-06 20:48:14 from notes

settings for gnome-shell extension

A long, long time ago, I configured my window manager. What can I say? I was a student, with too much free time; obviously hoursdays spent learning some configuration file format and tweaking some aspect of behaviour would be repaid many times over the course of a working life. I suppose one thing it led to was my current career, so it’s probably not all a loss.

However, the direct productivity benefits almost certainly were a chimera; unfortunately, systems (hardware and software) changed too often for the productivity benefit (if any) to amortize the fixed set-up time, and so as a reaction I stopped configuring my window manager. In fact, I went the other way, becoming extremely conservative about upgrades of anything at all; I had made my peace with GNOME 2, accepting that there was maybe enough configurability and robustness in the 2.8 era or so for me not to have to change too much, and then not changing anything.

Along comes GNOME 3, and there are howls all over the Internet about the lack of friendly behaviour – “friendly” to people like me, who like lots of terminals and lots of editor buffers; there wasn’t much of an outcry from other people with more “normal” computing needs; who knows why? In any case, I stuck with GNOME 2 for a long time, eventually succumbing at the point of an inadvisable apt-get upgrade and not quite hitting the big red ABORT button in time.

So, GNOME 3. I found that I shared a certain amount of frustration with the vocal crowd: dynamic, vertically-arranged workspaces didn’t fit my model; I felt that clicking icons should generate new instances of applications rather than switch to existing instances, and so on. But, in the same timeframe, I adopted a more emacs-centric workflow, and the improvements of the emacs daemon meant that I was less dependent on particular behaviours of window manager and shell, so I gave it another try, and, with the right extensions, it stuck.

The right extensions? “What are those?” I hear you cry. Well, in common with illustrious Debian Project Leaders past, I found that a tiling extension made many of the old focus issues less pressing. My laptop is big enough, and I have enough (fixed) workspaces, that dividing up each screen between applications mostly works. I also have a bottom panel, customized to a height of 0 pixels, purely to give me the fixed number of workspaces; the overview shows them in a vertical arrangement, but the actual workspace arrangement is the 2x4 that I’m used to.

One issue with a tiled window arrangement, though, is an obvious cue to which window has focus. I have also removed all window decorations, so the titlebar or border don’t help with this; instead, a further extension to shade inactive windows helps to minimize visual distraction. And that’s where the technical part of this blog entry starts...

One of the things I do for my employer is deliver a module on Perception and Multimedia Computing. In the course of developing that module, I learnt a lot about how we see what we see, and also how digital displays work. And one of the things I learnt to be more conscious about was attention: in particular, how my attention can be drawn (for example, I can not concentrate on anything where there are animated displays, such as are often present in semi-public spaces, such as bars or London City airport.)

The shade inactive windows extension adds a brightness-reducing effect to windows without focus. So, that was definitely useful, but after a while I noticed that emacs windows with some text in error-face (bold, bright red) in them were still diverting my attention, even when they were unfocussed and therefore substantially dimmed.

So I worked on a patch to the extension to add a saturation-reducing effect in addition to reducing the brightness. And that was all very well – a classic example of taking code that almost does what you want it to do, and then maintenance-programming it into what you really want it to do – until I also found that the hard-wired time over which the effect took hold (300ms) was a bit too long for my taste, and I started investigating what it would take to make these things configurable.

Some time later, after exploring the world with my most finely-crafted google queries, I came to the conclusion that there was in fact no documentation for this at all. The tutorials that I found were clearly out-dated, and there were answers to questions on various forums whose applicability was unclear. This is an attempt to document the approach that worked for me; I make no claims that this is ‘good’ or even acceptable, but maybe there’s some chance that it will amortize the cost of the time I spent flailing about over other people wanting to customize their GNOME shell.

The first thing that something, anything with a preference needs, is a schema for that preference. In this instance, we’re starting with the shade-inactive-windows in the hepaajan.iki.fi namespace, so our schema will have a path that begins “/fi/iki/hepaajan/shade-inactive-windows”, and we're defining preferences, so let’s add “/preferences” to that.

  <?xml version="1.0" encoding="UTF-8"?>
<schemalist>
  <schema path="/fi/iki/hepaajan/shade-inactive-windows/preferences/"

a schema also needs an id, which should probably resemble the path

            id="fi.iki.hepaajan.shade-inactive-windows.preferences"

except that there are different conventions for hierarchy (. vs /).

            gettext-domain="gsettings-desktop-schemas">

and then here’s a cargo-culted gettext thing, which is probably relevant if the rest of the schema will ever be translated into any non-English language.

In this instance, I am interested in a preference that can be used to change the time over which the shading of inactie windows happens. It’s probably easiest to define this as an integer (the "i" here; other GVariant types are possible):

      <key type="i" name="shade-time">

which corresponds to the number of milliseconds

        <summary>Time in milliseconds over which shading occurs</summary>
      <description>
        The time over which the shading effect is applied, in milliseconds.
      </description>

which we will constrain to be between 0 and 1000, so that the actual time is between 0s and 1s, with a default of 0.3s:

        <range min="0" max="1000"/>
      <default>300</default>

and then there's some XML noise

      </key>
  </schema>
</schemalist>

and that completes our schema. For reasons that will become obvious later, we need to store that schema in a directory data/glib-2.0/schemas relative to our base extension directory; giving it a name that corresponds to the schema id (so fi.iki.hepaajan.shade-inactive-windows.preferences.gschema.xml in this case) is wise, but probably not essential. In order for the schema to be useful later, we also need to compile it: that’s as simple as executing glib-compile-schemas . within the schemas directory, which should produce a gschemas.compiled file in the same directory.

Then, we also need to adapt the extension in question to lookup a preference value when needed, rather than hard-coding the default value. I have no mental model of the namespacing, or other aspects of the environment, applied to GNOME shell extensions’ javascript code, so being simultaneously conservative and a javascript novice I added a top-level variable unlikely to collide with anything:

  var ShadeInactiveWindowsSettings = {};
function init() {

The extension previously didn’t need to do anything on init(); now, however, we need to initialize the settings object, including looking up our schema to discover what settings there are. But where is our schema? Well, if we’re running this extension in-place, or as part of a user installation, we want to look in data/glib-2.0/schemas/ relative to our own path; if we have performed a global installation, the schema will presumably be in a path that is already searched for by the default schema finding methods. So...

      var schemaDir = ExtensionUtils.getCurrentExtension().dir.get_child('data').get_child('glib-2.0').get_child('schemas');
    var schemaSource = Gio.SettingsSchemaSource.get_default();

    if(schemaDir.query_exists(null)) {
        schemaSource = Gio.SettingsSchemaSource.new_from_directory(schemaDir.get_path(), schemaSource, false);
    }

... we distinguish between those two cases by checking to see if we can find a data/glib-2.0/schemas/ directory relative to the extension’s own directory; if we can, we prepend that directory to the schema source search path. Then, we lookup our schema using the id we gave it, and initialize a new imports.gi.Gio.Settings object with that schema.

      var schemaObj = schemaSource.lookup('fi.iki.hepaajan.shade-inactive-windows.preferences', true);
    if(!schemaObj) {
        throw new Error('failure to look up schema');
    }
    ShadeInactiveWindowsSettings = new Gio.Settings({ settings_schema: schemaObj });
}

Then, whenever we use the shade time in the extension, we must make sure to look it up afresh:

  var shade_time = ShadeInactiveWindowsSettings.get_int('shade-time') / 1000;

in order that any changes made by the user take effect immediately. And that’s it. There’s an additional minor wrinkle, in that altering that configuration variable is not totally straightforward; dconf and gettings also need to be told where to look for their schema; that’s done using the XDG_DATA_DIRS configuration variable. For example, once the extension is installed locally, you should be able to run

  XDG_DATA_DIRS=$HOME/.local/gnome-shell/extensions/shade-inactive-windows@hepaajan.iki.fi/data:$XDG_DATA_DIRS dconf

and then navigate to the fi/iki/hepaajan/shade-inactive-windows/preferences schema and alter the shade-time preference entry.

Hooray! After doing all of that, we have wrestled things into being configurable: we can use the normal user preferences user interface to change the time over which the shading animation happens. I’m not going to confess how many times I had to restart my gnome session, insert logging code, look at log files that are only readable by root, and otherwise leave my environment; I will merely note that we are quite a long way away from the “scriptable user interface” – and that if you want to do something similar (not identical, but similar) in an all-emacs world, it might be as simple as evaluating these forms in your *scratch* buffer...

  (set-face-attribute 'default nil :background "#eeeeee")
(defvar my/current-buffer-background-overlay nil)

(defun my/lighten-current-buffer-background ()
  (unless (window-minibuffer-p (selected-window))
    (unless my/current-buffer-background-overlay
      (setq my/current-buffer-background-overlay (make-overlay 1 1))
      (overlay-put my/current-buffer-background-overlay
       'face '(:background "white")))
    (overlay-put my/current-buffer-background-overlay 'window
                 (selected-window))
    (move-overlay my/current-buffer-background-overlay
                  (point-min) (point-max))))
(defun my/unlighten-current-buffer-background ()
  (when my/current-buffer-background-overlay
    (delete-overlay my/current-buffer-background-overlay)))

(add-hook 'pre-command-hook #'my/unlighten-current-buffer-background) 
(add-hook 'post-command-hook #'my/lighten-current-buffer-background)

Syndicated 2014-10-06 19:55:13 from notes

code walking for pipe sequencing

Since it seems still topical to talk about Lisp and code-transformation macros, here’s another worked example – this time inspired by the enthusiasm for the R magrittr package.

The basic idea behind the magrittr package is, as Hadley said at EARL2014, to convert from a form of code where arguments to the same function are far apart to one where they’re essentially close together; the example he presented was converting

  arrange(
  summarise
    group_by(
      filter(babynames, name == "Hadley"),
      year),
    total = sum(n)
  desc(year))

to

  b0 <- babynames
b1 <- filter(b0, name == "Hadley")
b2 <- group_by(b1, year)
b3 <- summarise(b2, total = sum(n))
b4 <- arrange(b3, desc(year))

only without the danger of mistyping one of the variable names along the way and failing to perform the computation that was intended.

R, as I have said before, is a Lisp-1 with weird syntax and wacky evaluation semantics. One of the things that ordinary user code can do is inspect the syntactic form of its arguments, before evaluating them. This means that when looking at a fragment of code such as

  foo(bar(2,3), 4)

where a call-by-value language would first evaluate bar(2,3), then call foo with two arguments (the value resulting from the evaluation, and 4), R instead uses a form of call-by-need evaluation, and also provides operators for inspecting the promise directly. This means R users can do such horrible things as

  foo <- function(x) {
    tmp <- substitute(x)
    sgn <- 1
    while(class(tmp) == "(") {
        tmp <- tmp[[2]]
        sgn <- sgn * -1
    }
    sgn * eval.parent(tmp)
}
foo(3) # 3
foo((3)) # -3
foo(((3))) # 3
foo((((3)))) # -3 (isn’t this awesome?  I did say “wacky”)

In the case of magrittr, the package authors have taken advantage of this to invent some new syntax; the pipe operator %>% is charged with inserting its first argument (its left-hand side, in normal operation) as the first argument to the call of its second argument (right-hand side). Hadley’s example is

  babynames %>%
  filter(name == "Hadley") %>%
  group_by(year) %>%
  summarise(total = sum(n)) %>%
  arrange(desc(year))

and this is effective because the data flow in this case really is a pipeline: there's a dataset, which needs filtering, then grouping, then summarization, then sorting, and each operation works on the result of the previous. This already needs to inspect the syntactic form of the argument; an additional feature is recognizing the presence of .s in the call, and placing the left-hand side value in that argument position instead of as the first argument if it is present.

In Common Lisp, there are some piping or chaining operators out there (e.g. one two three (search for ablock) four and probably many others), and they do well enough. However! They mostly suffer from similar problems that we’ve seen before: doing code transformations with not quite enough understanding of the semantics of the code that they’re transforming; again, that’s fine for normal use, but for didactic purposes let’s pretend that we really care about this.

The -> macro from http://stackoverflow.com/a/11080068 is basically the same as the magrittr %>% operator: it converts symbols in the pipeline to function calls, and places the result of the previous evaluation as the first argument of the current operator, except if a $ is present in the arguments, in which case it replaces that. (This version doesn’t support more than one $ in the argument list; it would be a little bit of a pain to support that, needing a temporary name, but it’s straightforward in principle).

Since the -> macro does its job, a code-walker implementation isn’t strictly necessary: pure syntactic manipulation is good enough, and if it’s used with just the code it expects, it will do it well. It is of course possible to express what it does using a code-walker; we’ll fix the multiple-$ ‘bug’ along the way, by explicitly introducing bindings rather than replacements of symbols:

  (defmacro -> (form &body body)
  (labels ((find-$ (form env)
             (sb-walker:walk-form form env
              (lambda (f c e)
                (cond
                  ((eql f '$) (return-from find-$ t))
                  ((eql f form) f)
                  (t (values f t)))))
             nil)
           (walker (form context env)
             (cond
               ((symbolp form) (list form))
               ((atom form) form)
               (t (if (find-$ form env)
                      (values `(setq $ ,form) t)
                      (values `(setq $ ,(list* (car form) '$ (cdr form))) t))))))
    `(let (($ ,form))
       ,@(mapcar (lambda (f) (sb-walker:walk-form f nil #'walker)) body))))

How to understand this implementation? Well, clearly, we need to understand what sb-walker:walk does. Broadly, it calls the walker function (its third argument) on successive evaluated subforms of the original form (and on variable names set by setq); the primary return value is used as the interim result of the walk, subject to further walking (macroexpansion and walking of its subforms) except if the second return value from the walker function is t.

Now, let’s start with the find-$ local function: its job is to walk a form, and returns t if it finds a $ variable to be evaluated at toplevel and nil otherwise. It does that by returning t if the form it’s given is $; otherwise, if the form it’s given is the original form, we need to walk its subforms, so return f; otherwise, return its form argument f with a secondary value of t to inhibit further walking. This operation is slightly at odds with the use of a code walker: we are explicitly not taking advantage of the fact that it understands the semantics of the code it’s walking. This might explain why the find-$ function itself looks a bit weird.

The walker local function is responsible for most of the code transformation. It binds $ to the value of the first form, then repeatedly sets $ to the value of successive forms, rewritten to interpolate a $ in the first argument position if there isn’t one in the form already (as reported by find-$). If any of the forms is a symbol, it gets listified and subsequently re-walked. Thus

  (macroexpand-1 '(-> "THREE" string-downcase (char 0)))
; => (LET (($ "THREE"))
;      (SETQ $ (STRING-DOWNCASE $))
;      (SETQ $ (CHAR $ 0))),
;    T

So far, so good. Now, what could we do with a code-walker that we can’t without? Well, the above implementation of -> supports chaining simple function calls, so one answer is “chaining things that aren’t just function calls”. Another refinement is to support eliding the insertion of $ when there are any uses of $ in the form, not just as a bare argument. Looking at the second one first, since it’s less controversial:

  (defmacro -> (form &body body)
  (labels ((find-$ (form env)
             (sb-walker:walk-form form env
              (lambda (f c e)
                (cond
                  ((and (eql f '$) (eql c :eval))
                   (return-from find-$ t))
                  (t f))))
             nil)
           (walker (form context env)
             (cond
               ((symbolp form) (list form))
               ((atom form) form)
               (t (if (find-$ form env)
                      (values `(setq $ ,form) t)
                      (values `(setq $ ,(list* (car form) '$ (cdr form))) t))))))
    `(let (($ ,form))
       ,@(mapcar (lambda (f) (sb-walker:walk-form f nil #'walker)) body))))

The only thing that’s changed here is the definition of find-$, and in fact it’s a little simpler: the task is now to walk the entire form and find uses of $ in an evaluated position, no matter how deep in the evaluation. Because this is a code-walker, this will correctly handle macros, backquotes, quoted symbols, and so on, and this allows code of the form

  (macroexpand-1 '(-> "THREE" string-downcase (char 0) char-code (complex (1+ $) (1- $))))
; => (LET (($ "THREE"))
;      (SETQ $ (STRING-DOWNCASE $))
;      (SETQ $ (CHAR-CODE $))
;      (SETQ $ (COMPLEX (1+ $) (1- $)))),
;    T

which, as far as I can tell, is not supported in magrittr: doing 3 %>% complex(.+1,.-1) is met with the error that “object '.' not found”. Supporting this might, of course, not be a good idea, but at least the code walker shows that it’s possible.

What if we wanted to augment -> to handle binding forms, or special forms in general? This is probably beyond the call of duty, but let’s just briefly imagine that we wanted to be able to support binding special variables around the individual calls in the chain; for example, we want

  (-> 3 (let ((*random-state* (make-random-state))) rnorm) mean)

to expand to

  (let (($ 3))
  (setq $ (let ((*random-state* (make-random-state))) (rnorm $)))
  (setq $ (mean $)))

and let us also say, to make it interesting, that uses of $ in the bindings clauses of the let should not count against inhibiting the insertion of $ in the first argument position of the first form in the body of the let, so

  (-> 3 (let ((y (1+ $))) (atan y)))

should expand to

  (let (($ 3)) (setq $ (let ((y (1+ $))) (atan $ y))))

So our code walker needs to walk the bindings of the let, merely collecting information into the walker’s lexical environment, then walk the body performing the same rewrite as before. CHALLENGE ACCEPTED:

  (defmacro -> (&body forms)
  (let ((rewrite t))
    (declare (special rewrite))
    (labels ((find-$ (form env)
               (sb-walker:walk-form form env
                (lambda (f c e)
                  (cond
                    ((and (eql f '$) (eql c :eval))
                     (return-from find-$ t))
                    (t f))))
               nil)
             (walker (form context env)
               (declare (ignore context))
               (typecase form
                 (symbol (if rewrite (list form) form))
                 (atom form)
                 ((cons (member with-rewriting without-rewriting))
                  (let ((rewrite (eql (car form) 'with-rewriting)))
                    (declare (special rewrite))
                    (values (sb-walker:walk-form (cadr form) env #'walker) t)))
                 ((cons (member let let*))
                  (unless rewrite
                    (return-from walker form))
                  (let* ((body (member 'declare (cddr form)
                                       :key (lambda (x) (when (consp x) (car x))) :test-not #'eql))
                         (declares (ldiff (cddr form) body))
                         (rewritten (sb-walker:walk-form
                                     `(without-rewriting
                                          (,(car form) ,(cadr form)
                                            ,@declares
                                            (with-rewriting
                                                ,@body)))
                                     env #'walker)))
                    (values rewritten t)))
                 (t
                  (unless rewrite
                    (return-from walker form))
                  (if (find-$ form env)
                      (values `(setq $ ,form) t)
                      (values `(setq $ ,(list* (car form) '$ (cdr form))) t))))))
      `(let (($ ,(car forms)))
         ,@(mapcar (lambda (f) (sb-walker:walk-form f nil #'walker)) (cdr forms))))))

Here, find-$ is unchanged from the previous version; all the new functionality is in walker. How does it work? The default branch of the walker function is also unchanged; what has changed is handling of let and let* forms. The main trick is to communicate information between successive calls to the walker function, and turn the rewriting on and off appropriately: we wrap parts of the form in new pseudo-special operators with-rewriting and without-rewriting, which is basically a tacky and restricted implementation of compiler-let – if we needed to, we could do a proper one with macrolet. Within the scope of a without-rewriting, walker doesn’t do anything special, but merely return the form it was given, except if the form it’s given is a with-rewriting form. This is a nice illustration, incidentally, of the idea that lexical scope in the code translates nicely to dynamic scope in the compiler; I can’t remember where I read that first (but it’s certainly not a new idea).

And now

  (macroexpand '(-> 3 (let ((*random-state* (make-random-state))) rnorm) mean))
; => (LET (($ 3))
;      (LET ((*RANDOM-STATE* (MAKE-RANDOM-STATE)))
;        (SETQ $ (RNORM $)))
;      (SETQ $ (MEAN $))),
;    T
(macroexpand '(-> 3 (let ((y (1+ $))) (atan y))))
; => (LET (($ 3))
;      (LET ((Y (1+ $)))
;        (SETQ $ (ATAN $ Y)))),
;    T

Just to be clear: this post isn’t advocating a smarter pipe operator; I don’t have a clear enough view, but I doubt that the benefits of the smartness outweigh the complexity. It is demonstrating what can be done, in a reasonably controlled way, using a code-walker: ascribing semantics to fragments of Common Lisp code, and combining those fragments in a particular way, and of course it’s another example of sb-walker:walk in use.

Finally, if something like this does in fact get used, people sometimes get tripped up by the package system: the special bits of syntax are symbols, and importing or package-qualifying -> without doing the corresponding thing to $ would lead to cryptic errors, wrong results and/or confusion. One possibility to handle that is to invent a bit more reader syntax:

  (set-macro-character #\¦
 (defun pipe-reader (stream char)
   (let ((*readtable* (copy-readtable)))
     (set-macro-character #\·
      (lambda (stream char)
        (declare (ignore stream char))
        '$) t)
   (cons '-> (read-delimited-list char stream t)))) nil)
¦"THREE" string-downcase (find-if #'alpha-char-p ·) char-code¦

If this is the exported syntax, it has the advantage that the interface can only be misused intentionally: the actual macro and its anaphoric symbol are both hidden from the programmer; and the syntax is reasonably easy to type – on my keyboard ¦ is AltGr+| and · is AltGr+. – and moderately mnemonic from shell pipes and function notation respectively. It also has all the usual disadvantages of reader-based interfaces, such as composability, somewhat mitigated if pipe-reader is part of the macro’s exported interface.

Syndicated 2014-09-25 14:01:45 from notes

earl conference

This week, I went to the Effective Applications of the R Language conference. I’d been alerted to its existence from my visit to a londonR meeting in June. Again, I went for at least two reasons: one as an R enthusiast, though admittedly one (as usual) more interested in tooling than in applications; and one as postgraduate coordinator in Goldsmiths Computing, where for one of our modules in the new Data Science programme (starting todayyesterday! Hooray for “Welcome Week”!) involves exposing students to live data science briefs from academia and industry, with the aim of fostering a relevant and interesting final project.

A third reason? Ben Goldacre as invited speaker. A fantastic choice of keynote, even if he did call us ‘R dorks’ a lot and confess that he was a Stata user. The material, as one might expect, was derived from his books and related experience, but the delivery was excellent, and the drive clear to see. There were some lovely quotes that it’s tempting to include out of context; in the light of my as-yet unposed ‘research question’ for the final module of the PG Certificate in Higher Education – that I am still engaged on – it is tempting to bait the “infestation of qualitative researchers in the educational research establishment”, and attempt a Randomised Controlled Trial, or failing that a statistical analysis of assessments to try to uncover suitable hypotheses for future testing.

Ben’s unintended warm-up act – they were the other way around in the programme, but clearly travelling across London is complicated – was Hadley Wickham, of ggplot2 fame. His talk, about RStudio, his current view on the data analysis workflow, and new packages to support it, was a nice counterpoint: mostly tools, not applications, but clearly focussed to help make sense of complicated (and initially untidy) datasets. I liked the shiny-based in-browser living documents in R-markdown, which is not a technology that I’ve investigated yet; at this rate I will have more options for reproducible research than reports written. He, and others at the conference, were advocating a pipe-based code sequencing structure – the R implementation of this is called magrittr (ha, ha) and has properties that are made possible to user code through R’s nature of a Lisp-1 with crazy evaluation semantics, on which more in another post.

The rest of the event was made up of shorter, usually more domain-specific talks: around 20 minutes for each speaker. I think it suffered a little bit from many of the participants not being able to speak freely – a natural consequence of a mostly-industrial event, but frustrating. I think it was also probably a mistake to schedule in the first one of the regular (parallel) sessions of the event a reflective slot for three presentations about comparing R with other languages (Python, Julia, and a more positive one about R’s niche): there hadn’t really been time for a positive tone to be established, and it just felt like a bit of a downer. (Judging by the room, most of the delegates – perhaps wisely – had opted for the other track, on “Business Applications of R”).

Highlights of the shorter talks, for me:

  • YPlan’s John Sandall talking about “agile” data analytics, leading to agile business practices relentlessly focussed on one KPI. At the time, I wondered whether the focus on the first time a user gives them money would act against building something of lasting value – analytics give the power to make decisions, but the underlying strategy still has to be thought about. On the other hand, I’m oh-too familiar with the notion that startups must survive first and building something “awesome” is a side-effect, and focussing on money in is pretty sensible.
  • Richard Pugh’s (from Mango Solutions) talk about modelling and simulating the behaviour of a sales team did suffer from the confidentiality problem (“I can’t talk about the project this comes from, or the data”) but was at least entertaining: the behaviours he talked about (optimistic opportunity value, interactions of CRM closing dates with quarter boundaries) were highly plausible, and the question of whether he was applying the method to his own sales team quite pointed. (no)
  • the team from Simpson Carpenter Ltd, as well as saying that “London has almost as many market research agencies as pubs” (which rings true) had what I think is a fair insight: R is perhaps less of a black-box than certain commercial tools; there’s a certain retrocomputing feel to starting R, being at the prompt, and thinking “now what?” That implies that to actually do something with R, you need to know a bit more about what you’re doing. (That didn’t stop a few egregiously bad graphs being used in other presentations, including my personal favourite of a graph of workflow expressed as business value against time, with the inevitable backwards-arrows).
  • some other R-related tools to look into:

And then there was of course the hallwaybreak room track; Tower Hotel catered admirably for us, with free-flowing coffee, nibbles and lunch. I had some good conversations with a number of people, and am optimistic that students with the right attitude could both benefit and gain hugely from data science internships. I’m sure I was among the most tool-oriented of the attendees (most of the delegates were actually using R), but I did get to have a conversation with Hadley about “Advanced R”, and we discussed object systems, and conditions and restarts. More free-form notes about the event on my wiki.

Meanwhile, in related news, parts of the swank backend implementation of SLIME changed, mostly moving symbols to new packages. I've updated swankr to take account of the changes, and (I believe, untested) preserved compatibility with older (pre 2014-09-13) SLIMEs.

Syndicated 2014-09-23 10:08:42 (Updated 2014-09-23 20:33:50) from notes

195 older entries...

 

crhodes certified others as follows:

  • crhodes certified crhodes as Journeyer
  • crhodes certified dan as Master
  • crhodes certified wnewman as Master
  • crhodes certified fufie as Journeyer
  • crhodes certified moray as Apprentice
  • crhodes certified mjg59 as Master
  • crhodes certified adw as Apprentice
  • crhodes certified bmastenbrook as Journeyer
  • crhodes certified magnusjonsson as Journeyer
  • crhodes certified danstowell as Journeyer

Others have certified crhodes as follows:

  • crhodes certified crhodes as Journeyer
  • rjain certified crhodes as Journeyer
  • pvaneynd certified crhodes as Journeyer
  • fufie certified crhodes as Journeyer
  • mwh certified crhodes as Journeyer
  • cmm certified crhodes as Master
  • dan certified crhodes as Master
  • jao certified crhodes as Master
  • varjag certified crhodes as Journeyer
  • fxn certified crhodes as Journeyer
  • jtjm certified crhodes as Journeyer
  • mk certified crhodes as Journeyer
  • mjg59 certified crhodes as Journeyer
  • adw certified crhodes as Journeyer
  • tbmoore certified crhodes as Master
  • mdanish certified crhodes as Master
  • negative certified crhodes as Journeyer
  • hanna certified crhodes as Journeyer
  • Fare certified crhodes as Master
  • moray certified crhodes as Journeyer
  • richdawe certified crhodes as Journeyer
  • pjcabrera certified crhodes as Journeyer
  • water certified crhodes as Master
  • ingvar certified crhodes as Journeyer
  • ebf certified crhodes as Journeyer
  • bmastenbrook certified crhodes as Master
  • xach certified crhodes as Master
  • kai certified crhodes as Master
  • davep certified crhodes as Master
  • bhyde certified crhodes as Master
  • cyrus certified crhodes as Master
  • technik certified crhodes as Master
  • danstowell certified crhodes as Journeyer
  • cannam certified crhodes as Journeyer

[ Certification disabled because you're not logged in. ]

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!

X
Share this page