Older blog entries for crhodes (starting at number 208)

20 Mar 2015 (updated 20 Mar 2015 at 18:13 UTC)

els2015 is nearly here

This year, I have had the dubious pleasure of being the Local Organizer for the European Lisp Symposium 2015, which is now exactly one month away; in 31 days, hordes of people will be descending on New Cross Gate in South East London to listen to presentations, give lightning talks and engage in general discussions about all things Lisp – the programme isn’t quite finalized, but expect Racket, Clojure, elisp and Common Lisp to be represented, as well as more... minority interests, such as C++.

Registration is open! In fact, for the next nine days (until 29th March) the registration fee is set at the absurdly low price of €120 (€60 for students) for two days of talks, tutorials, demos, coffee, pastries, biscuits, convivial discussion and a conference dinner. I look forward to welcoming old and new friends alike to Goldsmiths.

Syndicated 2015-03-20 17:04:33 (Updated 2015-03-20 17:32:13) from notes

tmus research programmer position

The AHRC-funded research project that I am a part of, Transforming Musicology, is recruiting a developer for a short-term contract, chiefly to work with me on database systems for multimedia (primarily audio) content. The goal for that main part of the contract is to take some existing work on audio feature extraction and probabilistic nearest-neighbour search indexing, and to develop a means for specialist users (e.g. musicologists, librarians, archivists, musicians) to access the functionality without needing to be experts in the database domain. This of course will involve thinking about user interfaces, but also about distributed computation, separation of data and query location, and so on.

The funding is for six months of programmer time. I would have no objection to someone working for six months in a concentrated block of time; I would also have no objection to stretching the funding out over a longer period of calendar time: it might well provide more time for reflection and a better outcome in the end. I would expect the development activities to be exploratory as well as derived from a specification; to span between the systems and the interface layer; and to be somewhat polyglot (we have a C++ library, bindings in Python, Common Lisp and Haskell, and prototype Javascript and Emacs front-ends – no applicant is required to be fluent in all of these!).

There are some potentially fun opportunities during the course of the contract, not least working with the rest of the Transforming Musicology team. The post is based at Goldsmiths, and locally we have some people working on Systems for Early Music, on Musicology and Social Networking, and on Musical Memory; the Goldsmiths contribution is part of a wider effort, with partners based at Oxford working on Wagnerian Leitmotif and on a Semantic Infrastructure, at Queen Mary working on mid-level representations of music, and in Lancaster coordinating multiple smaller projects. As well as these opportunities for collaboration, there are a number of events coming up: firstly, the team would hope to have a significant presence at the conference of the International Society for Music Information Retrieval, which will be held in Málaga in October. We don’t yet know what we will be submitting there (let alone what will be accepted!) but there should be an opportunity to travel there, all being well. In July we’ll be participating in the Digital Humanities at Oxford Summer School, leading a week-long programme of workshops and lectures on digital methods for musicology. In November of 2014, we also participated in the AHRC’s “Being Human” festival, with a crazy effort to monitor participants’ physiological responses to Wagner opera; there’s every possibility that we will be invited to contribute to a similar event this year. And there are other, similar projects in London with whom we have friendly relations, not least the Digital Music Lab project at City University and the EU-funded PRAISE project at Goldsmiths.

Sound interesting? Want to apply? There’s a more formal job advert on the project site, while the “person specification” and other application-related materials are with Goldsmiths HR. The closing date for applications is the 27th March; we’d hope to interview shortly after that, and to have the developer working with us from some time in May or June. Please apply!

Syndicated 2015-03-16 11:47:14 from notes

ref2014 data update

Does anyone still care about REF2014? Apart from agonizing about what it will mean in the new assignments of quality-related funding for institutions, obviously.

Among the various surprising things I have had to do this term (as in “surprise! You have to do this”) was to participate in the Winter graduation ceremony: reading out the names of graduands. It was fun; the tiniest bit stressful, because I hadn’t ever managed to attend a ceremony – but since everyone was there to celebrate, most of the pressure was off; I think I managed to doff my hat at the required times and not to trip over my gown while processing. Part of the ceremony is a valedictory speech from the Warden (vice-Chancellor equivalent), and it was perhaps inevitable that part of that was a section congratulating our soon-to-be alumni on belonging to an institution with high-quality research, or possibly “research intensity”.

That reminded me to take another look at the contextual data published by the Higher Education Statistics Agency; part of the “research intensity” calculation involves an estimate of the number of staff who were eligible to participate in the REF. It is only an estimate, not for any fundamental reason but because the employment data and the research submissions were collected by two different agencies; the data quality is not great, resulting probably in about equal measure from database schema mismatches (working out which REF2014 “Unit of Assessment” a given member of departmental staff belongs to) and human error. The good news is that at least the human error can be corrected later; there are now four Universities who have submitted corrected employment numbers to HESA, subtracting off a large number of research assistants (fixed-term contract researchers) from their list of REF-eligible staff – which naturally tends to bump up their measured research intensity.

New corrections mean new spreadsheets, new slightly-different data layouts, and new ingestion code; I suffer so you don’t have to. I’ve made a new version of my ref2014 R package containing the REF2014 results and contextual data; I’ve also put the source of the package up on github in case anyone wants to criticize (or learn from, I guess) my packaging.

Syndicated 2015-03-10 13:36:30 from notes

a year in review

A brief retrospective, partly brought to you by grep:

  • CATS credits earnt: 30 (15 + 7.5 + 7.5) at level 7
  • crosswords solved: >=40
  • words blogged: 75k
  • words blogged, excluding crosswords: 50k
  • SBCL releases made: 12 (latest today!)
  • functioning ARM boards sitting on my desk: 3 (number doing anything actually useful, beyond SBCL builds: 0 so far, working on it)
  • emacs packages worked on: 2 (iplayer, squeeze)
  • public engagement events co-organized: 1
  • ontological inconsistencies resolved: 1
  • R packages made: 1 (University PR departments offended: not enough)

Blogging’s been a broad success; slightly tailed off of late, what with

  • Crawl Dungeon Sprint wins: 2 (The Pits, Thunderdome: both GrFi)

so there’s an obvious New Year’s Resolution right there.

Happy New Year!

Syndicated 2014-12-31 22:27:37 from notes

ref2014 data in R package form

As threatened, one last post about REF2014. Actually this post is not so much about REF2014 any more; it’s mostly about R. I’ve finally found the time to package up the REF2014 data as a convenient R package, along with accompanying documentation and a “vignette”. Some quick observations about the process:

  • it’s not desperately easy to find authoritative documentation on best practices, or even appropriate first steps. There’s authoritative documentation on the format and contents of an R package, but that documentation suffers from describing what not to do without describing what to do instead (example); there are Hadley Wickham’s suggestions for best practices, but not quite enough of those suggestions were applicable to this case (data-only, converted from upstream formats). Putting it all together took more brain cycles than I care to admit.

  • Sweave is a bit clunky. It does the job but I can’t help but compare it to org-mode and markdown (R has knitr for markdown vignettes, but I couldn’t get rmarkdown working in the budgeted time, so fell back to Sweave).

  • it’s quite nice to be forced to document all the columns of a data frame. I say “forced”; it’s not that strong, but having R CMD check tell me about all the bad things I’ve done is a decent motivator.

I’m not sure what the licence of the REF2014 data is (or even if that is a question that makes sense). It appears I’m not the only one who is unsure; the Web and Internet Science crowd at Southampton have put up an RDF version of the REF2014 data, and Christopher Gutteridge doesn’t know about licensing either. Meanwhile, in the general belief that having this dataset more available is likely to be positive, all things considered, have at it (and tell me where the packaging has gone wrong...).

Syndicated 2014-12-28 22:44:45 (Updated 2014-12-28 22:55:18) from notes

research excellence framework: more ranks to choose from

Apparently the Pro-Warden of Research and Enterprise at my institution discovered the ‘GPA intensity’ measure for ranking Universities: or at least saw fit this morning to circulate to all staff a message saying

This league table both confirms and affirms Goldsmiths as a research-intensive university within an inclusive research culture – something we can all be proud of.

which is at least sort-of true – as I suggested last night, I think an intensity measure is just about on the side of the angels. One flaw in the REF process is the element of choice over which staff to submit, though there are problems in just disallowing that choice, including the coercion of staff onto teaching-only contracts; maybe the right answer is to count all employees of Universities, and not distinguish between research-active and teaching-only (or indeed between academic and administrator): that at least would give visibility of the actual density (no sniggering at the back, please) of research in an institution.

As should be obvious from yesterday’s post, I don’t think that the league table generated by GPA intensity, or indeed the league table by unweighted GPA, or indeed any other league table has the power to confirm anything in particular. I think it’s generally true that Goldsmiths is an institution where a lot of research goes on, and also that it has a generally open and inclusive culture: certainly I have neither experienced nor heard anything like the nastiness at Queen Mary, let alone the tragedy of Stefan Grimm at Imperial. But I don’t like the thoughtless use of data to support even narratives that I might personally believe, so let’s challenge the idea that GPA intensity rank is a meaningful measure.

What does GPA intensity actually mean? It means the GPA that would have been achieved by the institution if all eligible staff were submitted to the REF, rather than just those who actually were, and that all the staff who weren’t originally submitted received “unclassified” (i.e. 0*) assessments for all their contributions. That’s probably a bit too strong an interpretation; it’s easy to see how institutions would prefer not to submit staff with lower-scoring outputs, even if their scoring would be highly unlikely to be 0*, whether they are optimizing for league tables or QR funding. So intensity-corrected GPA probably overestimates the “rank” of institutions whose strategy was biased towards submitting a higher proportion of their staff (and conversely underestimates the “rank” of institutions whose strategy was to submit fewer of their staff).
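
For concreteness, the arithmetic behind the intensity correction is tiny; here is an illustrative sketch (the numbers below are invented, not REF data, and the actual analysis behind these graphs was done in R):

;; Illustrative only: counting the unsubmitted staff as all-unclassified (0*)
;; contributions scales the grade point average by submitted/eligible FTE.
(defun intensity-corrected-gpa (gpa submitted-fte eligible-fte)
  (* gpa (/ submitted-fte eligible-fte)))

;; e.g. a GPA of 3.2 with 45 of 60 FTE submitted:
;; (intensity-corrected-gpa 3.2 45 60) => 2.4 (near enough, in floating point)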

It probably is fair to guess, though, that the unsubmitted staff don’t have many 4* “world leading” outputs (whatever that means), honourable exceptions aside. So correcting for intensity when comparing institutions by percentages of outputs assessed as 4* is probably reasonable. Here it is:

[slopegraph: 4* outputs with/without intensity]

Another case where an intensity correction is probably reasonable is when comparing institutions by some combination of their 4* and 3* scores overall (i.e. including Impact and Environment): this time, because the 4* and (probably to a lesser extent) 3* scores will probably be one of the inputs to QR funding (if that carries on at all), and the intensity correction will scale the funding from a per-submitted-capita to a per-eligible-capita basis: it will measure the average QR funding received per member of staff. As I say, we don’t know about QR funding in the future; using the current weighting (3×4*+1×3*), the picture looks like this:

[slopegraph: 4*/3* 3:1 overall with/without intensity]
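
The weighting itself is equally mechanical; a minimal sketch, with invented percentages (and only a guess at how such a formula might actually feed into funding):

;; Illustrative only: a 3:1 weighting of the 4* and 3* percentages, optionally
;; rescaled from per-submitted-capita to per-eligible-capita by the intensity.
(defun qr-score (pct-4star pct-3star &optional (intensity 1))
  (* (+ (* 3 pct-4star) pct-3star) intensity))

;; e.g. 30% at 4* and 40% at 3*, with three-quarters of eligible staff submitted:
;; (qr-score 30 40 3/4) => 195/2, i.e. 97.5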

But wait, does it make sense to rank these at all? Surely what matters here is some kind of absolute scale: small differences between per-capita QR funding at different institutions will be hardly noticeable, and even moderate ones are unlikely to be game-changers. (Of course, institutions may choose not to distribute any QR funding they receive evenly; such institutions could be regarded as having a less inclusive research culture...). So, if instead of plotting QR ranks, we plot absolute values of the QR-related “score”, what does the picture look like?

[slopegraph: 4*/3* 3:1 overall values with/without intensity]

This picture might be reassuringly familiar to the UK-based research academic: there’s a reasonable clump of the names that one might expect to be characterized as “research-intensive” institutions, between around 89 and 122 on the right-hand (intensity-corrected) scale; some are above that clump (the “winners”: UCL, Cambridge, Kings London) and some a bit below (SOAS). Of course, since the members of that clump are broadly predictable just from reputation, one might ask whether the immense cost of the REF process delivers any particular benefit. (One might ask. It’s not clear that one would ever get an answer).

I should stop playing with slopegraphs and resume work on packaging up the data so that other people can also write about REF2014 instead of enjoying their “holidays”.

Syndicated 2014-12-23 15:13:49 from notes

research excellence framework and public relations

Last week saw the publication of the results of the “Research Excellence Framework” – a title which does almost nothing to describe what it actually is. To those reading not from academia, or from those portions of academia where this monstrosity has not penetrated, the REF involved university departments gathering sets of up to four publications from a subset of their research-active staff who were employed by them on 1st December 2013; writing submission documents to explain why those publications should be considered world-leading or at least internationally excellent; gathering and documenting “Impact” case studies, and writing about the general environment for performing research. These submissions then went to assessment panels, who did some unknowable things with them before producing assessments of the submissions: for each of the three measures (outputs, impact and environment), the percentages of that submission judged to be at particular levels, measured from 4* (world-leading) down to 0* (unclassified), and these assessments will affect in some as yet undetermined way Quality-Related funding for Higher Education Institutions (if QR funding continues at all). Those data are now published, along with, for each department, the count of full-time-equivalent research staff considered as part of the submission, and (inexplicably, about eight hours later) figures from a different higher-education agency estimating the number of staff who would have been eligible to be considered as part of each submission.

If your first reaction to all of this is “wow, that’s a well-thought-through system that will in no way become subject to Goodhart’s law”, well, I envy you your worldview. Gaming the REF starts early: since it is the employment status on 1st December 2013 that matters for the purpose of this exercise, rather than the location at which any given piece of research was done (or the address for correspondence given on the published research), economic forces quite naturally set up a transfer window, where researchers likely to score highly in the REF if included in a submission are able to name high prices for moving in the six months leading up to the census date – and there’s then much less flexibility in the academic labour market for a good year or two afterwards. Other employment distortions happen too; for example, there’s an incentive to approach researchers based outside the UK, and offer to pay them a fractional wage (say 0.2FTE) in exchange for them offering up four of their publications, assumed to be high-scoring, for submission to the UK REF. Given the way the QR funding is distributed, this is in effect a subsidy to departments with well-connected heads, and to already-successful overseas researchers.

But aside from the distortions that arise from producing the submissions, and the simultaneously unclear and opaque way that judgment on those submissions is made, at least the result of all of this is a table of numbers, which don’t lie, right? Right.

No.

I happened to be in a different country on REF results day, doing different work. So my main view of what was going on was following the #REF2014 twitter stream. And maybe I envy myself my previous worldview, because I was not prepared for the deluge of mendaciousness. I don’t know what I was expecting – a discussion between academics, maybe, or a quick dissection of the figures, and some links to news articles and blog posts. What I actually got was corporate account after account claiming victory in the REF, using various ways of weighting and ordering the measures. Particularly egregious examples included:

  • failing to distinguish between case studies and overall research, typically seen in “x% of our research has world-class impact” or similar: @unisouthampton @cardiffphilos @ITSLeeds @UWEGradSchool @LancasterManage

  • talking about “research power”, which multiplies a GPA-type score by the FTE quantity of staff submitted to the assessment. By introducing this, larger institutions can successfully confound a notional measure of quality with a measure of quantity to produce something essentially meaningless (except that it will likely be some similar formula which determines QR funding – but most University costs are per-capita costs anyway, so this still doesn’t make much sense): @LawLeicester @CityUniHealth @UniofReading @EconomicsatYork @UoNresearch @ScienceLeeds

  • simple gibberish: @CovUniResearch, and the Guardian gets a special prize for attempting to use the REF to compare apples to oranges.

It was also disappointing to see corporate twitter accounts attempting to find measures to optimize their ranking positions after the fact; John O’Leary has observed that at least four institutions have claimed overall “victory”, as if one institution’s research could defeat another’s. As well as overall victory, though, a seemingly-infinite number of departmental accounts claimed victory, or a top-ten position, or a top-twenty (or 16th, or 22nd, or 66th) as if that was meaningful. Do we really believe that the panels, in all their wisdom, can assess the difference between “internationally excellent” and “world-leading” to a sufficient accuracy that differences in GPA scores in the third significant place are meaningful? In other words, where are the error bars on these measurements?

To finish up: I spent too long today downloading the HEFCE and HESA data, cleaning it up, swearing at UCL’s acquisition of the Institute of Education, and messing around with ggplot. I hope to publish the cleaned-up data files for others to play with in the holidays; in the meantime, I leave you with this illustration of the game-playing: since there are multiple outcomes from one set of REF measurements, different institutions will have used different strategies for choosing which of their staff to submit and which not: to attempt to optimize their Grade Point Average, their (hoped-for) QR funding, or their likelihood of a good headline on 18th December 2014. We can measure some of that by looking at the difference between GPA scores, and GPA scores scaled by the proportion of eligible staff who were actually submitted. To illustrate that, I’ve made a Tufte-style slopegraph; the only concession to modernity is that the steeper the slope, the darker the ink of the line. (Modernity in character encoding – sorry, Glyndŵr University – and font-antialiasing is completely unaddressed).

[slopegraph: GPA with/without intensity]

You can decide whether either of the GPA measures is more or less meaningful; I have no particular axe to grind (though I suspect that my institution might, and on balance I think they are on the side of the angels who know how to dance slightly better on the head of a pin). One message this graph tells that everyone should be able to agree on is that it illustrates different strategies being employed – if there were uniformity across the sector, the lines would be generally horizontal. (And the real tragedy of all this is that clever, dedicated people at institutions of research and learning spent actual time and energy thinking about REF submission strategies, instead of doing something interesting and potentially useful).

In some sense, I hope not to come back to this subject. But it’s the holidays, and I need something that seems enough like work avoidance that it will distract me from preparing my lectures for next term...

Syndicated 2014-12-22 23:14:03 (Updated 2014-12-22 23:17:31) from notes

17 Nov 2014 (updated 17 Nov 2014 at 13:14 UTC)

hearing wagner: data preparations

Last week’s activity – in between the paperwork, the teaching, the paperwork, the paperwork, the teaching and the paperwork – was mostly taken up in preparations for the Hearing Wagner event, part of the AHRC’s Being Human festival.

Being a part of the Being Human festival gave us the opportunity to work to collect data that we wouldn’t otherwise have had access to: because of the fortuitous timing of the Mariinsky Theatre’s production of the Ring at the Birmingham Hippodrome between 5th and 9th November, we were able to convince funders to allow us to offer free tickets to Birmingham Conservatoire students, in exchange for being wired up to equipment measuring their electrodermal activity, blood flow, and hand motion.

Why collect these data? Well, one of the themes of the Transforming Musicology project as a whole is to examine the perception of leitmotive, particularly Wagner’s use of them in the Ring, and the idea behind gathering these data is to have ecologically-valid (in as much as that is possible when there’s a device strapped to you) measurements of participants’ physical responses to the performance, where those physical responses are believed to correlate with emotional arousal. Using those measurements, we can then go looking for signals of responses to leitmotives, or to other musical or production cues: as well as the students attending the performance, some of the research team were present backstage, noting down the times of events in the staging of (subjective) particular significance – lighting changes, for example.

And then all of these data come back to base, and we have to go through the process of looking for signal. And before we can do anything else, we have to make sure that all of our data are aligned to a reference timeline. For each of the operas, we ended up with around 2GB of data files: up to 10 sets of data from the individual participants, sampled at 120Hz or so; times of page turns in the vocal score, noted by a musicologist member of the research team (a coarse approximation to the sound experienced by the participants); timestamped performance annotations, generated by a second musicologist and dramaturge. How to get all of this onto a common timeline?

Well, in the best of all possible worlds, all of the clocks in the system would have been synchronized by ntp, and that synchronization would have been stable and constant throughout the process. In this case, the Panglossians would have been disappointed: in fact none of the various devices was sufficiently stably synchronized with any of the others to be able to get away with no alignment.

Fortunately, the experimental design was carried out by people with a healthy amount of paranoia: the participants were twice asked to clap in unison: once in the backstage area, where there was only speed-of-sound latency to the listeners (effectively negligible), and once when seated in the auditorium, where there was additional latency from the audio feed from the auditorium to backstage. Those claps gave us enough information, on the rather strong assumption that they were actually simultaneous, to tie everything together: the first clap could be found on each individual measuring device by looking at the accelerometer data for the signature, which establishes a common timeline for the measurement data and the musicologists; the second clap gives a measure for the additional latency introduced by the audio feed. Since the participants’ claps weren’t actually simultaneous – despite the participants being music students, and the clap being conducted – we have a small error, but it’s likely to be no more than about one second.
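
In outline, the alignment arithmetic is then simple once the claps have been located in each stream; here is an illustrative sketch (the names are placeholders, not the actual analysis code, and the 120Hz figure is the nominal sample rate mentioned above):

;; Illustrative sketch only: map a device's sample indices onto the reference
;; timeline using the (assumed simultaneous) first clap, then estimate the
;; audio-feed latency from the second clap.
(defun make-device-to-reference (clap1-sample clap1-reference-time
                                 &key (sample-rate 120))
  "Return a function from device sample index to reference time in seconds."
  (lambda (sample-index)
    (+ clap1-reference-time
       (/ (- sample-index clap1-sample) sample-rate))))

(defun audio-feed-latency (device-to-reference clap2-sample clap2-noted-time)
  "Seconds by which the backstage annotation of the second clap lags the devices."
  (- clap2-noted-time (funcall device-to-reference clap2-sample)))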

And this week? This week we’ll actually be looking for interesting signal; there’s reason to believe that electrodermal activity (basically, the change in skin conductance due to sweat) is indicative of emotional arousal, and quite a sensitive measure of music-induced emotion. This is by its nature an exploratory study: at least to start with, we’re looking at particular points of interest (specified by musicologists, in advance) for any correlation with biosignal response – and we’ll be presenting initial results about anything we find at the Hearing Wagner event in Birmingham this weekend. The clock is ticking...

edit: see also a similar post on the project blog

Syndicated 2014-11-17 10:58:17 (Updated 2014-11-17 12:49:11) from notes

reproducible builds - a month ahead of schedule

I think this might be my last blog entry on the subject of building SBCL for a while.

One of the premises behind SBCL as a separate entity from CMUCL, its parent, was to make the result of its build be independent of the compiler used to build it. To a world where separate compilation is the norm, the very idea that building some software should persistently modify the state of the compiler probably seems bizarre, but the Lisp world evolved in that way and Lisp environments (at least those written in themselves) developed build recipes where the steps to construct a new Lisp system from an old one and the source code would depend critically on internal details of both the old and the new one: substantial amounts of introspection on the build host were used to bootstrap the target, so if the details revealed by introspection were no longer valid for the new system, there would need to be some patching in the middle of the build process. (How would you know whether that was necessary? Typically, because the build would fail with a more-or-less – usually more – cryptic error.)

Enter SBCL, whose strategy is essentially to use the source files first to build an SBCL!Compiler running in a host Common Lisp implementation, and then to use that SBCL!Compiler to compile the source files again to produce the target system. This requires some contortions in the source files: we must write enough of the system in portable Common Lisp so that an arbitrary host can execute SBCL!Compiler to compile SBCL-flavoured sources (including the standard headache-inducing (defun car (list) (car list)) and similar, which works because SBCL!Compiler knows how to compile calls to car).

How much is “enough” of the system? Well, one answer might be when the build output actually works, at least to the point of running and executing some Lisp code. We got there about twelve years ago, when OpenMCL (as it was then called) compiled SBCL. And yet... how do we know there aren't odd differences that depend on the host compiler lurking, which will not obviously affect normal operation but will cause hard-to-debug trouble later? (In fact there were plenty of those, popping up at inopportune moments).

I’ve been working intermittently on dealing with this, by attempting to make the Common Lisp code that SBCL!Compiler is written in sufficiently portable that executing it on different implementations generates bitwise-identical output. Because then, and only then, can we be confident that we are not depending in some unforeseen way on a particular implementation-specific detail; if output files are different, it might be a harmless divergence, for example a difference in ordering of steps where neither depends on the other, or it might in fact indicate a leak from the host environment into the target. Before this latest attack at the problem, I last worked on it seriously in 2009, getting most of the way there but with some problems remaining, as measured by the number of output files (out of some 330 or so) whose contents differed depending on which host Common Lisp implementation SBCL!Compiler was running on.

Over the last month, then, I have been slowly solving these problems, one by one. This has involved refining what is probably my second most useless skill, working out what SBCL fasl files are doing by looking at their contents in a text editor, and from that intuiting the differences in the implementations that give rise to the differences in the output files. The final pieces of the puzzle fell into place earlier this week, and the triumphant commit announces that as of Wednesday all 335 target source files get compiled identically by SBCL!Compiler, whether that is running under Clozure Common Lisp (32- or 64-bit versions), CLISP, or a different version of SBCL itself.

Oh but wait. There is another component to the build: as well as SBCL!Compiler, we have SBCL!Loader, which is responsible for taking those 335 output files and constructing from them a Lisp image file which the platform executable can use to start a Lisp session. (SBCL!Loader is maybe better known as “genesis”; but it is to load what SBCL!Compiler is to compile-file). And it was slightly disheartening to find that despite having 335 identical output files, the resulting cold-sbcl.core file differed between builds on different host compilers, even after I had remembered to discount the build fingerprint constructed to be different for every build.

Fortunately, the actual problem that needed fixing was relatively small: a call to maphash, which (understandably) makes no guarantees about ordering, was used to affect the Lisp image data directly. I then spent a certain amount of time being thoroughly confused, having managed to construct for myself a Lisp image where the following forms executed with ... odd results:

(loop for x being the external-symbols of "CL" count 1)
; => 1032
(length (delete-duplicates (loop for x being the external-symbols of "CL" collect x)))
; => 978

(shades of times gone by). Eventually I realised that

(unless (member (package-name package) '("COMMON-LISP" "KEYWORD" :test #'string=))
  ...)

was not the same as

(unless (member (package-name package) '("COMMON-LISP" "KEYWORD") :test #'string=)
  ...)

and all was well again, and as of this commit the cold-sbcl.core output file is identical no matter the build host.
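
The general shape of the fix for this whole class of problem is worth recording; sketched in isolation (this is not the actual SBCL code), it amounts to refusing to consume maphash’s arbitrary traversal order directly:

;; Illustrative only: iterate over a sorted snapshot of the keys, so that
;; anything whose output ends up in the image sees a deterministic order.
(defun map-hash-sorted (function table predicate &key (key #'identity))
  (let ((keys '()))
    (maphash (lambda (k v) (declare (ignore v)) (push k keys)) table)
    (dolist (k (sort keys predicate :key key))
      (funcall function k (gethash k table)))))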

It might be interesting to survey the various implementation-specific behaviours that we have run into during the process of making this build completely repeatable. The following is a probably non-exhaustive list – it has been twelve years, after all – but maybe is some food for thought, or (if you have a particularly demonic turn of mind) an ingredients list for a maximally-irritating CL implementation...

  • Perhaps most obviously, various constants are implementation-defined. The ones which caused the most trouble were undoubtedly most-positive-fixnum and most-negative-fixnum – particularly since they could end up being used in ways where their presence wasn’t obvious. For example, (deftype fd () `(integer 0 ,most-positive-fixnum)) has, in the SBCL build process, a subtly different meaning from (deftype fd () '(and fixnum unsigned-byte)) – in the second case, the fd type will have the intended meaning in the target system, using the target’s fixnum range, while in the first case we have no way of intercepting or translating the host’s value of most-positive-fixnum. Special mentions go to array-dimension-limit, which caused Bill Newman to be cross on the Internet, and to internal-time-units-per-second; I ended up tracking down one difference in output machine code from a leak of the host’s value of that constant into target code.
  • Similarly, random and sxhash quite justifiably differ between implementations. The practical upshot of that is that these functions can’t be used to implement a cache in SBCL!Compiler, because the access patterns and hence the patterns of cache hits and misses will be different depending on the host implementation.
  • As I’ve already mentioned, maphash does not iterate over hash-table contents in a specified order, and in fact that order need not be deterministic; similarly, with-package-iterator can generate symbols in arbitrary orders, and set operations (intersection, set-difference and friends) will return the set as a list whose elements are in an arbitrary order. Incautious use of these functions tended to give rise to harmless but sometimes hard-to-diagnose differences in output; the solution was typically to sort the iteration output before operating on any of it, to introduce determinism...
  • ... but it was possible to get that wrong in a harder-to-detect way, because sort is not specified to be stable. In some implementations, it actually is a stable sort in some conditions, but for cases where it’s important to preserve an already-existing partial order, stable-sort is the tool for the job.
  • The language specification explicitly says that the initial contents of uninitialized arrays are undefined. In most implementations, at most times, executing (make-array 8 :element-type '(unsigned-byte 8)) will give a zero-filled array, but there are circumstances in some implementations where the returned array will have arbitrary data (a one-line defence is sketched just after this list).
  • Not only are some constants implementation-defined, but so also are the effects of normal operation on some variables. *gensym-counter* is affected by macroexpansion if the macro function calls gensym, and implementations are permitted to macroexpand macros an arbitrary number of times. That means that our use of gensym needs to be immune to whatever the host implementation’s macroexpansion and evaluation strategy is.
  • The object returned by byte to represent a bitfield with size and position is implementation-defined. Implementations (variously) return bitmasks, conses, structures, vectors; host return values of byte must not be used during the execution of SBCL!Compiler. More subtly, the various boole-related constants (boole-and and friends) also need special treatment; at one point, their host values were used when SBCL!Compiler compiled the boole function itself, and it so happens that CLISP and SBCL both represent the constants as integers between 0 and 15... but with a different mapping between operation and integer.
  • my last blog entry talked about constant coalescing, and about printing of (quote foo). In fact printing in general has been a pain, and there are still significant differences in interpretation or at least in implementation of pretty-printing: to the extent that at one point we had to minimize printing at all in order for the build to complete under some implementations.
  • there are a number of things which are implementation-defined but have caused a certain amount of difficulty. Floating point in general is a problem, not completely solved (SBCL will not build correctly if its host doesn’t have distinct single- and double-float types that are at least approximately IEEE754-compliant). Some implementations lack denormalized numbers; some do not expose signed zeros to the user; and some implementations compute (log 2d0 10d0) more accurately than others, including SBCL itself, do. The behaviour of the host implementation on legal but dubious code is also potentially tricky: SBCL’s build treats full warnings as worthy of stopping, but some hosts emit full warnings for constructs that are tricky to write in other ways: for example to write portable code to handle multiple kinds of string, one might write (typecase string (simple-base-string ...) ((simple-array character (*)) ...) (string ...)) but some implementations emit full warnings if a clause in a typecase is completely shadowed by other clauses, and if base-char and character are identical in that implementation the typecase above will signal.
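
The uninitialized-array point above has a one-line defence, so obvious that it is easy to forget (illustrative; any element type will do): ask for the contents you intend to rely on, rather than hoping for zeros.

(make-array 8 :element-type '(unsigned-byte 8) :initial-element 0)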

There were probably other, more minor differences between implementations, but the above list gives a flavour of the things that needed doing in order to get to this point, where we have some assurance that our code is behaving as intended. And all of this is a month ahead of my self-imposed deadline of SBCL’s 15th birthday: SBCL was announced to the world on December 14th, 1999. (I’m hoping to be able to put on an sbcl15 workshop in conjunction with the European Lisp Symposium around April 20th/21st/22nd – if that sounds interesting, please pencil the dates in the diary and let me know...)

Syndicated 2014-11-08 22:38:14 (Updated 2014-11-10 17:00:04) from notes

still working on reproducible builds

It’s been nearly fifteen years, and SBCL still can’t be reliably built by other Lisp compilers.

Of course, other people’s definition of “reliably” might differ. We did achieve successful building under unrelated Lisp compilers twelve years ago; there were a couple of nasty bugs along the way, found both before and after that triumphant announcement, but at least with a set of compilers whose interpretation of the standard was sufficiently similar to SBCL’s own, and with certain non-mandated but expected features (such as the type (array (unsigned-byte 8) (*)) being distinct from simple-vector, and single-float being distinct from double-float), SBCL achieved its aim of being buildable on a system without an SBCL binary installed (indeed, using CLISP or XCL as a build host, SBCL could in theory be bootstrapped starting with only gcc).

For true “reliability”, though, we should not be depending on any particular implementation-defined features other than ones we actually require – or if we are, then the presence or absence of them should not cause a visible difference in the resulting SBCL. The most common kind of leak from the host lisp to the SBCL binary was the host’s value of most-positive-fixnum influencing the target, causing problems from documentation errors all the way up to type errors in the assembler. Those leaks were mostly plugged a while ago, though they do recur every so often; there are other problems, and over the last week I spent some time tracking down three of them.

The first: if you’ve ever done (apropos "PRINT") or something similar at the SBCL prompt, you might wonder at the existence of functions named something like SB-VM::|CACHED-FUN--PINSRB[(EXT-2BYTE-XMM-REG/MEM ((PREFIX (QUOTE (102))) (OP1 (QUOTE (58))) (OP2 (QUOTE (32))) (IMM NIL TYPE (QUOTE IMM-BYTE))) (QUOTE (NAME TAB REG , REG/MEM ...)))]-EXT-2BYTE-XMM-REG/MEM-PRINTER|.

What is going on there? Well, these functions are a part of the disassembler machinery; they are responsible for taking a certain amount of the machine code stream and generating a printed representation of the corresponding assembly: in this case, for the PINSRB instruction. Ah, but (in most instruction sets) related instructions share a fair amount of structure, and decoding and printing a PINSRD instruction is basically the same as for PINSRB, with just one #x20 changed to a #x22 – in both cases we want the name of the instruction, then a tab, then the destination register, a comma, the source, another comma, and the offset in the destination register. So SBCL arranges to reuse the PINSRB instruction printer for PINSRD; it maintains a cache of printer functions, looked up by printer specification, and reuses them when appropriate. So far, so normal; the ugly name above is the generated name for such a function, constructed by interning a printed, string representation of some useful information.

Hm, but wait. See those (QUOTE (58)) fragments inside the name? They result from printing the list (quote (58)). Is there a consensus on how to print that list? Note that *print-pretty* is bound to nil for this printing; prior experience has shown that there are strong divergences between implementations, as well as long-standing individual bugs, in pretty-printer support. So, what happens if I do (write-to-string '(quote foo) :pretty nil)?

  • SBCL: "(QUOTE FOO)", unconditionally
  • CCL: "'FOO" by default; "(QUOTE FOO)" if ccl:*print-abbreviate-quote* is set to nil
  • CLISP: "'FOO", unconditionally (I read the .d code with comments in half-German to establish this)

So, if SBCL was compiled using CLISP, the name of the same function in the final image would be SB-VM::|CACHED-FUN--PINSRB[(EXT-2BYTE-XMM-REG/MEM ((PREFIX '(102)) (OP1 '(58)) (OP2 '(32)) (IMM NIL TYPE 'IMM-BYTE)) '(NAME TAB REG , REG/MEM ...))]-EXT-2BYTE-XMM-REG/MEM-PRINTER|. Which is shorter, and maybe marginally easier to read, but importantly for my purposes is not bitwise-identical.

Thus, here we have a difference between host Common Lisp compilers which leaks over into the final image, and it must be eliminated. Fortunately, this was fairly straightforward to eliminate; those names are never in fact used to find the function object, so generating a unique name for functions based on a counter makes the generated object file bitwise identical, no matter how the implementation prints two-element lists beginning with quote.
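
The shape of that change, sketched here with made-up names rather than SBCL’s actual internals:

;; Hypothetical sketch: derive the cached printer's name from a build-local
;; counter instead of from a printed representation of the printer
;; specification, so the host's printing of (QUOTE ...) cannot leak in.
(defvar *cached-printer-counter* 0)

(defun cached-printer-name (format-name)
  (intern (format nil "CACHED-FUN-~A-~D" format-name
                  (incf *cached-printer-counter*))))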

The second host leak is also related to quote, and to our old friend backquote – though not related in any way to the new implementation. Consider this apparently innocuous fragment, which is a simplified version of some code to implement the :type option to defstruct:

(macrolet ((def (name type n)
             `(progn
                (declaim (inline ,name (setf ,name)))
                (defun ,name (thing)
                  (declare (type simple-vector thing))
                  (the ,type (elt thing ,n)))
                (defun (setf ,name) (value thing)
                  (declare (type simple-vector thing))
                  (declare (type ,type value))
                  (setf (elt thing ,n) value)))))
  (def foo fixnum 0)
  (def bar string 1))

What’s the problem here? Well, the functions are declaimed to be inline, so SBCL records their source code. Their source code is generated by a macroexpander, and so is made up of conses that are generated programmatically (as opposed to freshly consed by the reader). That source code is then stored as a literal object in an object file, which means in practice that instructions for reconstructing a similar object are dumped, to be executed when the object file is processed by load.

Backquote is a reader macro that expands into code that, when evaluated, generates list structure with appropriate evaluation and splicing of unquoted fragments. What does this mean in practice? Well, one reasonable implementation of reading `(type ,type value) might be:

  (cons 'type (cons type '(value)))

and indeed you might (no guarantees) see something like that if you do

  (macroexpand '`(type ,type value))

in the implementation of your choice. Similarly, reading `(setf (elt thing ,n) value) will eventually generate code like

  (cons 'setf (cons (cons 'elt (list 'thing n)) '(value)))

Now, what is “similar”? In this context, it has a technical definition: it relates two objects in possibly-unrelated Lisp images, such that they can be considered to be equivalent despite the fact that they can’t be compared:

similar adj. (of two objects) defined to be equivalent under the similarity relationship.

similarity n. a two-place conceptual equivalence predicate, which is independent of the Lisp image so that two objects in different Lisp images can be understood to be equivalent under this predicate. See Section 3.2.4 (Literal Objects in Compiled Files).

Following that link, we discover that similarity for conses is defined in the obvious way:

Two conses, S and C, are similar if the car of S is similar to the car of C, and the cdr of S is similar to the cdr of C.

and also that implementations have some obligations:

Objects containing circular references can be externalizable objects. The file compiler is required to preserve eqlness of substructures within a file.

and some freedom:

With the exception of symbols and packages, any two literal objects in code being processed by the file compiler may be coalesced if and only if they are similar [...]

Put this all together, and what do we have? That def macro above generates code with similar literal objects: there are two instances of '(value) in it. A host compiler may, or may not, choose to coalesce those two literal '(value)s into a single literal object; if it does, the inline expansion of foo (and bar) will have a circular reference, which must be preserved, showing up as a difference in the object files produced during the SBCL build. The fix? It’s ugly, but portable: since we can’t stop an aggressive compiler from coalescing constants which are similar but not identical, we must make sure that any similar substructure is in fact identical:

(macrolet ((def (name type n)
             (let ((value '(value)))
               `(progn
                  (declaim (inline ,name (setf ,name)))
                  (defun ,name (thing)
                    (declare (type simple-vector thing))
                    (the ,type (elt thing ,n)))
                  (defun (setf ,name) (value thing)
                    (declare (type simple-vector thing))
                    (declare (type ,type . ,value))
                    (setf (elt thing ,n) . ,value))))))
  (def foo fixnum 0)
  (def bar string 1))

Having dealt with a problem with quote, and a problem with backquote, what might the Universe serve up for my third problem? Naturally, it would be a problem with a code walker. This code walker is somewhat naïve, assuming as it does that its body is made up of forms or tags; it is the assemble macro, which is used implicitly in the definition of VOPs (reusable assembly units); for example, like

(assemble ()
  (move ptr object)
  (zeroize count)
  (inst cmp ptr nil-value)
  (inst jmp :e DONE)
 LOOP
  (loadw ptr ptr cons-cdr-slot list-pointer-lowtag)
  (inst add count (fixnumize 1))
  (inst cmp ptr nil-value)
  (inst jmp :e DONE)
  (%test-lowtag ptr LOOP nil list-pointer-lowtag)
  (error-call vop 'object-not-list-error ptr)
 DONE))

which generates code to compute the length of a list. The expander for assemble scans its body for any atoms, and generates binding forms for those atoms to labels:

(let ((new-labels (append labels
                          (set-difference visible-labels inherited-labels))))
  ...
  `(let (,@(mapcar (lambda (name) `(,name (gen-label))) new-labels))
     ...))

The problem with this, from a reproducibility point of view, is that set-difference (and the other set-related functions: union, intersection, set-exclusive-or and their n-destructive variants) do not return the sets with a specified order – which is fine when the objects are truly treated as sets, but in this case the LOOP and DONE label objects ended up in different stack locations depending on the order of their binding. Consequently the machine code for the function emitting code for computing a list’s length – though not the machine code emitted by that function – would vary depending on the host’s implementation of set-difference. The fix here was to sort the result of the set operations, knowing that all the labels would be symbols and that they could be treated as string designators.
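
Sketched in isolation (this is the flavour of the fix, not the actual commit), the determinization relies on the labels being symbols, and hence string designators:

;; Illustrative only: give the set-valued result a canonical order before it
;; is used for anything order-sensitive, such as generating binding forms.
;; set-difference returns a fresh list, so the destructive sort is safe here.
(defun sorted-set-difference (list1 list2)
  (sort (set-difference list1 list2) #'string< :key #'string))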

And after all this? We’re still not quite there: there are three to four files (out of 330 or so) which are not bitwise-identical for differing host compilers. I hope to be able to rectify this situation in time for SBCL’s 15th birthday...

Syndicated 2014-10-14 06:51:19 from notes

199 older entries...
