Older blog entries for crhodes (starting at number 181)

why investigate databases of audio?

At the launch meeting for Transforming Musicology (I should perhaps say that technically this was in the pub after the launch meeting), Laurence Dreyfus asked me why I was experimenting with a collection of recordings when planning my Josquin/Gombert case study, as opposed to the perhaps more natural approach of investigating the motivic or thematic similarity of music starting with the notated works (manuscripts, contemporary or modern editions).

By this stage in the evening, I was not feeling at my best – a fact unrelated to the venue of the conversation, I hasten to add – and I wouldn't be surprised if my answer was not a model of clarity. Here's another go, then, beginning with an anecdote.

When I was a PhD student, a long, long time ago in a discipline far, far away, the research group I was in had a weekly paper-reading club, where students and postdocs, and the occasional permanent member of staff, would gather together over lunch and try to work and think through an article in the field. Usually, the paper selected was roughly contemporary, maybe published in the last few years, and therefore (because of the publishing norms in Applied Mathematics and Theoretical Physics) the paper would be available on the arXiv, and so could be acquired, and printed, without leaving one's desk. (Picking up the printed copy could double up as one's daily exercise quota.) One week, however, some imp put down a paper that predated the arXiv's rise to prominence. I have two clear memories, which should nevertheless be treated as unreliable: one of myself, spending about an hour of elapsed time in the mathematical library, registering, finding the journal, finding the volume and photocopying the article; the other of the lunch meeting, discovering that those of us who had seen copies of the article (let's not go so far as to say read it) were definitively in the minority.

There are several lessons to be drawn from this, beyond further evidence of the laziness and hubris of the typical PhD student. The availability of information, in whatever form, makes it more likely to be useful and more likely to be used. It's timely to reflect on the case of Aaron Swartz, who confronted this issue and faced disproportionate consequences; I'm pleased to say that I am involved in my institution's working group on Open Access, as we try to find a way of simultaneously satisfying all the requirements imposed on us as well as extending our ability to offer a transformative experience.

How does this affect musicology, and our particular attempt to give it the transformative experience? Well, one issue with music in general is its lack of availability.

That is of course a ridiculous statement, designed to be provocative: in this era of downloadable music, streaming music, ripped music, music in your pocket and on your phone, music has never been more available. But the music available in all those forms is audio; the story for other forms of musical data (manuscript notation, chord sequences, performing editions, even lyrics) is far less good. Sometimes that is for copyright reasons, as rights-holders attempt to limit the distribution of their artifacts; sometimes that is for economic reasons, as the resources to digitize and encode public-domain sources of notation are unavailable. But the net effect is that, while musical audio is merely limited and expensive to obtain, it is often not possible to purchase large amounts of musical data in other forms at any price.

So, in my still-hypothetical case study of Josquin/Gombert attribution, I could use notation as my primary source of data if I had encodings of sources or editions of the pieces I'm interested in – the needles, if you will; if I had those, I could verify that some computational algorithm could identify the association between sections of the chansons, the Lugebat and the Credo. That encoding task would be tedious, but achievable in – let's say – a week or so for the three pieces. The problem is that that is only half the battle; it is the construction of the needle, but for a method like this to be properly tested it's not fair just to let it loose on the needle; a search system has to be given the chance to fail by hiding the needle in a suitable haystack. And constructing a haystack – encoding a few hundred other Renaissance vocal works – is the thing that makes it less than practical to work on search within databases of notation, at least for this long-tail repertoire.

And that's why at the moment I investigate databases of musical audio: I can get the data, and in principle at least so can other people. If you can do it better, or just differently, why not consider applying for one of the Transforming Musicology mini-projects? There will be a one-day exploratory event at Lancaster University on the 12th February to explore possible project ideas and collaborations, and having as wide a pool of interested parties as possible is crucial for the production and execution of interesting ideas. Please come!

Syndicated 2014-01-13 20:35:49 from notes

11 Jan 2014 (updated 13 Jan 2014 at 21:11 UTC) »

slime has moved to github

This is probably not news to anyone reading this, but: SLIME has moved its primary source repository to github.

One of my projects, swankr (essentially, a SLIME for R, reusing most of the existing communication protocol and implementing a SWANK backend in R, hence the name), obviously depends on SLIME: if nothing else, the existing instructions for getting swankr up and running included a suggestion to get the SLIME sources from common-lisp.net CVS, which as Zach says is a problem that it's nice no longer to have. (Incidentally, Zach is running a survey – direct link – to assess possible effects of changes on the SLIME userbase.)

So, time to update the instructions, and in the process also

  • hoover up any obvious patches from my inbox;
  • improve the startup configuration so that the user need do no explicit configuration beyond a suitable entry in slime-lisp-implementations (see the sketch after this list);
  • update the BUGS.org and TODO.org files for the current reality.
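
For what it's worth, a sketch of the kind of slime-lisp-implementations entry meant in the second item above – the entry format is SLIME's, but the R command-line arguments here are illustrative rather than swankr's documented ones:

  (setq slime-lisp-implementations
        '((sbcl ("sbcl"))
          (R ("R" "--no-save"))))

With something like that in .emacs, M-- M-x slime and selecting R should start an R process and connect to it over the swank protocol.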

That's all safely checked in and pushed to the many various locations which people might think of as the primary swankr repository. But those org-mode files are exported to HTML to produce the minimal swankr web presence, and the script that does that was last run with org-mode version 7.x, while now, in the future, org-mode is up to version 8 (and this time the change in major version number is definitely indicative of non-backwards-compatible API changes).

This has already bitten me. The principles of reproducible research are good, and org-mode offers many of the features that help: easily-edited source format, integration with a polyglossia of programming languages, high-quality PDF output (and adequate-quality HTML output, from the same sources) – I've written some technical reports in this style, and been generally happy with it. But, with the changes from org-mode 7 to org-mode 8, in order to reproduce my existing documents I needed to make not only (minor) source document changes, but also (quite major) changes to the bits of elisp to generate exported documents in house style. Reproducible research is difficult; I suppose it's obvious that exact reproduction depends on exact software versions, platform configurations and so on – see this paper, for example – but how many authors of supposedly reproducible research are actually listing all the programs, up to exact versions of dynamically-linked libraries they link to, used in the preparation and treatment of the data and document?

In any case, the changes needed for swankr are pretty minor, though I was briefly confused when I tried org-html-export-as-html first (and no output file was produced, because that's the new name for exporting to an HTML buffer rather than to an HTML file). The swankr website should now reflect the current state of affairs.
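
For the record, a minimal sketch of the kind of call that regenerates one of those pages under org-mode 8 – the file name is just an example, and the real export script also applies the house-style settings mentioned above:

  (require 'ox-html)
  (with-current-buffer (find-file-noselect "TODO.org")
    (org-html-export-to-html))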

Syndicated 2014-01-11 20:24:36 (Updated 2014-01-13 20:35:49) from notes

11 Jan 2014 (updated 13 Jan 2014 at 21:11 UTC) »

the dangers of writing less

A year or so ago, I wrote a couple of blog entries on precompiling discriminating functions for generic functions, both when dispatch was mostly class-based and when most of the methods were specialized on individual objects (i.e. methods with specializers of class eql-specializer).

I signed off the second entry with

Next up, unless I've made further oversights which need correction: automating this process, and some incidental thoughts.

and of course never got back to this topic. Now it's over a year later, and I can no longer remember either my incidental thoughts, which I'm sure were fascinating, or my strategy for automating this (because this was an attempt to address an actual user need, in – if I remember correctly – a CL implementation of protobuf). Suggestions for what I might have been thinking at the time gratefully received.

Write more: you know it makes sense.

Syndicated 2014-01-11 13:57:33 (Updated 2014-01-13 20:35:49) from notes

transforming musicology launch meeting

Our AHRC-funded project, Transforming Musicology, has its first all-hands meeting today.

While it may seem odd that we are three months into the funded period of the project and we haven’t all met yet – and it is a bit odd, really – it was also pretty much unavoidable: the confirmation of funding only came through in August, at which point it is difficult to get things up and running in a University in time to advertise a PhD studentship, deal with application enquiries, shortlist, interview, appoint a good candidate and deal with all the associated paperwork in time for September enrolment. The timing meant that there was an inevitable built-in three-month lag before some key project personnel were available (the next PhD enrolment date is now, in January) so it made sense to delay the launch meeting until this point, too. Not to mention that many of the project partners are high-powered professors with many diary commitments; getting them all in one place on one day is a triumph of organization in itself, for which Richard deserves much credit.

I spent some time yesterday trying to cook up a demo of audioDB, an old, unlamented (but a bit lamentable in its current state) tool from an earlier project. I’m at the stage of identifying the low-hanging fruit in audioDB that would need picking to make it a tool useful to musicologists; as well as the UI issues – it is definitely a tool currently optimized towards use on the command-line – there is the existential question of whether what it does could be useful in any way to musicologists in the first place. I think so, but at the moment it’s more handwaving and faith than firm knowledge.

The case study I’m working on is based on a question of attribution of some chansons and motets. The issue in question has largely been settled; I think everyone these days accepts that the eight-part Lugebat David Absalon is by Nicolas Gombert, not Josquin Des Prez, and that some other works (Tulerunt Dominum, Je prens congies and J’ay mis mon cueur) are also likely to be of the same musical hand: Martin Picker wrote on this in 2001, but recordings that I have of these works which predate that also agree on the attribution. (The works were originally attributed to Josquin based partly on their high quality; it has been said that for a while Josquin composed more and better works after he was dead than while he was alive...)

A nice demonstration, then, would be to reproduce the similarity relationships between recordings of the works discussed in Picker’s article, and show that those similarities are stronger than acoustic similarities that arise by chance. This isn’t going to change anyone’s mind either way on the Lugebat question, of course, but if it works it can give some confidence to musicologists that audioDB could be used to investigate collections of recordings for acoustically similar material without already knowing the answer.

Does it work? Sort-of-not-quite-yet; the chansons and the Credo do bubble up near the top of the search results, but without a clear delineation between them and other, less related hits. Yesterday’s investigations revealed the need for a couple of improvements: firstly, some finer-grained filtering of “interesting” regions, as otherwise the strongest matches between audio of this era tend to be strong final G-minor chords; secondly, there needs to be some work done on audio feature design, to improve the machine listener’s ear, because at the moment the feature I'm using does not capture pitch perception reliably enough. The good news is that addressing these things is in scope for the Transforming Musicology project, so there's some chance that they'll get done.

Syndicated 2014-01-07 09:46:47 from notes

more efficient hyperlinked blogging

How meta. To maintain my progress on my new year's resolution, I have written some code (*gasp!*) – yes, that counts as writing. And what does that code do? Why, it makes me more efficient at highly hyperlinked blogging: it is a certain amount of fairly trivial and mildly tedious elisp, which allows the easy insertion of markdown markup to create links to various authoritative sources of information about Lisp. Well, OK, to the Hyperspec and the MOP dictionary, but as an implementor that's all I really need, right? So, now I can talk about compute-effective-method or make-method-lambda and my eager readers can be taken to the relevant documentation at the speed of thought.
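
The elisp itself isn't reproduced here, but a toy sketch of the idea might look like the following – the alist and its URLs are placeholders for illustration, and a real version would presumably reuse hyperspec.el's own symbol tables rather than hand-written entries:

  (defvar doc-link-pages
    ;; placeholder URLs, purely illustrative
    '(("compute-effective-method" . "https://example.org/mop/compute-effective-method")
      ("make-method-lambda" . "https://example.org/mop/make-method-lambda")))

  (defun insert-doc-link (name)
    "Insert a markdown link to the documentation for NAME at point."
    (interactive "sSymbol: ")
    (let ((url (cdr (assoc (downcase name) doc-link-pages))))
      (if url
          (insert (format "[%s](%s)" name url))
        (error "No known documentation page for %s" name))))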

Questions that arose during the process:

  • why are all the fake packages in hyperspec.el created with (make-vector 67 0)?
  • has anyone in the 23 years since AMOP was published ever been glad that the MOP standardizes the extract-lambda-list and extract-specializer-names functions? (Fun fact: SBCL also has extract-parameters and extract-required-parameters functions, unexported and unused.)

Syndicated 2014-01-06 22:02:46 from notes

bugs all the way down

There are times when being in control of the whole software stack is a mixed blessing.

While doing investigations related to my previous post, I found myself wondering what the arguments and return values of make-method-lambda were in practice, in SBCL. So I did what any self-respecting Lisp programmer would do, and instead of following that link and decoding the description, I simply ran (trace sb-mop:make-method-lambda), and then ran my defmethod as normal. I was half-expecting it to break instantly, because the implementation of trace encapsulates named functions in a way that changes the class of the function object (essentially, it wraps the existing function in a new anonymous function; fine for ordinary functions, not so good for generic-function objects), and I was half-right: an odd error occurred, but after trace printed the information I wanted.
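
For concreteness, a minimal reproduction of the experiment – the generic function frob here is mine, purely for illustration:

  (trace sb-mop:make-method-lambda)

  (defgeneric frob (x))
  (defmethod frob ((x integer))
    ;; evaluating this defmethod involves a call to make-method-lambda,
    ;; so the trace output appears here, followed by the odd error
    ;; described below
    (* x 2))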

What was the odd error? Well, after successfully calling and returning from make-method-lambda, I got a no-applicable-method error while trying to compute the applicable methods for... make-method-lambda. Wait, what?

SBCL's CLOS has various optimizations in it; some of them have been documented in the SBCL Internals Manual, such as the clever things done to make slot-value fast, and specialized discriminating functions. There are plenty more that are more opaque to the modern user, one of which is the “fast method call” optimization. In that optimization, the normal calling convention for methods within method combination, which involves calling the method's method-function with two arguments – a list of the arguments passed to the generic function, and a list of next methods – is bypassed, with the fast-method-function instead being supplied with a permutation vector (for fast slot access) and next method call (for fast call-next-method) as the first two arguments and the generic function's original arguments as the remainder, unrolled.
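
Purely as an illustration of the shape of the two conventions (this is not SBCL's generated code), compare:

  ;; standard convention: a list of arguments and a list of next methods
  (lambda (args next-methods)
    (declare (ignore next-methods))
    (apply #'+ args))

  ;; fast convention: permutation vector, next-method-call, then the
  ;; generic function's arguments, spread
  (lambda (pv next-method-call x y)
    (declare (ignore pv next-method-call))
    (+ x y))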

In order for this optimization to be valid, the call-method calling convention must be the standard one – if the user is extending or overriding the method invocation protocol, all the optimizations based on assuming the standard method invocation protocol might be invalid. We have to be conservative, so we need to turn this optimization off if we can't prove that it's valid – and the only case where we can prove that it's valid is if only the system-provided method on make-method-lambda has been called. But we can't communicate that after the fact; although make-method-lambda returns initargs as well as the lambda, an extending method could arbitrarily mess with the lambda while returning the initargs the system-provided method returns. So in order to find out whether the optimization is safe, we have to check whether exactly our system-provided method on make-method-lambda was the applicable one, so there's an explicit call to compute-applicable-methods of make-method-lambda after the method object has been created. And with make-method-lambda being traced, and hence not a generic function any more, it's normal that there's an error. Hooray! Now we understand what is going on.
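
The shape of that check can be seen from the REPL (in an image where make-method-lambda has not been traced, of course) – this is a sketch, not SBCL's internal code; the arguments below are just enough for compute-applicable-methods, since only their classes matter for method selection:

  (compute-applicable-methods
   #'sb-mop:make-method-lambda
   (list #'print-object
         (sb-mop:class-prototype (find-class 'standard-method))
         '(lambda (object stream) object)
         nil))
  ;; => a one-element list containing the system-provided method,
  ;;    unless a user extension has added more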

As for how to fix it, well, how about adding an encapsulations slot to generic-function objects, and handling the encapsulations in sb-mop:compute-discriminating-function? The encapsulation implementation as it currently stands is fairly horrible, abusing as it does special variables and chains of closures; there's a fair chance that encapsulating generic functions in this way will turn out a bit less horrible. So, modify sb-debug::encapsulate, C-c C-c, and package locks strike. In theory we are meant to be able to unlock and continue; in practice, that seems to be true for some package locks but not others. Specifically, the package lock from setting the fdefinition from a non-approved package gives a continuable error, but the ones from compiling special declarations of locked symbols have already taken effect and converted themselves to run-time errors. Curses. So, (mapcar #'sb-ext:unlock-package (list-all-packages)) and try again; then, it all goes well until adding the slot to the generic-function class (and I note in passing that many of the attributes that CL specifies for generic-function SBCL only gives to standard-generic-function objects), at which point my SLIME repl tells me that something has gone wrong, but not what, because no generic function works any more, including print-object. (This happens depressingly often while working on CLOS.)

That means it's time for an SBCL rebuild, which is fine because it gives me time to write up this blog entry up to this point. Great, that finishes, and now we go onwards: implementing the functionality we need in compute-discriminating-function is a bit horrible, but this is only a proof-of-concept so we wrap it all up in a labels and stop worrying about 80-column conventions. Then we hit C-c C-c and belatedly remember that redefining methods involves removing them from their generic function and adding them again, and doing that to compute-discriminating-function is likely to have bad consequences. Sure enough:

  There is no applicable method for the generic function 
  #<STANDARD-GENERIC-FUNCTION COMPUTE-DISCRIMINATING-FUNCTION (1)>
when called with arguments
  (#<STANDARD-GENERIC-FUNCTION NO-APPLICABLE-METHOD (1)>).

Yes, well. One (shorter) rebuild of just CLOS later, and then a few more edit/build/test cycles, and we can trace generic functions without changing the identity of the fdefinition. Hooray! Wait, what was I intending to do with my evening?

Syndicated 2014-01-03 22:30:36 (Updated 2014-01-03 22:36:49) from notes

seeking real life uses for generalized specializers

Some time ago (call it half a decade or so), Jim Newton of Cadence and I did some work on extensible specializers: essentially coming up with a proof-of-concept protocol to allow users to define their own specializers with their own applicability and ordering semantics. That's a little bit vague; the concrete example we used in the writeup was a code walker which could warn about the use of unbound variables (and the non-use of bindings), and which implemented its handling of special forms with code of the form:

(defmethod walk ((expr (cons (eql 'quote))) env call-stack)
  nil)
(defmethod walk ((var symbol) env call-stack)
  (let ((binding (find-binding env var)))
    (if binding
        (setf (used binding) t)
        (format t "~&unbound: ~A: ~A~%" var call-stack))))
(defmethod walk ((form (cons (eql 'lambda))) env call-stack)
  (destructuring-bind (lambda lambda-list &rest body) form
    (let* ((bindings (derive-bindings-from-ll lambda-list))
           (env* (make-env bindings env)))
      (dolist (form body)
        (walk form env* (cons form call-stack)))
      (dolist (binding bindings)
        (unless (used (cdr binding))
          (format t "~&unused: ~A: ~A~%" (car binding) call-stack))))))

The idea here is that it's possible to implement support in the walker for extra special forms in a modular way; while this doesn't matter very much in Common Lisp (which, famously, is not dead, just smells funny), it matters rather more in other languages which have made other tradeoffs in the volatility/extensibility space. And when I say “very much” I mean it: even SBCL allows extra special forms to be loaded at runtime; the sb-cltl2 module includes an implementation of compiler-let, which requires its own special handling in the codewalker which is used in the implementation of CLOS.
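
As a usage sketch (assuming the helper functions elided from the excerpt, such as make-env and find-binding, behave as their names suggest):

  (walk '(lambda (x) y) (make-env nil nil) nil)
  ;; prints something like
  ;;   unbound: Y: (Y)
  ;;   unused: X: NIL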

So modularity and extensibility are required in a code walker, even in Common Lisp implementations; in Cadence Skill++ it might even be generally useful (I don't know). In SBCL, the extensibility is provided using an explicit definer form; sb-cltl2 does

(defun walk-compiler-let (form context env)
  #1=#<implementation elided>)
(sb-walker::define-walker-template compiler-let walk-compiler-let)

and that's not substantially different from

(defmethod sb-walker:walk ((form (cons (eql 'compiler-let))) context env)
  #1#)

So far, so unexpected, for Lisp language extensions at least: of course the obvious test for a language extension is how many lines of code it can save when implementing another language extension. Where this might become interesting (and this, dear lazyweb, is where you come in) is if this kind of extension is relevant in application domains. Ideally, then, I'm looking for real-life examples of patterns of selecting ‘methods’ (they don't have to be expressed as Lisp methods, just distinct functions) based on attributes of objects, not just the objects' classes. The walker above satisfies these criteria: the objects under consideration are all of type symbol or cons, but the dispatch partly happens based on the car of the cons – but are there examples with less of the meta nature about them?

(I do have at least one example, which I will keep to myself for a little while: I will return to this in a future post, but for now I am interested in whether there's a variety of such things, and whether the generalization of specializer metaobjects is capable of handling cases I haven't thought of yet. Bonus points if the application requires multiple dispatch and/or non-standard method combination.)

Syndicated 2014-01-01 20:20:56 from notes

new year's resolution

So here we are: 31st December. Sitting here, with a film of dubious plot playing on the television, waiting for a sufficient approximation to the New Year to arrive, and thinking about the year just gone and the year that is yet to come, and making resolutions. And on looking back at 2013, one thing that I'm not happy about (while counting my many, many blessings) is the level of output: somewhat limited code, very limited publications – honestly, thanks to my PhD students Ray and Polina for any publications at all: they did the vast majority of the work on papers with my name listed as co-author. There are of course reasons for that, stemming both from personal and professional life, but it doesn't change my dissatisfaction with the end result: there's more that I can do, and more that I should be doing. And doing, as Yoda never said, leads to writing: if the act is sufficiently interesting, it should be natural to talk to people about it, to be enthusiastic about it, and to write about it.

I certainly feel that it's in my nature not to be enthusiastic. Friends, if I have any left, are likely to put my typical disposition somewhere near the cynical extreme – I like to think not without some provocation, but I can also acknowledge that an excess of cynicism can obscure those things that are in fact worth celebrating. I'm certainly not promising to be less cynical about those things which deserve it – many an academic will point at a number of things wrong with the current state of the profession, and I think we're not all harking back to a mislaid golden age when academics were free and academia was purely communitarian – but I think I need to act in positive ways as well as grind my teeth and bang my head against the wall when the negatives arise.

Baby steps first, though. The commitment: I will write more. I will write more about my ideas, my processes, my interactions and my achievements, and I'll do it publicly where possible, as it's part of my self-image as an academic to educate by example. (And maybe at some point I will also make this website look pretty. If you're reading this through syndication, count yourself lucky.)

Syndicated 2013-12-31 21:49:30 from notes

some sbcl wiki entries

One of the potential advantages of this new wiki/blog setup is that I should be able to refer easily to my notes in my blog. It's true that the wiki-like content will change, whereas a mostly-static blog entry will naturally refer to the content of the wiki as it is at the point of blogging; in theory I should be able to find the content at the time that the link was made through a sufficiently advanced use of the version control system backing the ikiwiki installation, and in practice the readers of the blog (if any) are likely to follow (or not) any links within a short time-window of the initial publication of any given entry. So that shouldn't prove too confusing.

In order to test this, and to provide some slightly-relevant content for anyone still reading this on planet lisp: I've transferred my non-actionable notes from my sbcl org-mode file to an sbcl wiki section. This new setup gets heaps of bonus points if someone out there implements the development projects listed there as a result of reading this post, but even if not, it's a useful decluttering...

Syndicated 2013-12-30 22:05:24 from notes

29 Dec 2013 (updated 30 Dec 2013 at 11:20 UTC) »

new year new gpg key

I've been putting this off for long enough. My current PGP key was generated in a different age, with somewhat different threat models and substantially different computational power (let alone social and political power) available to those who might be interested in subverting the general ability of individuals to identify themselves and communicate in private. It's true that not much of what I do is likely to be of vast interest to security agents worldwide, but as SBCL release manager (on holiday for a month!) I do publish documents that purport to assure that the source code and binary that users can obtain are unmodified from my personal copies.

Therefore, I've created a new key, with 2013-era configuration settings: 2048-bit RSA encryption and signing, and SHA-1 hashes undesired (though still accepted, given the protocol requirement). I've created a transition document, signed with both keys, to help interested parties with the transition; while my older key will remain valid for a while (I have a stack of business cards with its fingerprint on to get through...) the new key should be preferred in all situations where that makes sense.

Observations:

  1. This is all quite fiddly and complicated. There's a reason (well, several reasons: all those business cards!) I have been putting this off for a while. I understand why at least some of the knobs and levers are there; I've taught Computer Security at undergraduate level – but I'm still not confident that I have got everything right, and that's worrying both personally and systemically.
  2. epa-sign-region appears to malfunction for me when given a prefix argument: what it's meant to do is give a UI for selecting keys to use for signing, and the option of which kind of signature to generate. What it actually seems to do is generate an undecryptable PGP Message block. [edit: this was caused by epa/epg version skew, from an old installation of the Debian easypg package. ]
  3. I stumbled over the --escape-from-lines GnuPG configuration option while reading the documentation. It's on by default.
  4. GnuPG is crowdfunding a new website and some infrastructure.

Syndicated 2013-12-29 22:04:59 (Updated 2013-12-30 10:16:26) from notes
