13 Jan 2014 crhodes   » (Master)

why investigate databases of audio?

At the launch meeting for Transforming Musicology (I should perhaps say that technically this was in the pub after the launch meeting), Laurence Dreyfus asked me why I was experimenting with a collection of recordings when planning my Josquin/Gombert case study, as opposed to the perhaps more natural approach of investigating the motivic or thematic similarity of music starting with the notated works (manuscripts, contemporary or modern editions).

By this stage in the evening, I was not feeling at my best – a fact unrelated to the venue of the conversation, I hasten to add – and I wouldn't be surprised if my answer was not a model of clarity. Here's another go, then, beginning with an anecdote.

When I was a PhD student, a long, long time ago in a discipline far, far, away, the research group I was in had a weekly paper-reading club, where students and postdocs, and the occasional permanent member of staff, would gather together over lunch and try to work and think through an article in the field. Usually, the paper selected was roughly contemporary, maybe published in the last few years, and therefore (because of the publishing norms in Applied Mathematics and Theoretical Physics) the paper would be available on the arXiv, and so could be acquired, and printed, without leaving one's desk. (Picking up the printed copy could double up as one's daily exercise quota.) One week, however, some imp put down a paper that predated the arXiv's rise to prominence. I have two clear memories, which should nevertheless be treated as unreliable: one of myself, spending about an hour of elapsed time in the mathematical library, registering, finding the journal, finding the volume and photocopying the article; the other of the lunch meeting, discovering that those of us having seen copies of the article (let's not go as far as having read the article) being definitively in the minority.

There are several lessons to be drawn from this, beyond further evidence of the laziness and hubris of the typical PhD student. The availability of information, in whatever form, makes it more likely to be useful and more likely to be used. It's timely to reflect on the case of Aaron Swartz, who confronted this issue and faced disproportionate consequences; I'm pleased to say that I am involved in my institution's working group on Open Access, as we try to find a way of simultaneously satisfying all the requirements imposed on us as well as extending our ability to offer a transformative experience.

How does this affect musicology, and our particular attempt to give it the transformative experience? Well, one issue with music in general is its lack of availability.

That is of course a ridiculous statement, designed to be provocative: in this era of downloadable music, streaming music, ripped music, music in your pocket and on your phone, music has never been more available. But the music available in all those forms is audio; the story for other forms of musical data (manuscript notation, chord sequences, performing editions, even lyrics) is far less good. Sometimes that is for copyright reasons, as rights-holders attempt to limit the distribution of their artifacts; sometimes that is for economic reasons, as the resources to digitize and encode public-domain sources of notation are unavailable. But the net effect is that while musical audio is limited and expensive, it is often not possible to purchase large amounts of musical data in other forms at any price.

So, in my still-hypothetical case study of Josquin/Gombert attribution, I could use notation as my primary source of data if I had encodings of sources or editions of the pieces I'm interested in – the needles, if you will; if I had those, I could verify that some computational algorithm could identify the association between sections of the chansons, Lugebat and Credo. That encoding task would be tedious, but achievable in – let's say – a week or so for the three pieces. The problem with that is that that is only half the battle; it is the construction of the needle, but for a method like this to be properly tested it's not fair just to let it loose on the needle; a search system has to be given the chance to fail by hiding the needle in a suitable haystack. And constructing a haystack – encoding a few hundreds of other Renaissance vocal works – is the thing that makes it less than practical to work on search within databases of notation, at least for this long-tail repertoire.

And that's why at the moment I investigate databases of musical audio: I can get the data, and in principle at least so can other people. If you can do it better, or just differently, why not consider applying for one of the Transforming Musicology mini-projects? There will be a one-day exploratory event at Lancaster University on the 12th February to explore possible project ideas and collaborations, and having as wide a pool of interested parties as possible is crucial for the production and execution of interesting ideas. Please come!

Syndicated 2014-01-13 20:35:49 from notes

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!