Older blog entries for danstowell (starting at number 33)

Notes from EUSIPCO 2012

Just back from the EUSIPCO 2012 conference in Bucharest. (The conference was held in the opulent Palace of the Parliament - see previous article for some thoughts on the palace and the town.) Here are some notes about interesting talks/posters I saw:


Lots of stuff relevant to recognition in audio scenes, which is handy because that's related to my current work.

  • David Damm's "System for audio summarisation in acoustic monitoring scenarios". Nice approach and demo (with sounds localised around the Fraunhofer campus), though the self-admitted drawback is that it isn't yet particularly scalable, using full DTW search etc.
  • Sebastien Fenet's "fingerprint-based detection of repeating objects in multimedia streams" - here a very scalable approach, using fingerprints (as is done in other large-scale systems such as Shazam). In this paper he compared two fingerprint types: a Shazam-like spectral-peaks method (but using constant-Q spectrum); and a shallow Matching Pursuit applied to multiscale STFT. His results seem to favour the former.
  • Xavier Valero's "Gammatone wavelet features for sound classification in surveillance applications" - this multiscale version of gammatone is apparently better for detecting bangs and bumps and things like that.
  • M. A. Sehili's "Daily sound recognition using a combination of GMM and SVM for home automation" - they used something called a "Sequence Classification Kernel" which apparently can be used in an SVM to classify sequential data, even different-length sequential data. Have to check that out.
  • Two separate papers - A. Zlatintsi's "AM-FM Modulation Features" and Xavier Valero's "Narrow-band Autocorrelation features" - used features which are complementary to the standard Mel energies, by analysing the fine variation within each band. They each found improved results (for different classification tasks).

Plenty of informed audio source separation was in evidence too. Not my specialism, more that of others in our group who came along... but I caught a couple of them, including Derry Fitzgerald's "User assisted separation using tensor factorisations" and Juan-Jose Bosch's "Score-informed and timbre-independent lead instrument separation in real-world scenarios".

Other papers that were interesting:

  • T Adali, "Use of diversity in independent decompositions" - for independence-based decompositions, you can use either of two assumptions about the components: non-Gaussianity or time-dependence. The speaker noted that measuring "mutual information rate" covers both of these properties, so it seems like a neat thing to use. She used it for some tensor decompositions which were a bit beyond me.
  • C Arellano's poster on "Shape model fitting algorithm without point correspondence": simple idea for matching a hand image against a template which has marked points on it (but the query image doesn't): convert both representations into GMMs then find a good registration between the two GMMs. Could be useful, though the registration search is basically brute-force in this paper I think.
  • Y Panagakis presented "Music structure analysis by subspace modeling" - it makes a lot of sense, intuitively, that music structure such as verse-chorus-verse should be suited to this idea of fitting different feature subspaces to them. The way music is produced and mixed should make it appropriate for this, I imagine (whereas for audio scenes we probably don't hop from subspace to subspace... unless the mic is moving from indoors to outdoors for example...)
  • Y Bar-Yosef's "Discriminative Algorithm for compacting mixture models with application to language recognition". Taking a GMM and approximating it by a smaller one is a general useful technique - here they were using Hershey and Olsen's 2007 "variational approximation" to the KLD between two GMMs. In this paper, their optimisation tries to preserve the discriminative power between two GMMs, rather than simply keeping the best fit independently.
  • I Ari's "Large scale polyphonic music transcription using randomized matrix decompositions" - some elegant tweaks which mean they can handle a very large matrix of data, using a weighted-random atom selection technique which reminds me a little of a kind of randomised Matching Pursuit (though MP is not what they're doing). They reduce the formal complexity of matrix factorisation, both in time and in space, so that it's much more tractable.
  • H Hu's "Sparsity level in a non-negative matrix factorisation based speech strategy in cochlear implants" - I know they do some good work with cochlear implants at Southampton Uni. This was a nice example: not only did they use Sparse NMF for noise reduction, and test it with human subjects in simulated conditions, but they also implemented it on a hardware device as used in cochlear implants. This latter point is important because at first I was dubious whether this fancy processing was efficient enough to run on a cochlear implant - good to see a paper that answers those kinds of questions immediately.


Christian Jutten gave a plenary talk on source-separation in nonlinear mixtures. Apparently there's a proof from the 1980s by Darmois that if you have multiple sources nonlinearly mixed, then ICA cannot guarantee to separate them, for the following simple reason: ICA works by maximising independence, but Darmois proved that for any set of perfectly independent sources you can always construct a nonlinear mixture that preserves this independence. (Jutten gave an example procedure to do this; I think you could use the inverse-copula of the joint distribution as another way.)

Therefore to do source-separation on nonlinear mixtures you need to add some assumptions, either as constraints or regularisation. Constraining just to "smooth mappings" doesn't work. One set of mixture types which does work is "post-nonlinear mixtures", which means nonlinearities are applied separately to the outputs after linear mixing. (This is a reasonable model, for example, if your mics have nonlinearities but you can assume the sounds linearly mixed in the air before they reached the mics.) You have to use nonlinearities which satisfy a particular additivity constraint (f(u+v) = (f(u)+f(v))/(1+f(u)f(v)) ... tanh satisfies this).
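It's easy to verify numerically that tanh satisfies that additivity constraint - it's just the hyperbolic-tangent addition formula. Here's a quick check of my own (not from the talk):

```python
import math

# Check the additivity constraint f(u+v) == (f(u) + f(v)) / (1 + f(u)*f(v))
# for f = tanh, at a few arbitrary points.
for u, v in [(0.3, 0.7), (-1.2, 0.4), (2.0, -0.5)]:
    lhs = math.tanh(u + v)
    rhs = (math.tanh(u) + math.tanh(v)) / (1 + math.tanh(u) * math.tanh(v))
    assert abs(lhs - rhs) < 1e-12, (u, v)
print("tanh satisfies the additivity constraint")
```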

Eric Moulines talked about prediction in sparse additive models. There's a lot of sparsity around at the moment (and there were plenty of sparsity papers here); Moulines' different approach is that when you want to predict new values, rather than to reverse-engineer the input values, you don't want to select a single sparsity pattern but aggregate over the predictions made by all sparsity patterns. He uses a particular weighted aggregation scheme which he calls "exponential aggregation" involving the risk calculated for each "expert" (each function in the dictionary).

Now, we don't want to calculate the result for an exponentially large number of sparsity patterns and merge them all, since that would take forever. Moulines uses an inequality to convert the combinatorial problem to a continuous problem; unfortunately, at the end of it all it's still too much to calculate easily (2^m estimators) so he uses MCMC estimation to get his actual results.

I also went to the tutorial on Population Monte Carlo methods (which apparently were introduced by Cappe in 2004). I know about Particle Filters so my learnings are relative to that:

  • Each particle or iteration can have its OWN instrumental distribution, there's no need for it to be common across all particles. In fact the teacher (Petar Djuric) had worked on methods where you have a collection of instrumental distributions, and weighted-sample from all of them, adapting the weights as the iterations progress. This allows it to automatically do the kind of things we might heuristically want: start with broad, heavy-tailed distributions, then focus more on narrow distributions in the final refinement stages.
  • For static MC (i.e. not sequential), you can use the samples from ALL iterations to make your final estimate (though you need to take care to normalise appropriately).
  • Rao-Blackwellisation lets you solve a lower-dimensional problem (approximating a lower-dimensional target distribution) if you can analytically integrate to solve for a subset of the parameters given the other ones. For example, if some parameters are gaussian-distributed when conditioned on the others. This can make your approximation much simpler and faster.
  • It's generally held a good idea to use heavy-tailed distributions, e.g. people use Student's t distribution since heavier-tailed than Gaussian.
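To fix the ideas, here's a toy sketch of my own (not from the tutorial) of static population Monte Carlo: a collection of Student-t proposals with adapted mixture weights, importance-weighting against a standard-normal target, and a final estimate pooled over all iterations. The target, scales and adaptation rule are all made up for illustration:

```python
import math
import random

# Toy static PMC: estimate the mean of an (unnormalised) standard-normal
# target, using two Student-t proposals (one broad, one narrow) whose
# mixture weights are adapted between iterations.
random.seed(0)

def target_pdf(x):  # unnormalised N(0, 1) target
    return math.exp(-0.5 * x * x)

def t_pdf(x, scale, df=3):
    # Student-t density with df degrees of freedom, scaled by 'scale'
    z = x / scale
    c = math.gamma((df + 1) / 2.0) / (math.gamma(df / 2.0) * math.sqrt(df * math.pi))
    return c * (1 + z * z / df) ** (-(df + 1) / 2.0) / scale

def t_sample(scale, df=3):
    # sample a Student-t variate as normal / sqrt(chi-squared / df)
    num = random.gauss(0, 1)
    den = math.sqrt(sum(random.gauss(0, 1) ** 2 for _ in range(df)) / df)
    return scale * num / den

scales = [5.0, 0.5]   # broad (heavy-tailed start) and narrow proposals
mix = [0.5, 0.5]      # adaptive mixture weights over the proposals
samples, weights = [], []
for iteration in range(20):
    contrib = [0.0, 0.0]
    for _ in range(200):
        k = 0 if random.random() < mix[0] else 1
        x = t_sample(scales[k])
        # deterministic-mixture importance weight
        q = mix[0] * t_pdf(x, scales[0]) + mix[1] * t_pdf(x, scales[1])
        w = target_pdf(x) / q
        samples.append(x)
        weights.append(w)
        contrib[k] += w
    # adapt: shift weight towards the proposals that earned it
    total = sum(contrib)
    mix = [0.9 * m + 0.1 * (c / total) for m, c in zip(mix, contrib)]

# use samples from ALL iterations, normalised by the total weight
est_mean = sum(w * x for w, x in zip(weights, samples)) / sum(weights)
print("estimated mean: %.3f" % est_mean)  # should be near 0
```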

Syndicated 2012-09-02 05:50:57 from Dan Stowell

2 Sep 2012 (updated 2 Sep 2012 at 10:13 UTC) »

Bucharest's buildings and Ceauşescu's balcony


I hope all Romanians have had the chance to stand on what was Ceauşescu's balcony and look down the humungous boulevard. When you see the ridiculous opulent excess of the palace, a vast building visible from many other parts of town, and the two-mile boulevard carved through the city to provide a view from the balcony, it's no wonder they had a bloody revolution.

The palace looks like it's on a hill, but it's not. It's twelve storeys tall, so big that on three sides the earth is banked up so that you have to go up through the steeply-sloped gardens to get to the entrance. The scale of what was done to the city is amazing, apparently referred to as Ceauşima ("Ceauşescu"+"Hiroshima").

Elsewhere, Bucharest is a beautiful city, but even with my superficial awareness it's clear that part of the beauty comes from its tumultuous and inequitable recent history. Picturesque pre-communist buildings sit side by side with bold Stalinist and brutalist buildings - and the contrasts between the different styles enhance each other's impact. Lots of old chapels nestling in the shadows of utilitarian housing blocks. Whatever the type of building, many seem unkempt, and occasionally there's a modern glass-and-plastic thing, plus of course the decidedly un-socialist polished pomp of the palace.

Bucharest has some lovely parks, well-used by the locals - notable examples of large-scale civic planning done well. I don't know what era they're from, but they provide some great open spaces right near the middle of town.

Less perfect are the pavements... and the phone cabling, very often provided by a crazy mass of separate black wires slung from post to post, often sagging down below head height.

The "Casa Radio" just North of the river is a massive building, never completed I think, impressively decaying and ginormous. It was going to be a museum of communism. Now, Wikipedia tells me that it's intended to become a kind of monument of capitalism, in the sense that an international company intended to turn it into a huge shopping mall and hotel. That idea too is on hold, so for now we have twice over a monument to our own overreaching hubris.

One of the cutest historical references is in the park in front of the palace. There's a children's playground with a mini climbing palace in it, very obviously echoing the huge palace behind it:

I wonder, do the children play on it and play-act little Ceauşescus and little revolutionaries?

Syndicated 2012-09-02 04:53:20 (Updated 2012-09-02 05:33:48) from Dan Stowell

17 Aug 2012 (updated 17 Aug 2012 at 15:09 UTC) »

Comment on 'Seeing women as objects: The sexual body part recognition bias'

PREFACE: There's a risk that I might come across here as dismissing the research, and doing so for an odd reason. I'd like to be clear that I think this is an interesting study, and I'm not an expert in cognitive psychology but I'm writing because I'm interested in seeing these issues teased apart in more detail. See also the comments section.

Interesting article someone pointed out in European Journal of Social Psychology: Seeing women as objects: The sexual body part recognition bias. The basic idea is to use a psychophysics-type perceptual experiment to explore whether people looking at men and at women process them differently. If perceiving people "as objects" makes a difference to the cognitive processes involved, then that should be detectable.

There's plenty of evidence about our society's exaggerated emphasis on female body image, and the consequences of such objectification. What the researchers do here is use an experiment in which participants are shown images of men and women (either complete or partial images), and ask them to do a kind of spot-the-difference task. They find people get different percentage-correct scores depending on whether it's an image of a man or a woman one is looking at.

The researchers discuss this result as relating to objectification of women, and I think that's broadly OK, but there's an extra hop that I think is glossed over. A tweet summarised the research as "People perceive men using global processing, but women with local processing" but it would be more correct to say "People perceive images of men using global processing, but images of women with local processing". (It's not just the 140-character limit at fault here, the research paper itself makes the leap.)

The point is that the participants were reacting to 2D images, rather than real physical presences of men or women. Now, you might think, is that an important difference, or just quibbling? I'm not claiming that the results are wrong, and I'm not even claiming that the results don't tell us something about objectification of women. But the difference between looking-at-people and looking-at-images is important here since it relates closely to the claims being made - and this highlights the complexity of making measurements of socially-embedded cognitive processing.

Here's why I think it's a difference: In our everyday lives we see "3D" men and women. We also see "2D" images of men and women. So there are four pertinent categories here: 3D men, 3D women, 2D men-images and 2D women-images. We have absorbed general impressions about these four categories from our experiences so far (whether those "categories" are categories we use ourselves is beside the point). It's well known that there are more and different images of women than men, used in advertising and other media. As a person develops they see examples of all four categories around them, and they might learn similarities and differences, things that the categories have in common or not.

[Edit: Maybe a better way of putting it is inanimate-vs-animate, not 2D-vs-3D - see comments]

So, it's reasonable to expect that an average person in Western society is more familiar with objectified images of women around than of men. (Note that I do not claim this state of affairs is OK! I just claim that it's the average person's developmental environment.) It's easier to deal with familiar categories than unfamiliar ones. So we'd expect people to have better processing when presented with 2D body-part-images of women - and it probably correlates with their visual processing of real-life people, but that's not certain and it needs to be tested.

Am I claiming that the research should not be trusted? No. It looks like a decent and interesting experimental result. But the authors make a slight leap, which we should treat with caution: they imply that their statistically significant result on how people visually process 2D-images-of-men and 2D-images-of-women transfers directly to how people visually process men and women in the flesh. Personally I would expect that people's perception of "3D" men and women probably partly generalises from the image perception and partly doesn't. (There might be existing research on that; comments welcome.)

And obviously it's much harder to conduct large experiments by showing people "glimpses of real live men/women" rather than images, so there's a good reason why such research hasn't yet been done.

But that's good news right? - more research needed ;)

Syndicated 2012-08-17 08:45:03 (Updated 2012-08-17 10:57:04) from Dan Stowell

12 Aug 2012 (updated 12 Aug 2012 at 20:10 UTC) »

The Olympics can be both successful AND unjust ...lessons for Rio?

There was a lot of negative press in the run-up to the Olympics, with the G4S security cockup, the ever-inflating costs, and the Zil lanes. But after a spectacular opening ceremony, nice weather and some lovely stories of Olympics winners, the mood has swung really positive.

This has led to a major anti-whingeing press offensive. For example this Independent article described "a great festival of pre-emptive whingeing followed by people having a surprisingly good time" and concluded "It is time for the country to put its doubts to one side." Of course Boris Johnson put it more pithily when he wrote in The Sun "Oh come off it, everybody - enough whimpering."

The main argument of the anti-whingeing offensive is to point out how well it's all gone: no public transport problems, no security calamities, and so on. (Let's put aside for now the fact that all London businesses outside the olympic venues have had a surprising drought of business, which is a bit of a fail that I guess the London organisers could have helped prevent.) But there's an unfortunate problem with this: most of the nay-sayers did not say the Olympics would fail, they said that the Olympics was a bad deal, in both the moral and the financial senses.

Not every criticism has equal weight. Personally, I don't think it's particularly awful that some commuter routes and cycle routes are disrupted, or that police are brought in from the counties to help police the event, for example. We need to take a balanced view of things: we recognise that London changes a bit when the world's biggest sport festival is happening here.

But that doesn't mean that we should sacrifice our civil liberties on the altar of the event. We have British values such as liberty, decency and fair-mindedness, which we believe are reflected in the way we run our policing etc. If we don't make sure the Olympics lives up to these values when it's here, what chance do we have of helping it behave that way when it's in other countries?

So yes, it's good that lots of people are enjoying a smooth-running Olympics, and the British medal success has made it a particularly jolly affair for the country. But the end does not justify the means, and for the purposes of memory I'd like to summarise three types of unpleasantness that were completely unnecessary to build into the Olympics:

  1. The sponsors' dictatorship and the militant denial of small businesses' / communities' participation
  2. Security clampdowns and chilling effects on civil liberties
  3. Financial weaselings

But the most important thing right at this moment is what lessons can we pass on to Rio, which I'll consider at the end of this post.

1: The sponsors' dictatorship

The sponsors' dictatorship has been extensively covered in the press. For example, an Oxfordshire farm supplying local produce is not only banned from mentioning its association with the Olympics, but banned from doing it for the next ten years. (Source). (Similarly for the UK plastics industry apparently and they're not happy about it either.)

Butchers have been banned from displaying Olympic rings made of sausages, pubs warned they can't advertise the fact that they're showing the Olympics on telly, and cafes banned from selling a "flaming torch breakfast baguette" (source). This is all supposed to be to protect the sponsors' investment - but the rather extreme policing is completely out of proportion to the following fact:

The money from sponsorship covered a tiny fragment of the costs of the Olympics: around 7%.
(Data sources: Parliamentary Accounts Committee, Sponsors list at Guardian)

The suppression of UK businesses' benefiting from the Olympics in perfectly ordinary ways is a refusal to recognise this balance. The amount of money coming from UK taxpaying businesses is comparable with the amount of money coming from the official sponsors (since around 12% of UK tax income is directly from businesses). So this heavy-handedness should not be tolerated. As for the sponsors, maybe their participation is fine (though some argue they're not even needed - that the 7% could have been trimmed or come from elsewhere, and of course the brand-policing costs would have evaporated immediately). But assuming that we don't mind sponsorship, we should accept it without trampling over local businesses' ordinary grassroots involvement.

Incidentally, Jeremy Hunt (the government minister for the Olympics) has made very different pronouncements on the importance of the sponsors: "the sponsors are actually paying for about half of the cost of hosting the Olympics." This is a direct quote from him speaking on Radio 4 (2012-07-23 13:32, World At One), but it is so clearly out of line with the known figures that I can't think of any explanation other than it's propaganda designed to blunt opposition to the olympic brand police, which was the issue under discussion at the time.

A large cause of the branding strictness is because of ambush marketing, where rival non-sponsors essentially embarrass the sponsors by smuggling their own branding opportunities into the event. So the IOC introduced rules (as did FIFA and other such bodies) to try and prevent this. Unfortunately the consequences have swung far too far in the sponsors' favour.

One thing that seems crazy to me is that you're not allowed to use non-Visa cards to buy Olympic tickets, and ATMs have been forced to close, replaced with a (smaller) number of Visa-only machines. This is actually very different from sponsorship - it means that you cannot be a spectator at the games without signing up with Visa. You can keep out of the McDonalds, you can frown at Dow Chemicals, but you simply won't be allowed in without a Visa card. Why hasn't this been mentioned more often in the media? It's an abusive monopoly.

2: Security clampdowns and chilling effects on civil liberties

There are already "chilling effects" described above in the sponsor/logo ridiculousness: for example, the police wasted time emptying their crisps into clear plastic bags in case they irked the sponsors. But much worse than this is the chilling effect on free speech and other civil rights.

Very notable has been the intimidation and disruption of photographers. Many different people have reported that G4S and sometimes the police have stopped journalists taking pictures in a public place, using fairly vague ideas about it being a security issue. The Guardian has a video of it happening.

But it's not just photographers:

  • People who had done nothing wrong at all were arrested, had their homes raided & forbidden to use public transport. These were ex-graffiti artists arrested just-in-case. Unfortunately, this pre-emptive rounding up of people who "might" cause trouble was recently supported by the High Court, who dismissed complaints about police arresting various people who had done nothing wrong just in case they caused controversy during the day of the royal wedding. It's difficult for the police to respond to modern phenomena such as flashmobs but this sounds like plain injustice to me: people who have done nothing wrong detained on the basis of their political beliefs. There's a delicate balance needed here, to protect people's democratic rights.
  • Groups of two or more people (or young people at night) are banned from the Stratford area under specific regulations which I guess must have quite an impact on people who live in Stratford. "A Met spokesman admitted that previous dispersal orders in the area had transferred problems to other areas, but said the force hoped it would not happen this time."
  • A woman received a police warning for a Facebook joke about squirting the Olympic flame with a water pistol.
  • A man in Surrey was arrested (later freed without charge) "based on his manner, his state of dress and his proximity to the [cycling] course". The press summarised it as arrested in Leatherhead for 'not smiling'.

This kind of heavy-handedness is used not only to reduce the chance of flashmobs and other disruption, but also to suppress dissent. The absolute peak of this: police even arrested three protestors for pouring custard over their own heads in Trafalgar Square. As a distilled example of the British spirit right now - critical awareness and up-yours spirit paired with the chilling effects of policing heavy-handedness - I don't know how anyone can top this.

3: Financial weaselings

I don't need to reinforce the fact that the costs of the Olympics ballooned beyond original estimates. That's almost inevitable. And no, no-one expects that money to come back in increased tourism and business - it hasn't done that in the past and that was never the point.

I just want to mention some of the smaller-scale unfairnesses that sneaked under the radar a little:

There's also the issue that land used for Olympic developments is being developed with public money, yet is ending up in private hands. The Olympic Park will turn into the "Queen Elizabeth Olympic Park", but it will not be a Royal Park (not managed by the Royal Parks Agency), and I'm not entirely clear on how the ownership is managed but a lot of it will be technically private (source). And the Olympic village has already been sold to a company which will use it for private rental housing, apparently at a loss to the taxpayer. Hm. (The loss-making side of this venture is at least in part a consequence of the economic crash.)

What lessons can we pass on to Rio?

The Olympics is a strange travelling carnival, but it's one that has unprecedented power: it can reroute an entire capital city, it can force countries to pass new laws, and it can consume billions of pounds from the taxpayer.

But because it is travelling, it is largely unaccountable. Having learnt from our experiences, we Londoners and our elected representatives are in no position to change the way the Olympics works - for example to rebalance the sponsorship barminess.

Rio is the site for the next games. I guess they've already undertaken to pass a bizarre olympic logos-and-words-and-sponsors law. But can the citizens of Rio learn from us? Would we British not have said anything, if we'd noticed when a law was being passed that banned small businesses from offering an "olympic breakfast" or using the words "Summer" and "Gold" in promotional material? And will the number of gold medals we won really make us forget the heavy-handed and half-competent security clampdown we've been through, with its chilling effects on journalism, free speech, local business and civil liberties?

(I should acknowledge that there's no reason to think the 2012 Olympics have been particularly "worse" than, say, the 2008 Olympics. I don't know much about the Beijing organisation, but there are stories of communities being moved wholesale out of their homes etc.)

It would be a big shame if the lesson for Rio is "don't worry about any criticism - as long as you can pull it off no-one minds how you do it." How can we help Rio, and future olympic venues, to have an enjoyable games without compromising their values to the unaccountable juggernaut that carries it along?

Syndicated 2012-08-12 09:46:51 (Updated 2012-08-12 15:44:50) from Dan Stowell

Installing Cyanogenmod 7 on HTC Tattoo

Just notes for posterity: I am installing Cyanogenmod on my HTC Tattoo Android phone.

There are instructions here which are perfectly understandable if you're comfortable with a command-line. But the instructions are incomplete. As has been discussed here I needed to install "tattoo-hack.ko" in order to get the procedure to complete (at the flash_image step I got "illegal instruction" error). Someone on the wiki argues that you don't need that instruction if you're upgrading from stock Tattoo - but I'm upgrading from stock Tattoo and I needed it.

In the end, as advised in the forum thread linked above I followed bits of these instructions and used tattoo-hack.ko as well as the different versions of "su" and "flash_image" linked from that forum thread, NOT the versions in the wiki.

Also, there's an instruction in the wiki that says "reboot into recovery mode". It would have been nice if it had spelt that out for me. The command to run is

         adb reboot recovery

Syndicated 2012-08-08 18:11:14 from Dan Stowell

Some things I have learnt about optimising Python

I've been enjoying writing my research code in Python over the past couple of years. I haven't had to put much effort into optimising it, so I never bothered, but just recently I've been working on a graph-search algorithm which can get quite heavy - there was one script I ran which took about a week.

So I've been learning how to optimise my Python code. I've got it running roughly 10 or 20 times faster than it was doing, which is definitely worth it. Here are some things I've learnt:

  • The golden rule is to profile your code before optimising it; don't waste your effort. The cProfile module is surprisingly easy to use, and it shows you where your code is using the most CPU. Here's an example of what I do on the commandline:

    # run your heavy code for a good while, with the cProfile module logging it
    # ('streammodels.py' is the script I'm profiling):
    python -m cProfile -o scriptprof streammodels.py
    # then to analyse it, in an interactive python session:
    import pstats
    p = pstats.Stats('scriptprof')
    p.sort_stats('cumulative').print_stats(20)   # show the top 20 hotspots
  • There are some lovely features in Python which are nice ways to code, but once you want your code to go fast you need to avoid them :( - boo hoo. It's a bit of a shame that you can't tell Python to act as an "optimising compiler" and automatically do this stuff for you. But here are two key things to avoid:

    • list-comprehensions are nice and pythonic but they're BAD for fast code because they create a list even if you don't need it. Instead, use generator expressions or itertools.imap()/itertools.ifilter(), which process data lazily without constructing a new list. (Note that plain map() and filter() also build a list in Python 2, so they don't help here. Also use "xrange" rather than "range" if you just want to iterate over a range rather than keeping the resulting list.)
    • lambdas and locally-defined functions (by which I mean something like "def myfunc():" as a local thing inside a function or method) are lovely for flexible programming, but when your code is ready to run fast and solid, you will often need to replace these with more "ordinary" functions. The reason is that you don't want these functions constructed afresh every time you use them; you want them constructed once and then just used.
  • Shock for scientists and other ex-matlabbers: using numpy isn't necessarily a good idea. For example, I lazily used numpy's "exp" and "log" when I could have used the math module and avoided dragging in the heavy array-processing facilities that I didn't need. After I changed my code to not actually use numpy (I didn't need it - I wasn't really using array/matrix maths for this particular code), I went much faster.

  • Cython is easy to use and speeds up your python code by turning it into C and compiling it for you - who could refuse? You can also add static typing things to speed it up even more, but that makes it not pure python code, so ignore that until you need it.

  • Name-lookups are apparently expensive in python (though I don't think the profiler really shows the effect of this, so I can't tell if it's important). There's no harm in storing something in a local variable -- even a function, e.g. "detfunc = scipy.linalg.det".
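That last trick is easy to measure for yourself. Here's a small timing sketch of my own (the function names and loop sizes are just made up for illustration), comparing repeated attribute lookup against binding the function to a local name once:

```python
import math
import timeit

# Compare repeated attribute lookup (math.exp) against binding the
# function to a local name first -- the lookup then happens once,
# not once per loop iteration.
def with_lookup(n=100000):
    total = 0.0
    for i in range(n):
        total += math.exp(-i)  # attribute lookup on every iteration
    return total

def with_local(n=100000):
    exp = math.exp             # one lookup, then a fast local reference
    total = 0.0
    for i in range(n):
        total += exp(-i)
    return total

t1 = timeit.timeit(with_lookup, number=20)
t2 = timeit.timeit(with_local, number=20)
print("attribute lookup: %.3fs, local name: %.3fs" % (t1, t2))
```

Both functions compute the same thing; the local-name version is usually a bit quicker, though as noted above the difference may not matter unless the call sits in a hot loop.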

So now I know these things my Python runs much faster. I'm sure there are many more tricks of the trade. For me as a researcher, I need to balance the time saved by optimising against the flexibility to change the code on a whim and to be able to hack around with it, so I don't want to go too far down the optimisation rabbit-hole. The benefit of Python is its readability and hackability. It's particularly handy, for example, that Cython can speed up my code without me having to make any weird changes to it.

Any other top tips, please feel free to let me know...

Syndicated 2012-07-17 05:26:01 (Updated 2012-07-17 05:37:52) from Dan Stowell

13 Jul 2012 (updated 17 Jul 2012 at 12:11 UTC) »

A very simple toy problem for matching pursuits

To help me think about how and why matching pursuits fail, here's a very simple toy problem which defeats matching pursuit (MP) and orthogonal matching pursuit (OMP). [[NOTE: It doesn't defeat OMP actually - see comments.]]

We have a signal which is a sequence of eight numbers. It's very simple, it's four "on" and then four "off". The "on" elements are of value 0.5 and the "off" are of value 0; this means the L2 norm is 1, which is convenient.

  signal = array([0.5, 0.5, 0.5, 0.5, 0, 0, 0, 0])

Diagram of signal

Now, we have a dictionary of 8 different atoms, each of which is again a sequence of eight numbers, again having unit L2 norm. I'm deliberately constructing this dictionary to "outwit" the algorithms - not to show that there's anything wrong with the algorithms (because we know the problem in general is NP-hard), but just to think about what happens. Our dictionary consists of four up-then-down atoms wrapped round in the first half of the support, and four double-spikes:

  dict = array([
   [0.8, -0.6, 0, 0, 0, 0, 0, 0],
   [0, 0.8, -0.6, 0, 0, 0, 0, 0],
   [0, 0, 0.8, -0.6, 0, 0, 0, 0],
   [-0.6, 0, 0, 0.8, 0, 0, 0, 0],
   [sqrt(0.8), 0, 0, 0, sqrt(0.2), 0, 0, 0],
   [0, sqrt(0.8), 0, 0, 0, sqrt(0.2), 0, 0],
   [0, 0, sqrt(0.8), 0, 0, 0, sqrt(0.2), 0],
   [0, 0, 0, sqrt(0.8), 0, 0, 0, sqrt(0.2)],
   ])

Diagram of dictionary atoms

BTW, I'm writing my examples as very simple Python code with Numpy (assuming you've run "from numpy import *"). We can check that the atoms are unit norm, by getting a list of "1"s when we run:

  sum(dict ** 2, 1)

So, now if you wanted to reconstruct the signal as a weighted sum of these eight atoms, it's a bit obvious that the second lot of atoms are unappealing because the sqrt(0.2) elements are sitting in a space that we want to be zero. The first lot of atoms, on the other hand, look quite handy. In fact an equal portion of each of those first four can be used to reconstruct the signal exactly:

  dot([2.5, 2.5, 2.5, 2.5, 0, 0, 0, 0], dict)

That's the unique exact solution for the present problem. There's no other way to reconstruct the signal exactly.
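
Since the dictionary here happens to be a square, invertible matrix, we can confirm uniqueness by solving the linear system directly (a quick check of my own, not part of the argument):

```python
import numpy as np

signal = np.array([0.5, 0.5, 0.5, 0.5, 0, 0, 0, 0])
dictionary = np.array([
    [0.8, -0.6, 0, 0, 0, 0, 0, 0],
    [0, 0.8, -0.6, 0, 0, 0, 0, 0],
    [0, 0, 0.8, -0.6, 0, 0, 0, 0],
    [-0.6, 0, 0, 0.8, 0, 0, 0, 0],
    [np.sqrt(0.8), 0, 0, 0, np.sqrt(0.2), 0, 0, 0],
    [0, np.sqrt(0.8), 0, 0, 0, np.sqrt(0.2), 0, 0],
    [0, 0, np.sqrt(0.8), 0, 0, 0, np.sqrt(0.2), 0],
    [0, 0, 0, np.sqrt(0.8), 0, 0, 0, np.sqrt(0.2)],
])

# coeffs @ dictionary == signal  <=>  dictionary.T @ coeffs == signal
coeffs = np.linalg.solve(dictionary.T, signal)
# coeffs comes out close to [2.5, 2.5, 2.5, 2.5, 0, 0, 0, 0]
```

Because the matrix is invertible, this solution is the only exact one.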

So now let's look at "greedy" matching pursuits, where a single atom is selected one at a time. The idea is that we select the most promising atom at each step, and the way of doing that is by taking the inner product between the signal (or the residual) and each of the atoms in turn. The one with the highest inner product is the one for which you can reduce the residual energy by the highest amount on this step, and therefore the hope is that it typically helps us toward the best solution.

What's the result on my toy data?

  • For the first lot of atoms the inner product is (0.8 * 0.5) + (-0.6 * 0.5) which is of course 0.1.
  • For the second lot of atoms the inner product is (sqrt(0.8) * 0.5) which is about 0.4472.

To continue with my Python notation you could run "sum(dict * signal, 1)". The result looks like this:

  array([ 0.1,  0.1,  0.1,  0.1,  0.4472136,  0.4472136,  0.4472136,  0.4472136])

So the first atom chosen by MP or OMP is definitely going to be one of the evil atoms - more than four times better in terms of the dot-product. (The algorithm would resolve this tie-break situation by picking one of the winners at random or just using the first one in the list.)

What happens next depends on the algorithm. In MP you subtract (winningatom * winningdotproduct) from the signal, and this residual is what you work with on the next iteration. For my purposes here it's irrelevant: both MP and OMP are unable to throw away this evil atom once they've selected it, which is all I needed to show. There exist variants which are allowed to throw away dodgy candidates even after they've picked them (such as "cyclic OMP").
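
Here's a minimal plain-MP sketch on the toy problem (my own code, not from any particular library), showing that the first atom selected is indeed one of the "evil" double-spikes:

```python
import numpy as np

signal = np.array([0.5, 0.5, 0.5, 0.5, 0, 0, 0, 0])
atoms = np.array([
    [0.8, -0.6, 0, 0, 0, 0, 0, 0],
    [0, 0.8, -0.6, 0, 0, 0, 0, 0],
    [0, 0, 0.8, -0.6, 0, 0, 0, 0],
    [-0.6, 0, 0, 0.8, 0, 0, 0, 0],
    [np.sqrt(0.8), 0, 0, 0, np.sqrt(0.2), 0, 0, 0],
    [0, np.sqrt(0.8), 0, 0, 0, np.sqrt(0.2), 0, 0],
    [0, 0, np.sqrt(0.8), 0, 0, 0, np.sqrt(0.2), 0],
    [0, 0, 0, np.sqrt(0.8), 0, 0, 0, np.sqrt(0.2)],
])

def matching_pursuit(signal, atoms, n_iters):
    """Plain MP: greedily pick the atom with the largest |inner product|,
    then subtract its contribution from the residual."""
    residual = signal.astype(float).copy()
    selected = []
    for _ in range(n_iters):
        dots = atoms @ residual
        best = int(np.argmax(np.abs(dots)))
        selected.append(best)
        residual = residual - dots[best] * atoms[best]
    return selected, residual

selected, residual = matching_pursuit(signal, atoms, 4)
# the first selected atom is one of indices 4-7: an "evil" double-spike
```

(np.argmax resolves the four-way tie among the evil atoms by taking the first.)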

NOTE: see the comments for an important proviso re MP.

Syndicated 2012-07-13 10:37:44 (Updated 2012-07-17 07:25:59) from Dan Stowell

13 Jul 2012 (updated 15 Jul 2012 at 11:14 UTC) »

A* orthogonal matching pursuit - or is it?

The delightful thing about the A* routing algorithm is that it is provably the optimal algorithm for the purpose, in the sense that it's the algorithm that visits the fewest possible path nodes given the information made available. See the original paper for proof. Despite its simplicity, it is apparently still used in a lot of industrial routing algorithms today, and it can be adapted to help solve other sorts of problem.

A colleague pointed out a paper about "A*OMP" - an algorithm that performs a kind of OMP (Orthogonal Matching Pursuit) with a tree search added to try out different paths towards fitting a good sparse representation. "Aha," I thought, "if they can use A* then they can get some nice recovery properties inherited from the A* search."

However, in reading the paper I find two issues with the A*OMP algorithm which make me reluctant to use the name "A*" for it:

  1. The heuristics used are not "admissible" (meaning you can't guarantee they are always less-than-or-equal to the true cost remaining) - and admissibility is itself weaker than the "consistency" that A*'s optimality proof assumes. So the proof of A*'s optimality doesn't apply. (Remember, A*'s "optimality" is about the number of nodes inspected before finding the best path.) (EDIT: a colleague pointed out that it's actually worse than this - if the heuristic isn't admissible then it's not just sub-optimal search, it may fail to find the best path.)
  2. Since A*OMP restricts the number of paths it adds (to the top "B" atoms having the largest inner product with the residual), there are no guarantees that it will even inspect the true basis.

These issues are independent of each other. If you leave out the pragmatic restriction on the number of search paths (to get round the second issue), the first issue still applies. OMP itself is greedy rather than exact, so this doesn't make A*OMP worse than OMP, but to my mind it's "not as good as A*".
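
For reference, consistency can be checked mechanically: a heuristic h is consistent if h(goal) = 0 and h(n) <= cost(n, n') + h(n') for every edge n -> n'. A toy sketch of that check (my own illustration, not from the A*OMP paper):

```python
def is_consistent(h, edges, goal):
    """h: dict mapping node -> heuristic estimate.
    edges: list of (node, neighbour, edge_cost) tuples."""
    if h[goal] != 0:
        return False
    # the triangle-inequality-like condition must hold on every edge
    return all(h[n] <= cost + h[n2] for (n, n2, cost) in edges)

# Tiny example graph: A -> B -> G (goal), each edge costing 1
edges = [("A", "B", 1), ("B", "G", 1)]
assert is_consistent({"A": 2, "B": 1, "G": 0}, edges, "G")
assert not is_consistent({"A": 5, "B": 1, "G": 0}, edges, "G")  # h(A) too big
```

A consistent heuristic never drops faster than the true cost along any edge, which is what lets A* expand each node at most once.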

In practice, the authors' A*OMP algorithm might indeed get good results, and the experiments shown seem to do so. So my quibbles may be mere quibbles. But the name "A*" led me to expect guarantees that just aren't there (e.g. guarantees of being better than OMP). It's quite easy to construct a toy problem for which A*OMP will not get you nearer the true solution than OMP will.

It's not obvious how to come up with a consistent heuristic. For a given problem, if we knew there was an exact solution (i.e. zero residual was possible within the sparsity constraints) then we could use the residual energy, but since we can't know that in general, the residual energy may overestimate the distance still to be "travelled" to the goal.

One minor thing: their "equivalent path pruning" in section 4.2.3 is a bit overkill - I know a simpler way to avoid visiting duplicate paths. I'll leave that as an exercise for the reader :)

Syndicated 2012-07-13 06:38:06 (Updated 2012-07-15 06:50:08) from Dan Stowell

24 Jun 2012 (updated 24 Jun 2012 at 14:08 UTC) »

Rhubarb in the hole

I've been experimenting with rhubarb, looking at nice ways to cook it in the oven without stewing it first (so it keeps its shape). Here's a rather nice one: rhubarb in the hole.

These amounts serve 2. Should work fine if you double the amounts - you just need a roasting tin big enough to sit all the rhubarb in with plenty of space.

  • 2 medium sticks rhubarb
  • 5 or 6 heaped tbsp ginger preserve (ginger jam)
  • 3 tsp caster sugar
  • Oil or marge for greasing
  • For the batter:
    • 1 egg
    • 1/2 cup (125 ml) milk
    • 50 g (1/2 cup) plain flour
    • Pinch of salt
    • 3 tsp caster sugar

In a mixing jug, beat the egg then add the milk and beat again. Add the salt and 3 tsp sugar and beat again, then gradually whisk in the flour to make a medium-thick batter. (This is normal Yorkshire pudding batter but slightly sweetened.)

If you can, leave the batter to sit for about 30 mins, then whisk again.

Put the oven on at 200 C. Grease the roasting tin / Yorkshire-pudding tin and put it in the oven to preheat.

Rinse the rhubarb and cut it into 8cm (3in) pieces. On a chopping board or in a bowl, mix the rhubarb pieces with the ginger preserve to get them nice and evenly coated. Sprinkle the other 3 tsp sugar onto them.

Assembling the pudding can be a bit tricky - mainly because you want to work fast, so the tin stays hot. Get it out of the oven, then wobble it around to make sure the grease is evenly spread just before you pour in the batter. Then place the rhubarb pieces in (probably best to do it one-by-one) so they're evenly spread but they're all a good couple of centimetres away from the edge.

Immediately put this all back in the oven and bake for 20-25 minutes. The batter should rise and get a nice golden-brown crust.

You can serve it on its own, or with a small amount of ice cream maybe.

(I had expected it to be a bit dry on its own, but actually the fruit keeps it moist so the ice cream isn't compulsory.)

Syndicated 2012-06-24 09:07:36 (Updated 2012-06-24 09:30:22) from Dan Stowell

9 Jun 2012 (updated 22 Jun 2012 at 08:09 UTC) »

My research about time and sound

I've got three papers accepted in conferences this summer, and they're all different angles on the technical side of how we analyse audio with respect to time.

In our field, time is often treated in a fairly basic way, for reasons of tractability. A common assumption is the "Markov assumption" that the current state only depends on the immediate past - this is really handy because it means our calculations don't explode with all the considerations of what happened the day before yesterday. It's not a particularly hobbling assumption - for example, most speech recognition systems in use have the assumption in there, and they do OK.
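
The Markov assumption can be stated concretely: the probability of the next state depends only on the current one, so a sequence model reduces to a transition matrix. A tiny sketch (states and numbers invented purely for illustration):

```python
import numpy as np

# two invented states: 0 = "quiet", 1 = "birdsong"
transition = np.array([
    [0.9, 0.1],   # P(next state | current = quiet)
    [0.3, 0.7],   # P(next state | current = birdsong)
])

def sequence_probability(states, initial):
    """P(s1, s2, ..., sn) under a first-order Markov model."""
    p = initial[states[0]]
    for prev, cur in zip(states[:-1], states[1:]):
        p *= transition[prev, cur]  # depends only on the immediate past
    return p

p = sequence_probability([0, 0, 1, 1], initial=np.array([0.5, 0.5]))
# = 0.5 * 0.9 * 0.1 * 0.7
```

Nothing before the previous state enters the product, which is exactly why the calculations don't explode.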

It's not obvious whether we "need" to build systems with complicated representations of time. There is some good research in the literature already which does, and they have promising results. And conversely, there are some good arguments that simple representations capture most of what's important.

Anyway, I've been trying to think about how our signal-processing systems can make intelligent use of the different timescales in sound, from the short-term to the long-term. Some of my first work on this is in these three conference talks I'm doing this summer, each on a different aspect:

  1. At EUSIPCO I have a paper about a simple chirplet representation that can do better than standard spectral techniques at representing birdsong. Birdsong has lots of fast frequency modulation, yet the standard spectral approaches assume "local stationarity" - i.e. they assume that within a small-enough window, we can treat the signal as unchanging. My argument is that we're throwing away information at this point in the analysis chain, information that for birdsong at least is potentially very useful.

  2. At MML I have a paper about tracking multiple pitched vibrato sounds, using a technique called the PHD filter which has already been used quite a bit in radar and video tracking. The main point is that when we're trying to track multiple objects over time (and we don't know how many objects), it's suboptimal to just take a model that deals with one object and apply the model multiple times. You benefit from using a technique that "knows" there may be multiple things. The PHD filter is one such technique, and it lets you model things with a linear-Gaussian evolution over time. So I applied it to vibrato sources, which don't have a fixed pitch but oscillate around it. It seems (in a synthetic experiment) the PHD filter handles them quite nicely, and is able to pull out the "base" pitches as well as the vibrato extent automatically. The theoretical elegance of the filter is very nice, although there are some practical limitations which I'll mention in my talk.

  3. At CMMR I have a paper about estimating the arcs of expression in pianists' tempo modulations. The paper is with Elaine Chew, a new lecturer in our group who works a lot with classical piano performance. She has had students before working on the technical question of automatically identifying the "arcs" that we can see visually in expressive tempo modulation. I wanted to apply a Bayesian formulation to the problem, and I think it gives pretty nice results and a more intuitive way to specify the prior assumptions about scale.

So all of these are about machine learning applied to temporal evolution of sound, at different timescales. Hope to chat to some other researchers in this area over the summer!

Syndicated 2012-06-09 07:18:44 (Updated 2012-06-22 03:52:09) from Dan Stowell
