Bird sound analysis with MPTK: chirp vs gabor
I'm just back from a great two-week research visit to INRIA in Rennes. The fruit of our labour will be a new release of the Matching Pursuit ToolKit with some whizzy extra features and polish. In my previous blog entry I showed how we can use Matching Pursuit to detect patterns in spectrograms - now I want to show you a quick example of how these techniques can give you a clearer, more meaningful representation of sounds such as birdsong.
On my way home one day I got a nice recording of a chiffchaff, so we'll use that as our example. (I also put the longer audio on Xeno Canto as XC125867.)
My particular concern is how to analyse this sound so we can capture some of the fine detail of the very fast pitch variation in birdsong - the chiffchaff is a clear example of this because it sings individual "notes", each with a very fast downward chirp onto the note.
So, using MPTK, I have a few choices of how to analyse. A classic option is using Gabor atoms, which you can think of as simple time-frequency blobs a little bit like the pixels in a spectrogram. MPTK can find a sparse representation of the signal using Gabor atoms - in the picture below, the first plot is a simple spectrogram, and the second one is the result of Matching Pursuit with Gabor atoms:
(BTW, the vertical axes aren't quite the same - oops.) As you can see, it's worked out how to build the energetic parts of the signal using a small number of Gabor atoms.
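If you'd like to see the basic idea in code, here is a minimal sketch of matching pursuit over a tiny Gabor dictionary, written in plain NumPy. It is not MPTK or pyMPTK code and says nothing about how MPTK is implemented: the function names (`gabor_atom`, `matching_pursuit`), the toy dictionary grid and the synthetic two-blob signal are all assumptions made up for illustration.

```python
import numpy as np

def gabor_atom(n, centre, freq, width):
    """Unit-norm real Gabor atom: a Gaussian window times a fixed-frequency cosine.

    centre and width are in samples, freq is in cycles per sample.
    (Hypothetical helper for illustration, not an MPTK function.)
    """
    t = np.arange(n) - centre
    atom = np.exp(-0.5 * (t / width) ** 2) * np.cos(2 * np.pi * freq * t)
    return atom / np.linalg.norm(atom)

def matching_pursuit(signal, dictionary, n_iter):
    """Greedy matching pursuit; each row of `dictionary` is a unit-norm atom."""
    residual = signal.astype(float).copy()
    book = []  # list of (atom index, coefficient)
    for _ in range(n_iter):
        corr = dictionary @ residual          # correlate every atom with the residual
        k = int(np.argmax(np.abs(corr)))      # pick the best-matching atom
        book.append((k, float(corr[k])))
        residual -= corr[k] * dictionary[k]   # subtract its contribution
    return book, residual

n = 512
# Toy dictionary: 8 time positions x 3 frequencies (an arbitrary, assumed grid).
params = [(c, f) for c in range(32, n, 64) for f in (0.05, 0.1, 0.2)]
dictionary = np.array([gabor_atom(n, c, f, 16.0) for c, f in params])

# Synthetic signal built from two atoms, so we know what should be found.
signal = 3.0 * dictionary[4] + 1.5 * dictionary[17]
book, residual = matching_pursuit(signal, dictionary, n_iter=5)
```

The `book` list is the sparse representation: a handful of (atom, coefficient) pairs rather than a full grid of spectrogram pixels. On this toy signal the first two picks should be the two atoms the signal was built from.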
But another choice is to analyse using chirplets. These are a lot like Gabor atoms, except that instead of sitting at a fixed frequency they can slope downwards or upwards in frequency. MPTK has a nice feature for efficient chirplet analysis (it uses Rémi Gribonval's fast matching pursuit technique for chirplets).
You can see the chirplet-based analysis in the bottom of the plot above. Notice how each syllable from the bird seems to begin with a big downward slash, showing a very fast downward chirp. That reflects what is actually happening and what you can hear in the recording.
The important thing, for me, is that chirplets here seem to be getting a much more meaningful representation of the signal than the Gabor atoms. This should be more useful for downstream analysis (whether by human or machine).
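To make the Gabor-versus-chirplet difference concrete, here is a small NumPy sketch. It builds a synthetic downward-chirping "note" and checks how well a single atom can match it, first with fixed-frequency atoms only and then with sloping ones. Everything here is an assumption for illustration: the `chirplet_atom` helper, the linear-chirp parameterisation (chirp rate in cycles per sample squared) and the brute-force search over a small grid. MPTK's fast chirplet technique is not a brute-force search like this.

```python
import numpy as np

def chirplet_atom(n, centre, freq, width, chirp_rate=0.0):
    """Unit-norm real chirplet: Gaussian window times a linearly sweeping cosine.

    The instantaneous frequency is freq + chirp_rate * (t - centre), in cycles
    per sample. With chirp_rate=0 this reduces to a plain Gabor atom.
    (Hypothetical helper for illustration, not an MPTK function.)
    """
    t = np.arange(n) - centre
    phase = 2 * np.pi * (freq * t + 0.5 * chirp_rate * t ** 2)
    atom = np.exp(-0.5 * (t / width) ** 2) * np.cos(phase)
    return atom / np.linalg.norm(atom)

n = 512
# Synthetic "note" with a fast downward sweep (assumed values, not from the recording).
target = chirplet_atom(n, 256, 0.2, 40.0, chirp_rate=-0.001)

freqs = np.linspace(0.05, 0.45, 9)
rates = [-0.002, -0.001, 0.001, 0.002]

# Best single-atom match when only fixed-frequency (Gabor) atoms are allowed.
best_gabor = max(abs(target @ chirplet_atom(n, 256, f, 40.0)) for f in freqs)

# Best single-atom match when the atoms are also allowed to slope.
best_chirplet = max(
    abs(target @ chirplet_atom(n, 256, f, 40.0, r)) for f in freqs for r in rates
)
```

Since all atoms are unit-norm, a value of 1 means a perfect match. No single Gabor atom gets close to that on a steep chirp, so a Gabor-only decomposition has to spend several atoms approximating what one chirplet captures directly.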
We can even sonify the difference using timestretching. Once I've analysed the sound using MPTK, I can reconstruct it... or I can manipulate the data and reconstruct modified versions of it. In the following MP3 player you'll hear 5 tracks. First the 7-second original recording. Then there's a reconstruction using chirplets; then a four-times-slower timestretch version made from the chirplets. Then the same but with Gabor atoms (a reconstruction, then a timestretch version).
In particular, compare the timestretched versions. With the Gabor version, you hear a lot of very robotised / quantised / MP3ish artifacts in the sound, whereas the chirplet version sounds much more natural. Still some artifacts in both, of course.
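Here is a rough sketch of what "manipulate the data and reconstruct" can mean for timestretching. It is one plausible approach under my own assumptions, not necessarily what the linked code does and not the pyMPTK API: the book is a plain list of hypothetical `(amplitude, centre, freq, width, chirp_rate)` tuples. Each atom is moved later in time and made longer, its centre frequency is left alone so the pitch doesn't change, and its chirp rate is divided by the stretch factor so it sweeps through the same frequency range over its new, longer duration.

```python
import numpy as np

def chirplet(n, centre, freq, width, chirp_rate):
    """Un-normalised real chirplet (hypothetical helper, not an MPTK function)."""
    t = np.arange(n) - centre
    phase = 2 * np.pi * (freq * t + 0.5 * chirp_rate * t ** 2)
    return np.exp(-0.5 * (t / width) ** 2) * np.cos(phase)

def synthesise(book, n):
    """Rebuild a signal of length n by summing the atoms listed in the book."""
    out = np.zeros(n)
    for amp, centre, freq, width, rate in book:
        out += amp * chirplet(n, centre, freq, width, rate)
    return out

def stretch_book(book, factor):
    """Timestretch by editing atom parameters rather than resampling audio.

    Assumed scheme: scale positions and durations by `factor`, keep the centre
    frequency, and divide the chirp rate so the total sweep is preserved.
    """
    return [
        (amp, centre * factor, freq, width * factor, rate / factor)
        for amp, centre, freq, width, rate in book
    ]

n = 1000
# Toy book of two downward-chirping "notes" (assumed values).
book = [
    (1.0, 200.0, 0.20, 30.0, -0.001),
    (0.8, 600.0, 0.15, 30.0, -0.001),
]

factor = 4
original = synthesise(book, n)
stretched = synthesise(stretch_book(book, factor), n * factor)
```

With Gabor atoms the same trick only lengthens flat-frequency blobs, which may be part of why that version sounds more quantised: a fast sweep approximated by a staircase of fixed-frequency atoms becomes an audible staircase once it is slowed down.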
The Python code for these examples is available here - note that it relies on the pyMPTK wrapper, which is going to be in the soon-to-be-released MPTK version 0.7.