I've taken a break from kernel hacking to do something more pressing. I got annoyed with the cost of software for automatic vocal pitch correction (on the order of $350-$700). It's simple enough that there's no reason for it to be so expensive that only people who do this stuff professionally can afford it.
Don't get me wrong, I have no problem with paying for well-written software that's complex in nature. I'd pay $350 for an audio editing program. But paying $350 for a mere audio filter? I can buy a stand-alone hardware solution for $500, and unlike software, if it doesn't work, I have the right to send it back or sue. There's no way I'd pay that kind of money for a piece of filtering software, even if I -had- that much to spend on it (which I don't -- I'm doing audio recording entirely for fun in my spare time as a hobby).
To make matters worse, none of the companies appeared to have a Mac OS X version, making their software completely useless to me. I contacted one of them, and they suggested they might have one by the end of the year. Thanks, but no thanks. If I'm going to pay that kind of money for something that does so little, it darn well better have been Mac OS X native three months ago.
So, in desperation, last night, I pulled together FFT routines and fundamental frequency detection code, and I wrote a lot of glue code, including a first cut at an arbitrary frequency shifter. With the exception of generating the frequency table and actually doing the file I/O, I essentially had a first cut at the software after a little over three hours of coding, assuming my reasoning was sound and I didn't make any stupid mistakes in the math.
The basic idea is this:
1. Take a chunk of size 2^k for some k.
2. Detect the fundamental frequency.
3. Look in a table to determine the frequency of the nearest note.
4. Apply an FFT to the chunk.
5. For each frequency-domain array slot, multiply the array index by the new frequency and divide by the old to get the new index.
6. For each slot, copy the value stored at the old index in the input array into the slot specified by the new index in the output array.
7. Apply an inverse FFT to the chunk.
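Steps 4 through 7 can be sketched roughly as follows. This is a minimal illustration in Python, not the actual code: the names `fft` and `shift_chunk` are made up for the example, the tiny recursive FFT stands in for whatever FFT routines are really used, and the target index is simply rounded to the nearest slot. It moves the content of slot i to slot round(i * new / old), which is the direction that raises the pitch when the new frequency is higher than the old one. It also ignores phase continuity between chunks, so it says nothing about glitches at chunk boundaries.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def shift_chunk(samples, old_freq, new_freq):
    """Shift a real-valued chunk by the ratio new_freq / old_freq."""
    n = len(samples)
    spectrum = fft([complex(s) for s in samples])
    shifted = [0j] * n
    half = n // 2
    shifted[0] = spectrum[0]  # keep the DC component where it is
    for i in range(1, half):  # positive-frequency slots only
        j = round(i * new_freq / old_freq)  # new index for slot i
        if 0 < j < half:  # drop anything pushed past the Nyquist slot
            shifted[j] += spectrum[i]
            # Mirror into the negative-frequency slot so the result stays real.
            shifted[n - j] = shifted[j].conjugate()
    # Inverse FFT, normalized by n; the imaginary parts are rounding noise.
    return [v.real / n for v in fft(shifted, inverse=True)]
```

Because of the rounding, several input slots can land on the same output slot when shifting down, and some output slots stay empty when shifting up. A real implementation would probably interpolate instead.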
I'm also working on some clever tricks to avoid glitches at the FFT chunk boundaries with as little aliasing as possible, if my theories hold.
The real trick is to build up the table. There are three parameters:
1. Base frequency (e.g. A = 440 Hz),
2. Temperament or scale (equal tempered, C major, D minor, whatever), and
3. Compression ratio.
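For the equal-tempered case, the table and the nearest-note lookup could look something like this. It is only a sketch: the function names and the default range (48 semitones below the base to 39 above, roughly a piano keyboard when the base is A = 440) are assumptions made for the example, and "nearest" is measured on a logarithmic scale because that is how pitch distance is heard.

```python
import math


def equal_tempered_table(base_freq=440.0, low=-48, high=39):
    """Frequencies of the notes `low`..`high` semitones away from the base."""
    return [base_freq * 2.0 ** (n / 12) for n in range(low, high + 1)]


def nearest_note(freq, table):
    """Return the table entry closest to `freq`, measured in log-frequency."""
    return min(table, key=lambda f: abs(math.log2(freq / f)))
```

With the default table, a detected fundamental of 445 Hz maps to the 440 Hz entry. Restricting the table to the notes of a particular key would be one way to handle the second parameter, though that is not shown here.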
I'll probably only do equal temperament in the first cut to save time. The third parameter is perhaps the most interesting. The idea is to assume that the singer is pretty close to on-pitch, i.e. not tone-deaf. Rather than hard-locking the voice to an exact pitch, the filter would move it "closer" to the right pitch.
Within a certain range of the right pitch, it will lock the voice to that pitch. Beyond that range, the corrected pitch will move away from the note, slowly at first and then faster as the input gets closer to a semitone away, with the semitone itself left unaltered. The exact opposite happens as the input gets closer to the next note up or down.
The implications of this are that a slide, glissando, etc. will end up sounding more rapid, but should otherwise sound reasonable, despite the modifications. Similarly, intentional vibrato will be reduced, but generally not eliminated. The idea is to end up with a voice that doesn't sound like it has been artificially "forced" to pitch, i.e. with a certain amount of pitch variation, but that still sounds basically in tune.
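One possible shape for that mapping, sketched in Python. The exact curve isn't pinned down above, so everything here is an assumption: the function names, the power curve, the 10-cent lock range, and the choice of 50 cents (the midpoint between adjacent equal-tempered notes) as the point that is left unaltered. Set `unaltered_cents` to 100 to leave a full semitone unaltered instead.

```python
import math


def compress_deviation(cents, lock_cents=10.0, unaltered_cents=50.0, power=2.0):
    """Map a deviation from the target note (in cents) to a corrected deviation."""
    magnitude = abs(cents)
    if magnitude <= lock_cents:
        return 0.0  # close enough: lock to the target pitch
    if magnitude >= unaltered_cents:
        return float(cents)  # far enough away: leave it unaltered
    # Rises slowly just outside the lock range, then faster toward the
    # unaltered point, where the output equals the input again.
    t = (magnitude - lock_cents) / (unaltered_cents - lock_cents)
    return math.copysign(unaltered_cents * t ** power, cents)


def corrected_frequency(freq, target_freq, **kwargs):
    """Apply compress_deviation to a detected frequency near target_freq."""
    cents = 1200.0 * math.log2(freq / target_freq)
    return target_freq * 2.0 ** (compress_deviation(cents, **kwargs) / 1200.0)
```

With these defaults, a note sung 30 cents flat comes out 12.5 cents flat, and anything within 10 cents of the note snaps to it. A larger `power` pulls more of the range toward the note.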
Barring any nasty bugs, I expect to have a first cut (equal temperament only) finished this weekend at the latest. Wish me luck.