Interview: Christopher Montgomery of Xiphophorus

Posted 4 Apr 2000 at 21:18 UTC by advogato

This week, Advogato interviews Christopher Montgomery of the Xiphophorus Company. Monty has been fairly quietly working on streaming multimedia for many years. With luck, his latest project is about to take the world by storm: Vorbis, an audio codec that is not only completely free, but also sounds noticeably better than MP3 for similar compression ratios. Read on for a discussion of multimedia codecs, the RIAA's misguided attempts to control copying, the role of "deep wizardry" in free software projects, and more.

Advogato: How did you get started doing free software?

Christopher Montgomery: Programming had already been a hobby most of my life. When I got to college, I thought I was pretty hot stuff; I'd never been around better programmers than I was. That changed drastically at MIT. I didn't feel stupid (like all my HS guidance counsellors warned me), I felt like I finally belonged. But it also became very clear that I had so much to learn, I wasn't even on the map. My mentors were free software hackers, and it only made sense that I became one too.

Who are the mentors that influenced you the most?

A few instructors were inspirations, perhaps not personal mentors: Hal Abelson, Jerry Sussman, Greg Papadopoulos. The people I learned the most from daily around MIT were my own generation and a little earlier. I'm a bit afraid to name names only because there were quite a few, and I'll forget someone important...

Seth Finkelstein, Mike Bauer, Mark Eichin, Marc Horowitz, Greg Hudson (whether he knows it or not ;-) and a lot of other people whom I'm inadvertently insulting through omission... maybe I shouldn't have started listing at all. Many of the folks I'm thinking of may or may not have known more than me, but they certainly knew a lot I didn't.

Some people probably know you from cdparanoia and MGM, while others are more familiar with Vorbis. Why did you write cdparanoia and why are you working on Vorbis?

Cdparanoia was a spinoff of the Ogg project many years ago; I needed a reliable way to get long samples of CD for testing my CODECs and cdda2wav didn't work well on any cdrom drive I could afford then (I was still a student at the time).

Half the reason I wrote Vorbis is simply because I've been working on audio CODECs for about six years now, and I'm finally getting good at it. However, my interest was beginning to wane a bit after a lot of code and not much success.... then Fraunhofer decided to sue everyone in sight. That's what galvanized Vorbis. I've been working on it steadily since November 1998.

Vorbis is [approximately] the sixth generation of Ogg and the first CODEC that I feel is ready to go forth and do battle in the streaming arena. It's not enough to be Free and as good as MPEG. I have to be Free and clearly better.

Could you briefly give an overview of Ogg? The patent situation with Fraunhofer certainly has made clear the need for a totally free CODEC. What kind of support have you been getting for this project?

Ogg is the name of the overall project. Vorbis is our first CODEC (for lossy audio streaming), but we'll continue into video and lossless audio, as well as 'metacontent', etc.

I've actually received more corporate interest and support than I expected (formerly the Green Witch LLC of San Francisco and now iCast of Woburn Massachusetts sponsors Vorbis development). The small and middle sized players in the online download and streaming market are the ones being squeezed by the big players that control the technology, so they're very interested in Vorbis.

The Open Source community doesn't know much about Vorbis and I've not been trying to attract attention; I wanted an alpha-grade release of the Vorbis libraries, tools, and plugins before crowing too hard. That may prove to be a mistake, but I have been worried about attracting more attention than I'm ready for (with incomplete code), or damaging our credibility with something that isn't ready.

Now, though, the code is nearly past the stage where you need to be a hacker to get it going.

And in the meantime, you've moved from Massachusetts to the California Bay Area. Is this just to make sure you're on a different coast than your corporate sponsors?

I came out to the Bay area to get married, nothing more to it. My wife doesn't have the freedom to move around so I came out here.

Metacontent? What do you mean by that?

'Content about the content'. Things like lyrics, album covers, any additional data that isn't really part of the core content, but makes sense to package along anyway.

What kind of licensing arrangements apply to the Vorbis protocol spec and your implementation? Will the iCast product be free software?

Vorbis, as I'm working on it, is a xiph.org product; iCast is a sponsor, but they didn't buy us.

The Vorbis spec is free; the only right we retain over the spec is to set it and certify compliance. I'd love to see third party implementations of Vorbis with whatever software license the new author sees fit to use.

The Vorbis libraries we're writing are all LGPL. The tools for encode/decode are GPL.

So, how does Vorbis compare to MP3?

Right now, it compares well. I get a few emails a day from folks who stumble across it, try it, and then tell me it's better.

I think that the current CVS mainline is as good as MP3. This weekend I'm merging a month's worth of new work that substantially improves it.

What are the main things you're doing differently?

MP3 and Vorbis aren't actually very similar except in very shallow ways. Vorbis resembles TwinVQ a bit more closely, but again, poking at it for more than a few minutes tells you that the two have very little in common beyond some mathematics.

So, unfortunately, I don't have many shared landmarks between MP3 and Vorbis to contrast directly.

Vorbis is FFT-based, while MP3 is not?

Vorbis does not subband, but uses an MDCT directly. In this way, it resembles TwinVQ. (I was going to grad school in Japan at the time TwinVQ 1 was being developed, and I had the unbelievable luck of taking a few classes from the professor who also headed the lab developing it.)

Is TwinVQ proprietary or free?

Proprietary, but how much so is hard to tell. TwinVQ appears to be more closed than MP3, but it's been suffering popularity problems. NTT hasn't been very good at marketing it. It's had less scrutiny because not many people are paying attention to it.

To elaborate on a previous statement; the frequency domain stage of Vorbis uses an MDCT directly. Vorbis also provides for time domain encoding.
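For readers who have not met the MDCT, here is a minimal textbook sketch of the transform and its overlap-add inverse. It is not taken from the Vorbis sources: the sine window, the block size, and the names `sine_window`, `mdct`, `imdct`, `encode`, and `decode` are all assumptions chosen for illustration, and the direct O(N^2) sums stand in for the fast algorithms a real codec would use.

```python
import math

def sine_window(length):
    """Sine window (an assumption); w[i]**2 + w[i + length//2]**2 == 1."""
    return [math.sin(math.pi * (i + 0.5) / length) for i in range(length)]

def mdct(block):
    """Forward MDCT: 2N (windowed) samples in, N coefficients out."""
    n = len(block) // 2
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for i, x in enumerate(block))
        for k in range(n)
    ]

def imdct(coeffs):
    """Inverse MDCT: N coefficients in, 2N time-aliased samples out.
    The 2/N scale matches a window pair that satisfies the identity above."""
    n = len(coeffs)
    return [
        2.0 / n * sum(c * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                      for k, c in enumerate(coeffs))
        for i in range(2 * n)
    ]

def encode(signal, n):
    """Cut the signal into 50%-overlapping blocks of 2N samples, transform each."""
    assert len(signal) % n == 0, "sketch assumes a whole number of hops"
    window = sine_window(2 * n)
    padded = [0.0] * n + list(signal) + [0.0] * n  # so every sample is covered twice
    return [
        mdct([w * x for w, x in zip(window, padded[start:start + 2 * n])])
        for start in range(0, len(padded) - n, n)
    ]

def decode(frames, n):
    """Window each inverse block again and overlap-add; the aliasing cancels."""
    window = sine_window(2 * n)
    out = [0.0] * (n * (len(frames) + 1))
    for b, coeffs in enumerate(frames):
        for i, y in enumerate(imdct(coeffs)):
            out[b * n + i] += window[i] * y
    return out[n:-n]  # drop the padding added in encode()
```

Each block of 2N samples yields only N coefficients, yet adjacent blocks overlap by half, so the transform stays critically sampled while avoiding hard block edges. A lossy codec would quantize the coefficients between `encode` and `decode`; that step is omitted here.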

Is that a feature unique to Vorbis?

Compared to TwinVQ and MPEG, yes. Compared to the history of audio coding, no. And this is less finished than the rest of the CODEC and likely to be released somewhat later. But for strongly non-tonal audio, we also encode time features using wavelets.

That counts as 'experimental but very promising'. Don't expect to see it in 1.0, but it would be a priority once the absolutely necessary features are ready. At some point, you just have to back off a bit, realize what you already have is excellent, and resist the urge to spend another four months making it even better before 1.0 :-)

So you're basically combining the best techniques out there that aren't patented up the wazoo, right?

It's hard to tell with the current patent climate; some very broad, obviously invalid patents apply to audio and I've been worried in the past that it doesn't really matter what's really patented; the lawyers will find a way to sue you. However, there's now a corporate war chest behind Vorbis and I'm moving ahead.

Speaking of which, when do you plan to release 1.0?

We were hoping for this Tuesday. I'm behind schedule. Not by much, but it's still annoying. So we'll likely go ahead with the press releases, update the overly out of date xiph.org Vorbis pages and continue forth. At this point, Vorbis will benefit from attention; the majority of the work left to be done doesn't involve wizardry.

What we have in CVS right now has a complete API and runs solidly. It's missing features (and the stuff I'm committing now alters the bitstream format incompatibly), but it stands up.

Excellent! I'm sure I'm not alone in looking forward to the release. But with the huge momentum behind MP3, what chance do you think Vorbis has to gain a foothold?

Someone always asks that in some form ;-) Forgive me for sounding cynical and elitist, but the consumers use whatever the industry settles on. The industry wants Vorbis because MPEG is currently running itself as an exclusive club. As the industry decides to use the higher quality Vorbis format for free rather than MP3 with its steep licensing and royalties, the consumers will get behind it.

The Open Source community is a bit different; here I can stand on technical merits.

Do you really think so? I'm amazed by the number of people who still use bladeenc even though LAME is head and shoulders above it.

Touché :-)

What I meant was that the Open Source community is relatively a much higher percentage of technically literate early adopters. Taking the OS community by storm doesn't assure widespread success elsewhere, but a good project has a better chance there.

But aren't there some people who prefer digital audio to be a closed club? After all, Vorbis doesn't have any copy protection, region codes, or any of that.

Neither does the competition. The algorithmic equivalent of a wad of gum in the keyhole is not real security, regardless of the press releases that claim otherwise.

But really, this is the real can of worms.

I try to avoid political baggage and philosophical dogma, but I fundamentally disagree with the amount of control the music industry is trying to place over distribution. It is not realistic, it is not practical, and moreover it just worries me.

Let me state for the record that I want the artists to get their money, and much more so than the record companies do. The RIAA can shout piracy all they want, but that isn't what it's about. It's about control, and only about control.

Do you want the RIAA to have the ability to tell you that you may only play your music on a single 'Walkman'? BTW, they have a deal with Sony this year, and the price of that Walkman just went up.

Similar scenarios played out just this past month (albeit concerning MPEG licensing, not the RIAA)! I don't have to construct a 'slippery slope' argument, because we've already fallen down it. You, Joe Consumer, are losing the very right to listen to music you've already bought.

The last piece that completes the absurdity is that the protection schemes the RIAA demands are impossible to implement securely. They'll always end up cracked. Thus, the RIAA, DVD Forum and MPAA are trying to make the act of reverse engineering these flimsy protections itself a crime.

None of this makes sense, and I'm simply not going to participate in the madness. It makes no engineering sense, it makes no legal sense, and my conscience won't let me implement a feature that takes reasonable rights away from people.

Now that Vorbis is close to being released, what kind of support would you like to see from free software developers?

First, integrating it into their own projects. I've gone through great pains to make sure the API is so easy it will make you cry. Of course, I'm saying that before I actually document it ;-)

Secondly, Vorbis could use a few more steady contributors.

There are pieces of the distribution (the psychoacoustics, signal processing, etc) that are deep wizardry... only a very few people would be able to help. Thankfully, there are many more pieces that any good hacker could wrap their mind around and make magic with.

I don't expect there to be much trouble attracting both kinds of support now that we're turning off our 'go away and come back later' field.

Many of the most visible open source projects fall into that "any good hacker" category. In what ways does "deep wizardry" free software differ?

A few of us at the Green Witch joked a few months back that everyone's first program [in Open Source] used to be a mixer; now it's an IRC client :-)

There are kinds of applications that are 'solved problems'. They use simple blocks that have been done before. You only need to spend the time and effort to put them together well. (Which is not a trivial thing. Doing software well is to be commended).

Then there are the applications that 90% of programmers look at and say 'I don't know how to do that' and so don't try it. Another 9% will spend a month or two butting heads with the problem, then wander away when it turns out to be much more difficult than expected.

Some of the silly fools keep tangling with the puzzle for ten years.

You can learn a lot in ten years... but spending that long is an accident. Any sane hacker would find an only mildly less sexy project that takes 1/10th the time.

No one has written Vorbis yet because it's hard. If I'd had any idea how hard and frustrating audio compression is at the time I started, I wouldn't have. It was only supposed to be a small, weekend-long hack as part of a more sexy package ;-)

But sometimes these hard things are needed. Should we just count on a steady stream of non-sane hackers?

Well, in my case, I saw that MP3 could exist, so I knew it was possible. In my case, I was an arrogant brat who didn't know better but couldn't admit it. Perhaps counting on that motivation is more practical :-)

Joel Becker of #gimp has just proposed registering theyallsuck.org as a blanket site for reviewing IRC clients, mailers, window managers, etc.

lol. I have been growing a pet peeve about people who write their own version simply because they couldn't be bothered to learn the software that currently exists. On the upside, the folks who do so generally learn a lot in the process.

I come from a rabid BSD crowd, and they generally can't believe I'm still hacking Linux. All I can manage to respond to most of the arguments is, "Yes... but they're learning!" Which is the real blessing of Linux. The other more 'elite' OS projects couldn't say that.

The OSS community has grown by a lot, and we're still assimilating the people who think that a few months with VB gives them what they need to write another GIMP. They'll get better. Of course, they'll be somewhat annoying in the meantime, but most of them are good folks :-)

Do you think that this atmosphere of learning has anything to do with the number of scarily smart people doing free software?

The current atmosphere is different than before namely because of the influx of new programmers. The scarily smart people have always been here; perhaps there are more of them here now, but I really don't know. I do know that I meet more scarily smart people in the community all the time, and that makes me feel good enough.

Joy's Law says, "Wherever you work, most of the smart people in the world work for somebody else." Do you think that quote really applies to free software?

We're not necessarily working hard as much as we're doing what we love. Of course 'scary smart' people may work very hard at having fun :-) Seriously, I think and sincerely hope that 'us and them' doesn't apply in OSS. People from other projects contribute to Ogg and I contribute to other projects all the time.

When I'm working on code, "beating the competition" isn't what's on my mind. Hackers get together with other hackers they like, and art results. Who was working for who? Maybe it doesn't matter.

Are you actively working on the video CODEC side of Ogg?

Actively, now: no. I don't have the resources, and this falls into the category of 'requires deep wizardry'. That doesn't mean that folks don't play with it (and we do have code), but this does not imply any sort of organized development of it yet.

"Folks" plural?

Yes, I slip in and out of 'me/we', but there is more to xiph.org than just me :-)

Vorbis and the other CODECs have a few steady hackers, and about a dozen casual contributors. That includes folks that are just in the same sphere... like the LAME developers, Icecast developers, old friends and so on.

Vorbis is 99% of the effort right now. The video CODEC sees disorganized 'poke and twitch' coding. The Ogg umbrella libraries get work now and then, but have been stable for a while.

The video project was a much easier project before you knew what you were doing, then?

Oh, I didn't play with video until I knew how hard audio was. Video is a bit easier than audio, but it's not trivial. I've had at least two hackers bounce off the video project pretty hard so far this year :-)

I'm surprised to hear that video is easier than audio. How so?

A couple of reasons; video compression is less mature than audio. There are still new, simple things to try with it that have a good chance of working. The brain also tolerates much more quality loss in video than audio. And so on.

Audio has been studied to death for 80 years; some of the masking papers I used for data in Vorbis are 40 years old. Video is much, much younger.

Back to cdparanoia. What's the philosophy behind that project?

cdroms are perhaps the lowest margin piece of hardware in modern PCs, and the media format doesn't lend itself well to accurate extraction. Getting skip-free audio off a CD is much harder than it looks.

Because of the CD format itself, or because the drives can't be bothered to get it right?

Both. Most of the problem, though, is buggy drives.

Yep. Regular CD players don't seem to have too much problem with it.

Sure they do. Pick up your discman and shake it. That simulates your OS going off and doing something else for a while. The CD is like a phonograph record; it's digital, but it actually is a spiral meant to be read from start to finish. If the CD is interrupted, loses sync, hits a bug, whatever... it just skips. Seeks, by the original CD spec, need only be within about 75 sectors of the intended destination. The data format doesn't provide for fine grained seeking like on a data disc; the CD literally is incapable of going back to the exact spot it lost track.

There's actually much more arcane detail involved; I have about four pages on the subject in my cdparanoia FAQ that goes into extreme, boring detail about why it doesn't just work.

There are an astounding number of failure modes in reading an audio disc. "It just is" even if it shouldn't be :-(
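To put numbers on the seek imprecision described above, here is a small sketch. The 75-sector figure comes from the interview; the audio parameters (44.1 kHz, 16-bit, stereo, 75 sectors per second) are the standard CD audio values. The `stitch` helper is a hypothetical illustration of the general idea of re-reading with overlap and matching against data already read; it is an assumption for this sketch, not cdparanoia's actual algorithm.

```python
from typing import Optional

SAMPLE_RATE = 44100        # samples per second per channel (CD audio)
CHANNELS = 2
BYTES_PER_SAMPLE = 2       # 16-bit samples
SECTORS_PER_SECOND = 75

# Bytes of audio in one sector: 44100 * 2 * 2 / 75 = 2352
BYTES_PER_SECTOR = SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE // SECTORS_PER_SECOND

def seek_slop(sectors: int = 75):
    """How much audio a seek may be off by, as (seconds, bytes)."""
    return sectors / SECTORS_PER_SECOND, sectors * BYTES_PER_SECTOR

def stitch(verified: bytes, reread: bytes, overlap: int) -> Optional[bytes]:
    """Hypothetical helper: join a re-read onto data we already trust.

    Looks for the last `overlap` bytes of `verified` inside `reread`.
    Returns the joined stream, or None if the re-read landed somewhere
    that does not overlap. Silence or repeated audio can match in more
    than one place; a real tool needs much more care than this.
    """
    tail = verified[-overlap:]
    index = reread.find(tail)
    if index < 0:
        return None
    return verified + reread[index + overlap:]
```

A seek that is allowed to land 75 sectors away can be a full second of audio, or 176,400 bytes, from where the reader intended, which is why a ripper cannot simply resume "where it left off" after an interruption.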

What kind of cool new features are going to be in Paranoia IV?

Paranoia IV is mostly intended to be a) portable and b) automatic (with meaningful error handling). For example, rather than 'unable to open cdrom drive', "you do not have /dev/sg support in your kernel" or "I need read/write permission on /dev/sg6 to proceed." I also need to add features like index/subcode extraction and ECC support for drives that have it.

But portability is the big issue.
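The kind of error reporting described above can be sketched as follows. The two specific messages are quoted from the interview; the function name `explain_open_failure` and the mapping from errno values to those messages are assumptions made for illustration, not how Paranoia IV is implemented.

```python
import errno
import os

def explain_open_failure(device: str) -> str:
    """Try to open a device read/write and describe the outcome in plain words."""
    # O_NONBLOCK avoids hanging on some device nodes; it is absent on some platforms.
    flags = os.O_RDWR | getattr(os, "O_NONBLOCK", 0)
    try:
        fd = os.open(device, flags)
    except OSError as exc:
        if exc.errno == errno.ENOENT:
            return f"{device} does not exist; is the device node missing?"
        if exc.errno in (errno.EACCES, errno.EPERM):
            return f"I need read/write permission on {device} to proceed."
        if exc.errno in (errno.ENODEV, errno.ENXIO):
            # Assumption: the node exists but no driver is attached to it.
            return "you do not have /dev/sg support in your kernel"
        return f"unable to open {device}: {exc.strerror}"
    os.close(fd)
    return f"{device} opened successfully"
```

The point of the design is that each distinct failure tells the user what to fix, instead of collapsing everything into 'unable to open cdrom drive'.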

Do you spend all your waking time hacking, or are there things you do away from the computer?

Right now, due to Vorbis, I spend all my waking time hacking, and I can't wait for that to end. Mountain biking, ultimate frisbee, badminton, hiking, hardcore strategy gaming... it occurs to me it's been too long since I was actively singing. It's two years now since my last Gilbert & Sullivan show and I haven't sung much since then.

...but you'll have to cut that list in half or more because my wife will certainly have a longer list of her own to subject me to :-) To be fair, those lists overlap.

I notice you use the phrase "open source software". Do you have a position on the whole open source vs free software rhetorical flamewar?

No, I use them interchangeably when the fine distinction doesn't really matter. My mixed use is mostly motivated by a) lack of alternative synonyms and b) desire not to sound like a robot.

Would you be willing to pose for nudenerds.com?

I haven't already? Well, dangit, who were those other people then?

Thanks very much for the interview, and best of luck with the Vorbis release and your other projects.

Thank you :-) Best of luck with Advogato; I'm definitely hoping for the best.


This sounds great, posted 4 Apr 2000 at 22:16 UTC by Radagast » (Journeyer)

Thanks for the interview, Vorbis is one of the projects I've looked at in the past (that is, looked at the webpages) and eagerly waited for it to get usable. I hope it's not just suitable for streaming and low-bandwidth use, but that it's consistently better than MP3 even at high bitrates (in the 160-256 kbps range). If it is, all we'd need is a good player for the most popular platforms, and I think it could move quickly to replace MP3 where it matters, that is, in the illegally copied music segment. It's huge, and it turns to new technologies really quickly. The moment the Napster people make Napster recognize data in Vorbis files, you know you've won.

All in all, good luck. If this works, it could be beautiful.

Good Projects, posted 4 Apr 2000 at 22:20 UTC by Gimptek » (Journeyer)

I'm glad that advogato did this interview. It's informative and something that I probably wouldn't have seen anywhere else.
That's why I like advogato in the first place. Every other news site just has the same interview. It's like the talk show circuit on TV. Everyone does the same shows.

Vorbis
That's sweet technology. I can't wait for someone to make a converter program. I can just imagine the trillions of cycles that will be used in the conversion process. Mind boggling. Now if only someone could come up with a distributed computing API and we could send each other blocks of data to process over the net. The net would be a giant rendering farm.

Think if someone took POV, made a standard way of breaking a scene file into 20-40000 chunks and just sending it to a central server which in turn would send out the scene file and the block numbers that they would do.

This would usher in a new commodity that everyone always had on their desks. Think of asking 20 friends for permission to render blocks on their empty cycles.

You could give out cycle permissions per cycle like gifts. E-mailing your friend an encoded "ticket" of sorts to grant them 5M cycles or something. This would all be automated in the API.

Of course this would need the utmost security. People would be stealing other's cycles. Making virii that steal cycles for rendering and other devious purposes.

This could be integrated with e-mail for those without a true ppp connection. One would get n blocks just as a mail message and send back the processed data as an attachment when they go to check their e-mail again. This could be integrated into pine and even into evolution and that office that KDE is working on too.

Think of all the unused cycles of large corporations. Just gold sitting on their desks. If you had a corporation that had a fast network and blocks just everywhere. Think of the network traffic and the load balancing that would need to take place.

This could be a kernel level API. This would have to have a new protocol for the distribution of calculations.

Get Intel to help you on this. They would love the fact that it would use the cpu even when the user isn't using it. This would give the cpu a shorter life and increase their profits so you can get a penny or two from them. AMD would want to say that their processor can crank the most blocks so they will want a piece of the action; more pennies there. Both will want to be involved so that they can optimize their processors and their network cards to act together. Future investments. More pennies.

Obviously, I am not against commercial money backing opensource projects. Obviously, I'm starting to babble and becoming incoherent. This is something that's been on the back burner of my mind for a while now. I'm wondering what you guys are thinking.

Distributed POVray., posted 5 Apr 2000 at 00:40 UTC by kelly » (Master)

Actually, I've been working on doing this for some time. It's not a hard problem, it's just not high priority. :)

Distributed POVRay.., posted 5 Apr 2000 at 00:59 UTC by darius » (Journeyer)

.. already exists..

At least for LAN's - PVMPovray :)

I don't have a current URL but it shouldn't be too hard to track down.

POV-Ray license considered evil, posted 5 Apr 2000 at 05:26 UTC by federico » (Master)

POV-Ray is *NOT* free software! The license is one of the most obnoxious around. I wish the POV-Ray team would get a clue and step out of its Compuserve mindset.

POVRay's evil license., posted 5 Apr 2000 at 06:38 UTC by kelly » (Master)

I agree with federico on the license issue; in fact, POVRay's anal-restrictive license is one of the largest blocking points to turning it into a proper distributed renderer -- it is prohibited (under their license) to separate the engine from the interface.

However, it is my opinion that their license is so defectively worded that it is probably unenforceable. I have seriously considered contemning the license and doing what I want to do with the code -- except for the other problem: POVRay's code is terrible.

The Applications, posted 5 Apr 2000 at 11:55 UTC by rakholh » (Journeyer)

This sounds like a cool format - especially if it lives up to the good quality at low bitrate end.

Anyway - since the codec supports streaming - How appropriate would it be to use this for things such as Voice Over IP? I'm talking about something that can run on a modem (anywhere between 1 k/sec to 3 k/sec)?

Also - would this codec be appropriate replacement to .wav? A lot of games/applications still use .wav to play files (or some other 'raw' format). It would be cool if there could be some sort of codec out there for these types of games/applications to use.

The FAQ says it can go from 16 to 128 kbps? What about higher quality (say 192? what about 256)?

"I think so Brain, but where are we going to find an Open Source codec of *that* quality?", posted 6 Apr 2000 at 00:25 UTC by xiphmont » (Master)

>The FAQ says it can go from 16 to 128 kbps? What about higher quality (say 192? what about 256)?

Per channel. That translates to us testing developing modes for 256kbps stereo, yes.

I'll be working on voice modes for Vorbis, but Vorbis is not a classical speech codec, and as such, will likely never be as good as a compression designed specifically for speech.

Monty

More Monty Links, posted 26 Apr 2000 at 00:09 UTC by Acapnotic » (Master)

Just for links' sake, here's a link to the Slashdot discussion of this article.

There's also another interview with Monty by Dvorak of Real Computing.
