Why, oh why, is it so hard to build a telephony interface that just works like a sound card?

Rewriting my phone server to be purely event-driven has had fringe benefits - the design is much better now. But full duplex audio on the goddamn IXJ card still does not bloody work. At first, it starts dropping frames left and right, then it suddenly decides that it just doesn't want to wake up select(2) on write anymore. So the server's state machine stalls and everything goes to hell.

It's not a challenge anymore, it's just a never-ending nightmare. Nothing works, everything is broken, and there are no explanations to be found anywhere.

Spent two days at LWE. I couldn't justify skipping out on my ride back and paying for an extra night at the hotel so I came back Wednesday night. There was actually a Debian booth, just not where we expected it to be. My main contribution was netbooting and installing their Ultra10 loaner, then spectacularly failing to get X to work on it (damned flaky framebuffer drivers). But I got to meet wichert, branden, and lupus in person so that was cool.

I talked to the Transvirtual folks about speech a bit but felt kind of weird, like I was making a sales pitch. Anyway I hope we'll get to do stuff with them in the future, as PocketLinux actually looks like it will become a good, usable Linux platform for end-users on handhelds (the X-based stuff that people are doing is neat but I don't see it going anywhere - however, it would really be nice to be able to use X, since it isn't actually that big, and you end up not having to reinvent so many wheels...)

Also got to talk to Tridge about his (unfinished) PhD work on speech recognition, and finally meet mbp. Speaking of X, Tridge described how his speech recognizer ("bug") was able to send synthetic X events to clients, which for some reason I'd never thought of doing before - I'll have to try it out soon.

Hopefully, we (Cepstral) will soon be releasing a whack of Perl modules I've written, namely:

  • Speech::Recognizer::SPX - Perl extension for Sphinx-II
  • Festival::Client::Async - non-blocking Festival client module
  • Telephony::Phonedev - Perl extension for Linux telephony devices
  • POE::Component::SPX - POE component for speech recognition
  • POE::Component::Festival - POE adaptor for Festival::Client::Async

Together these should provide basically all you need to build speech interaction and voice control systems using Perl. I am excited.

The main thing I'm waiting on is the Sphinx2 0.3 release from CMU, which should happen soon. Oh, yeah, we also need a website first.

23 Jan 2001 (updated 23 Jan 2001 at 18:32 UTC)

Cool. I got the Sphinx-II "continuous audio" module working with select(2) in Perl. So I can get rid of this horrible bit of code in my speech I/O framework, since I don't have to fork off a blocking process to do speech recognition anymore:

	# ARGH ... POE has messed with %SIG
	@SIG{keys %SIG} = ('DEFAULT') x keys %SIG;

The Sphinx-II "non-blocking" utterance processing interface is kind of broken... it processes only a single frame of data at a time, which is way less than the amount typically available to read from the audio device, and there's no explicit function to flush unprocessed frames (though you can just call uttproc_rawdata() with an empty buffer and blocking turned on :-) Fortunately, recognition is so much faster than real time that there's absolutely no reason to use non-blocking mode, even in a single-threaded program.

Once the 0.3 release happens I will volunteer to take a hacksaw to all the redundant and poorly-designed interfaces in Sphinx-II, fix them up, and properly document them.

In other news, I'm learning POE incrementally ... I've just taken the first step towards using it as more than just a convenient wrapper for select(2). Namely, I've taken my random collection of states handling the Festival server, Sphinx, audio I/O, and "dialog management" (such as it is - currently this is just "Hello World" and repeating back what the user says), and split them up into multiple sessions. Soon I'll take a stab at packaging them up into actual components.

And I just discovered that the ALSA emulation of the OSS interface is not quite bug-compatible. In ALSA, select(2) on PCM devices (including /dev/dsp) works as expected. With the kernel OSS drivers, you have to call read(2) on /dev/dsp before you can select it for reading, and if you start writing to it, even if your sound card is capable of full-duplex, you will no longer be woken up on read. Total fucking brain damage. Sigh...
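The workaround for the kernel OSS drivers amounts to "read first, then select". A minimal sketch of that, assuming nothing beyond what's described above: prime_then_wait is a name I made up, and a pipe stands in for /dev/dsp so this runs without a sound card (on a real device the priming read would block until the first chunk of audio arrives).

```perl
use strict;
use warnings;
use IO::Select;

# Read a small chunk first (the priming read described above), then wait
# for more data with select. Returns the primed bytes and the number of
# handles that select reported readable.
sub prime_then_wait {
    my ($fh, $prime_bytes, $timeout) = @_;
    my $n = sysread($fh, my $primer, $prime_bytes);
    defined $n or die "priming read failed: $!";
    my @ready = IO::Select->new($fh)->can_read($timeout);
    return ($primer, scalar @ready);
}

# Stand-in for the audio device: a pipe with two frames' worth of data.
pipe(my $reader, my $writer) or die "pipe: $!";
syswrite($writer, "x" x 960) or die "syswrite: $!";

my ($primer, $readable) = prime_then_wait($reader, 480, 1);
printf "primed %d bytes, select says %s\n",
    length $primer, $readable ? "readable" : "not readable";
```

This does nothing about the second problem (no more read wakeups once you start writing), which is the one that really hurts for full duplex.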

Simple math time. 10ms timeslice. 30ms frames of data. Full duplex plus out-of-band events. 480-byte buffer. 8kHz, 16-bit audio. Why did I think that signal-driven async I/O was a good idea for this phone server? Live and learn, I guess. In any case, I need to be able to deal with audio channels in my select loop, and all the queueing code can simply stay, sans locking cruft.
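Spelled out, the arithmetic goes like this. The variable names are mine, and reading the 480-byte buffer as "exactly one frame" is my inference from the figures rather than something the driver documents:

```perl
use strict;
use warnings;

my $sample_rate_hz   = 8000;   # 8kHz audio
my $bytes_per_sample = 2;      # 16-bit samples
my $frame_ms         = 30;     # one frame of audio data
my $timeslice_ms     = 10;     # scheduler timeslice
my $buffer_bytes     = 480;    # device buffer size

# Bytes of audio flowing per second in one direction.
my $bytes_per_sec = $sample_rate_hz * $bytes_per_sample;

# Bytes in a single 30ms frame.
my $frame_bytes = $bytes_per_sec * $frame_ms / 1000;

# Whole frames that fit in the buffer, and timeslices that elapse
# while one frame's worth of audio goes by.
my $frames_in_buffer     = int($buffer_bytes / $frame_bytes);
my $timeslices_per_frame = $frame_ms / $timeslice_ms;

printf "%d bytes/s, %d bytes/frame, %d frame(s) per buffer, %d timeslices per frame\n",
    $bytes_per_sec, $frame_bytes, $frames_in_buffer, $timeslices_per_frame;
```

So the buffer holds a single frame, and a frame lasts only three timeslices. Get scheduled out for a few slices in either direction and a frame is gone.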

In better news, as it turns out, the Telex headset Just Works. Limited to 22kHz, apparently, but otherwise very nice (it does full-duplex wonderfully, and speech recognition performance is better than any other input device I've tried).

Unfortunately, my D-Link USB hub with built-in PS/2 adaptor Just Does Not Work (though Win98 has no problems with it). Control messages sent to it appear to result in timeout/CRC errors, which means that over-current changes for its built-in devices can't be cleared, which results in a never-ending flood of console messages and no actual data from said devices getting through. The Linux-USB mailing lists have been most unresponsive. So I guess I will have to bite the bullet, read the USB spec, and play with a sniffer after all.


As it turns out the USB headset/microphone that I bought was missing its USB interface module. The manufacturer has very graciously sent me a new one, but the tech support guy made some very discouraging noises about the possibility of using it with Linux (for the record, it's a Telex, and their other microphones actually work quite well with the USB audio driver).

Guess I'll be getting acquainted with the wonderful world of USB sniffers.

Burf. Also hurf.

Hacking telephony stuff again, trying to get all necessary and actually working updates (as opposed to gratuitous features and API changes) from the OpenH323 ixj drivers into Linux 2.4, with a fairly good degree of success - the driver is obscenely huge but actually really simple. It seems that PSTN ring detection works a lot better in the stock kernel driver. Some other PSTN-related stuff is a bit messed up though.

My phone server works pretty well as far as I can tell - I wrote a client interface for it whose design I rather like, and which I'll probably reuse in the future. Namely:

  my $c = Phone::Client->new({ Domain => 'UNIX',
                               Sock => '/tmp/.phoned' })
      or die $!;
  $c->attach('/dev/phone0', 'PSTN');
  my $s = IO::Select->new($c->fh);
  while (my @s = $s->can_read) {
    foreach (@s) {
      if ($_ == $c->fh) {
        if ($c->inmsg_pending) {
          while (defined(my $msg = $c->inmsg_dequeue)) {

And so on... Except of course I'm using POE instead of IO::Select. Next item on the agenda is to make a POE component (or wheel, I'm not sure - I need something that will demux the various message types and turn them into POE events) for it.
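The demux part is basically a dispatch table keyed on message type. A rough sketch without POE, with loud caveats: the message shape (a hash with type and data), the type names, and the dispatch function are all invented for illustration, and the array stands in for whatever inmsg_dequeue actually returns.

```perl
use strict;
use warnings;

# Hypothetical message shape: { type => 'RING', data => ... }.
# The type names are placeholders, not the phone server's real ones.
my @log;
my %handlers = (
    RING  => sub { push @log, "ring on $_[0]{data}" },
    DTMF  => sub { push @log, "digit $_[0]{data}" },
    AUDIO => sub { push @log, sprintf("%d bytes of audio", length $_[0]{data}) },
);

# Route one message to the handler for its type. Unknown types are
# logged rather than silently dropped.
sub dispatch {
    my ($msg) = @_;
    my $handler = $handlers{ $msg->{type} }
        or do { push @log, "unhandled type $msg->{type}"; return 0 };
    $handler->($msg);
    return 1;
}

# Stand-in for the queue that the client's dequeue loop would drain.
my @queue = (
    { type => 'RING',  data => '/dev/phone0' },
    { type => 'DTMF',  data => '5' },
    { type => 'AUDIO', data => "\0" x 480 },
    { type => 'BOGUS', data => '' },
);
while (defined(my $msg = shift @queue)) {
    dispatch($msg);
}
print "$_\n" for @log;
```

In the POE version each handler would presumably just post an event to the interested session instead of doing the work itself.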

The plan is to extend the phone server to be a general-purpose streaming audio server for various devices. Yes, I'm fully aware that this has been done ten billion times before ... and guess what, they all suck!

Getting phone lines at the office today. Better bring my modem...

Sprint PCS:

  • We'll fuck up your billing, then disconnect your service without warning.
  • Oh, we counted your deposit as a payment. Oops.
  • You have no credit, therefore you can't pay with a debit card.
  • To do an electronic bank transfer, we need a cheque number.
  • Call back in 8 months, maybe we can help you then.

If you move to the US, do not attempt to get a cellphone until you have lived here for at least a year. Take it from me, it's no fun at all.

If you do decide to get a cell phone, do not get Sprint PCS. They are assholes of the first degree. I want them to die. Even Verizon must be better than them.

Because I have no credit history in this country, my business does not matter to them. Therefore, they feel free to do things like:

  • Disconnect my service the instant that there is an outstanding balance on my account, before I have even received a bill for said amount.
  • Refuse to let me pay by cheque or debit card (I must go to one of their phone centres and pay a $3 service charge, or go to Radio Shack and wait 48 hours - and I must supply an invoice, which, as noted above, I have not received).
  • Refuse to tell me why these arbitrary restrictions are being placed upon me.
  • Give me a password for their web-based account management system which does not work.
  • Require a $125 deposit, make me pay it via a phone centre or Radio Shack (and then, in cash, at said location), then send me a bill for this deposit nearly a month after I paid it.

I am fed up with them, but ... guess what! ... I cannot switch providers, because my phone only works with Sprint. Welcome to the golden age of free markets and competition for the airwaves.

Well, sorry folks, I'm not going to make it to linux.conf.au this year (hopefully, the organizers are reading the lca-speakers list and won't find out here first!)

In other news ... more hacking. It feels like I haven't really accomplished a whole lot. I may be guilty of trying to do things right the first time. Of course, I'm also actually documenting my code. Unfortunately the highest priority work right now is things like CGI scripts for the company website (mind you, they are talking CGI scripts :-)

The Sphinx-II Perl bindings (to be called Speech::Recognizer::SPX) will probably be released simultaneously with the next version of Sphinx-II. I also have some Edinburgh Speech Tools integration that is mainly intended for future versions of PointyClicky but should also be useful for other fun stuff (spectrograms, pretty pictures, etc).

I never thought I'd actually lose weight working at home, but that seems to be the case. At least, judging by the amazingly scientific pants-o-meter. I guess it makes sense, since I almost never eat at restaurants anymore. This might also explain why my kitchen is actually clean.

"Bell Atlantic is now Verizon", and they finally managed to install my loop, and I have DSL. I haven't had this kind of connectivity at home for ... probably the better part of a year now. Of course I have to abuse it for a little while ;-)

Napster seems to be marginally useful, so unfortunately I'm not ready to Burn All MP3s yet (though I've been steadily amassing Ogg Vorbis files of my vinyl/CD collection). I found the (fairly common but out of print) GISM bootleg CD whose name escapes me (with the tracks from "Detestation" and "M.A.N." on it) as well as the recent INFEST bootleg and of course the usual assortment of hard-to-find satanic heavy metal. Sounds like the new Catharsis album is rather good too, guess I'll have to buy it (being lame, I missed not one, but two shows on their tour ... one in Ottawa before I moved, then one in Pittsburgh after. Doh!)

Lots more Perl and XS hacking (fun) and some C (not as fun) at work. I'd forgotten how time-consuming and aggravating it is to write networking and text-munging code (i.e. for text-based network protocols) in C. But Perl and asynchronous I/O don't play nicely, and though POE is pretty awesome, it adds a lot of latency to the system which the telephony junk proved unable to cope with.

That said, I am rather proud of the design of the telephony server I've written to get around this problem. To satisfy the petulant telephony device it uses an async-I/O core driven by realtime signals (of course, I had to hack the driver to do proper siginfo-based notification ...) then layers a single-threaded server on top of that to talk to clients and queue and dequeue events and data. Single-threaded servers are good, people! Don't believe the hype!
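For the curious, the single-threaded layer looks roughly like the following. This is only a sketch in Perl, not the actual server: the realtime-signal core is left out entirely (two hard-coded events stand in for it), a socketpair stands in for a connected client, and the event and command strings are made up.

```perl
use strict;
use warnings;
use Socket;
use IO::Handle;
use IO::Select;

# A socketpair stands in for one connected client (assumption: the real
# thing accepts many of these on a listening socket).
socketpair(my $server_end, my $client_end, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
$_->autoflush(1) for $server_end, $client_end;

my @events;     # queued by the device-facing core, delivered to clients
my @commands;   # received from clients, to be acted on by the core

# Pretend the async core has just queued two device events.
push @events, "RING\n", "HOOK off\n";

# The client sends one command.
syswrite($client_end, "ANSWER\n") or die "syswrite: $!";

my $readers = IO::Select->new($server_end);
my $writers = IO::Select->new($server_end);

# One thread, one loop: read whatever clients sent, then hand queued
# events to whoever can take them. Nothing else touches the queues while
# this loop runs, so no locking is needed. (A real loop would only watch
# for writability while the event queue is non-empty.)
while (@events or !@commands) {
    my ($can_read, $can_write) = IO::Select->select($readers, $writers, undef, 1)
        or last;
    for my $fh (@$can_read) {
        my $n = sysread($fh, my $buf, 4096);
        next unless $n;
        push @commands, split /^/, $buf;
    }
    for my $fh (@$can_write) {
        my $event = shift @events;
        last unless defined $event;
        syswrite($fh, $event) or die "syswrite: $!";
    }
}

sysread($client_end, my $delivered, 4096);
print "commands: @commands", "delivered: $delivered";
```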

It's unfortunate that AsteriskPBX is so keen on using threads for everything (grr), because otherwise we'd just use it since it already has exactly the architecture and hardware support we need.

Looks like SourceForge were moving when I had issues with them. So maybe I should edit out the comments below.

