Older blog entries for dhd (starting at number 92)

Made up a summary of all the issues with the telephony drivers and sent it off. Now I'm waiting for a reply, and have ended up debugging sound drivers instead. select(2) (and obviously poll(2)) breaks in interesting and different ways on different drivers. Somehow I am not surprised.

On a related subject, the VIA PLE133 chipset is really crap, and I suggest avoiding it. The integrated video won't do over 1024x768 without massive noise in the image, and the on-board audio uses one of those lame-o AC97 codecs that only does 48kHz. Suck.

Of course, if people who wrote sound applications actually knew that SNDCTL_DSP_SPEED writes the rate the driver actually chose back into its argument (in particular the people who wrote the OSS backend for libao), that would also be helpful.
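A minimal sketch of honouring that value from Perl. It assumes the usual Linux ioctl encoding, under which `_IOWR('P', 2, int)` from `<linux/soundcard.h>` works out to 0xC0045002; the helper name `set_dsp_speed` is made up for illustration.

```perl
use strict;
use warnings;

# SNDCTL_DSP_SPEED from <linux/soundcard.h> is _IOWR('P', 2, int).
# Assumption: the common Linux _IOC layout (dir << 30 | size << 16 | type << 8 | nr)
# with READ|WRITE = 3 and sizeof(int) = 4.
use constant SNDCTL_DSP_SPEED => (3 << 30) | (4 << 16) | (ord('P') << 8) | 2;

# Hypothetical helper: request $want Hz, return the rate the driver picked.
sub set_dsp_speed {
    my ($fh, $want) = @_;
    my $arg = pack('i', $want);    # in/out int argument
    ioctl($fh, SNDCTL_DSP_SPEED, $arg)
        or die "SNDCTL_DSP_SPEED failed: $!";
    return unpack('i', $arg);      # the driver writes the real rate back here
}
```

Used as `my $rate = set_dsp_speed($dsp, 44100);` on a handle opened on /dev/dsp, an application would then resample whenever `$rate != 44100` instead of assuming it got what it asked for.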

I guess I should probably just install ALSA on that box since its library will do all the necessary sample conversion. The kernel's VIA audio driver is really nice otherwise though, so I kind of fear what might happen.
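"Sample conversion" here mostly means rate conversion, e.g. putting 44.1kHz material through a codec that only does 48kHz. alsa-lib's plug layer handles this properly; purely to illustrate the idea, here is naive linear interpolation over integer samples (no anti-alias filtering, and `resample_linear` is a hypothetical name).

```perl
use strict;
use warnings;

# Naive linear-interpolation resampler (illustration only, no anti-alias filter).
# $in is an arrayref of integer samples; returns an arrayref at the new rate.
sub resample_linear {
    my ($in, $from, $to) = @_;
    my $n_out = int(@$in * $to / $from);
    my @out;
    for my $i (0 .. $n_out - 1) {
        my $pos  = $i * $from / $to;    # position in input samples
        my $j    = int($pos);
        my $frac = $pos - $j;
        my $s0   = $in->[$j];
        my $s1   = $j + 1 < @$in ? $in->[$j + 1] : $s0;    # hold last sample
        push @out, 0 + sprintf('%.0f', $s0 + ($s1 - $s0) * $frac);
    }
    return \@out;
}
```

For example, 441 samples at 44100Hz come out as 480 samples at 48000Hz.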

In general, though, ALSA seems to have evolved to the point where it does a better job of OSS audio than the actual OSS modules. This is fairly impressive.

Also, pondering an enforced vacation from IRC, caffeine, or both. I find myself becoming more and more of a nasty, irritable, misanthropic, self-righteous bastard lately. Communication failures occur with frightening regularity.

20 Feb 2001 (updated 20 Feb 2001 at 02:08 UTC)

Further progress ... my telephony gunk is now able to call me up on my cell phone and annoy me. Ring detection when dialing out turns out to be surprisingly hard, though - it seems that there's no way to tell if someone has picked up the phone in the middle of a ring until we fail to detect the next one. I suspect that voice data will fool the ring-detection filter, too. Perhaps the suggestion to use the speech recognizer to detect rings and pick-up is not so far-fetched after all.
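Since pick-up only shows up as a missing ring, the obvious fallback is a timeout: if no ring starts within one cadence period plus some slack, assume the call was answered (or the far end gave up). A sketch, assuming the North American 2 s on / 4 s off ring cadence and an arbitrary 1.5 s of slack; the constant and function names are hypothetical.

```perl
use strict;
use warnings;

# Assumed cadence: North American ringback, 2 s on + 4 s off = 6 s period.
use constant RING_PERIOD => 6.0;
use constant RING_SLACK  => 1.5;    # arbitrary allowance for detector jitter

# Classify an outgoing call from the start time (in seconds) of the last
# detected ring.
sub ringback_state {
    my ($last_ring_start, $now) = @_;
    return 'waiting' unless defined $last_ring_start;
    return 'ringing' if $now - $last_ring_start <= RING_PERIOD + RING_SLACK;
    return 'answered';    # or the callee gave up; a missed ring looks the same
}
```

The catch is latency: with these numbers the answer is only known up to 7.5 seconds after the last ring started, which is exactly why detecting speech on the line is tempting.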

There is also a fair amount of black magic involved in getting the IXJ card to play DTMF tones correctly; the duration of the tones is magic (180ms on and 45ms off are the magic numbers supplied in the SDK, and other values tend to fail randomly), and also, it seems necessary to pause for at least a hundred milliseconds or so after setting the device off-hook, or it will fail to report tone state appropriately, causing none of the tones to be played correctly at all. Ah well. I keep telling myself "we build voices, not IVR systems" and that excuses all this ad-hockery.
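The timing constraints above fit in a small table-driven helper. This sketch only produces the schedule (off-hook settle delay, then 180ms of tone and 45ms of silence per digit) as a list of [action, milliseconds] pairs; actually driving the IXJ card from it is left out, and `dtmf_schedule`, `schedule_ms` and the action names are hypothetical.

```perl
use strict;
use warnings;

use constant {
    OFFHOOK_SETTLE_MS => 100,    # "at least a hundred milliseconds or so"
    TONE_ON_MS        => 180,    # magic numbers from the SDK
    TONE_OFF_MS       => 45,
};

# Build a dialing schedule for a string of DTMF digits.
sub dtmf_schedule {
    my ($digits) = @_;
    die "not a DTMF string: $digits\n" unless $digits =~ /\A[0-9*#A-D]*\z/;
    my @plan = (['offhook', 0], ['wait', OFFHOOK_SETTLE_MS]);
    for my $d (split //, $digits) {
        push @plan, ["tone $d", TONE_ON_MS], ['silence', TONE_OFF_MS];
    }
    return @plan;
}

# Total time a schedule takes, in milliseconds.
sub schedule_ms {
    my $total = 0;
    $total += $_->[1] for @_;
    return $total;
}
```

Each digit costs 225ms, so a ten-digit number takes 100 + 10 × 225 = 2350ms to dial.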

I'm gaining a certain amount of sympathy for the idea of retiring to the countryside to grow fruit trees and "dealing in units of time no shorter than a fortnight". Yeah, a Jargon File reference (I think). So shoot me. (I have to be careful about saying that in the US, I guess.)

I managed to fry the motherboard in my home machine when transplanting it into a new case, resulting in much cursing, swearing, weeping, and gnashing of teeth. The 'whisper' power supply that I bought is, in fact, very quiet, though. So presuming I don't fry the replacement motherboard, I should finally have a machine suitable for leaving on all the time and hence running mail, web, music, and wireless stuff from.

Still waiting for CMU to make another Sphinx release, as well as releasing the training tools and so forth.

I must admit that there are some things I like quite a lot about Red Hat 7. Being able to enable and disable inetd services with chkconfig(8) is pretty swell. Having POP3 and IMAP over SSL configured by default is as well (though of course one must still upgrade stunnel to a non-vulnerable version). It's a bit frustrating when it takes longer to download and install all the urgently needed updates (over a T1, natch) than it does to install the distribution itself, though.

I'm still regretting that I failed to snag a Conectiva 6.0 CD at LinuxWorld. By all accounts it sounds like the best of the RPM-based systems yet; they seem to have their heads screwed on straight with regard to security (shipping BIND in a chroot jail by default, for instance) and upgradability (well, if their APT port is any good, at least).

That said, I'm still waiting for one of the RPM based distributions to get rid of Sendmail as the default MTA in favour of Exim or Postfix... then I might actually consider using one on my own machines.

I actually sent a piece of snail-mail today that was not a parcel or rent cheque. Trying to hook up again with yet another old friend who has managed to avoid the rise of the Internet entirely, it seems. I keep half-heartedly searching people's names on Google wondering if they might have resurfaced on-line, but haven't had much luck.

Well, of course, after losing all hope, I finally get the damn phone server to work. Naturally, I then spent a few hours looking over logfiles wondering why all the messages were apparently getting delivered in strange orders, until I realized that some of my debug printf()s were going via stderr and others via stdout... Anyway, it is doing real full-duplex now, the network protocol bits work, etc., and I am happy.
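The reordering comes from buffering: stdout is block-buffered when it goes to a file, while stderr is unbuffered, so the two streams land in the log out of order. A sketch of the usual Perl fix, flushing stdout on every print and routing debug output through one handle (`dbg` is a hypothetical helper):

```perl
use strict;
use warnings;
use IO::Handle;

# Flush stdout after every print so it stays in step with (unbuffered) stderr.
STDOUT->autoflush(1);

# Send all debug output through a single handle, tagged with the pid, so
# debug lines cannot be reordered against each other. Returns the line.
sub dbg {
    my $line = sprintf("[%d] %s\n", $$, join('', @_));
    print STDERR $line;
    return $line;
}
```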

Now all I have to do is adapt it to work with soundcards, which should be a lot easier since they do sane things like, duh, actually return 0 from read(2) and write(2) if their buffers are empty/full (respectively).
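With a device that behaves, the write side of an event loop reduces to "write what fits, keep the rest". A sketch with a hypothetical `write_some` helper that treats both a zero-length write and EAGAIN as "buffer full, try again after the next select(2)":

```perl
use strict;
use warnings;
use Errno qw(EAGAIN);

# Write as much of $$bufref as the device will take right now and remove
# the written bytes from the buffer. Returns the number of bytes written;
# 0 means the device buffer is full and we should wait for select(2).
sub write_some {
    my ($fh, $bufref) = @_;
    return 0 unless length $$bufref;
    my $n = syswrite($fh, $$bufref);
    if (!defined $n) {
        return 0 if $! == EAGAIN;
        die "write failed: $!";
    }
    substr($$bufref, 0, $n, '');
    return $n;
}
```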

Also, fixed the configuration for my Edinburgh Speech Tools XS bindings so they will actually compile on other people's machines, which is the first step towards finding an actual use for them. And finally started building a website for the company ... so I'll have to take "I work for a company with no website" out of my web pages :(

Why, oh why, is it so hard to build a telephony interface that just works like a sound card?

Rewriting my phone server to be purely event-driven has had fringe benefits - the design is much better now. But full duplex audio on the goddamn IXJ card still does not bloody work. At first, it starts dropping frames left and right, then it suddenly decides that it just doesn't want to wake up select(2) on write anymore. So the server's state machine stalls and everything goes to hell.
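One way to keep a wedged driver from stalling the whole state machine is to never wait on it indefinitely: bound each wait for writability by one frame time and treat a timeout as a dropped frame. A sketch using the core IO::Select module; the 30ms frame length and the `wait_writable` name are assumptions for illustration.

```perl
use strict;
use warnings;
use IO::Select;

use constant FRAME_SECS => 0.030;    # assumed frame length

# Return true if $fh becomes writable within $timeout seconds (default:
# one frame). A false return lets the caller drop or requeue the frame and
# move on instead of blocking on a driver that never wakes select(2).
sub wait_writable {
    my ($fh, $timeout) = @_;
    $timeout = FRAME_SECS unless defined $timeout;
    my @ready = IO::Select->new($fh)->can_write($timeout);
    return scalar @ready;
}
```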

It's not a challenge anymore, it's just a never-ending nightmare. Nothing works, everything is broken, and there are no explanations to be found anywhere.

Spent two days at LWE (LinuxWorld Expo). I couldn't justify skipping out on my ride back and paying for an extra night at the hotel so I came back Wednesday night. There was actually a Debian booth, just not where we expected it to be. My main contribution was netbooting and installing their Ultra10 loaner, then spectacularly failing to get X to work on it (damned flaky framebuffer drivers). But I got to meet wichert, branden, and lupus in person so that was cool.

I talked to the Transvirtual folks about speech a bit but felt kind of weird, like I was making a sales pitch. Anyway I hope we'll get to do stuff with them in the future, as PocketLinux actually looks like it will become a good, usable Linux platform for end-users on handhelds (the X-based stuff that people are doing is neat but I don't see it going anywhere - however, it would really be nice to be able to use X, since it isn't actually that big, and you end up not having to reinvent so many wheels...)

Also got to talk to Tridge about his (unfinished) PhD work on speech recognition, and finally meet mbp. Speaking of X, Tridge described how his speech recognizer ("bug") was able to send synthetic X events to clients, which for some reason I'd never thought of doing before - I'll have to try it out soon.

Hopefully, we (Cepstral) will soon be releasing a whack of Perl modules I've written, namely:

  • Speech::Recognizer::SPX - Perl extension for Sphinx-II
  • Festival::Client::Async - non-blocking Festival client module
  • Telephony::Phonedev - Perl extension for Linux telephony devices
  • POE::Component::SPX - POE component for speech recognition
  • POE::Component::Festival - POE adaptor for Festival::Client::Async

Together these should provide basically all you need to build speech interaction and voice control systems using Perl. I am excited.

The main thing I'm waiting on is the Sphinx2 0.3 release from CMU, which should happen soon. Oh, yeah, we also need a website first.

23 Jan 2001 (updated 23 Jan 2001 at 18:32 UTC)

Cool. I got the Sphinx-II "continuous audio" module working with select(2) in Perl. So I can get rid of this horrible bit of code in my speech I/O framework, since I don't have to fork off a blocking process to do speech recognition anymore:

	# ARGH ... POE has messed with %SIG
	@SIG{keys %SIG} = ('DEFAULT') x keys %SIG;

The Sphinx-II "non-blocking" utterance processing interface is kind of broken... it processes only a single frame of data at a time, which is way less than the amount typically available to read from the audio device, and there's no explicit function to flush unprocessed frames (though you can just call uttproc_rawdata() with an empty buffer and blocking on :-) Fortunately, recognition is so much faster than real time that there's absolutely no reason to use non-blocking mode, even in a single-threaded program.

Once the 0.3 release happens I will volunteer to take a hacksaw to all the redundant and poorly-designed interfaces in Sphinx-II, fix them up, and properly document them.

In other news, I'm learning POE incrementally ... I've just taken the first step towards using it as more than just a convenient wrapper for select(2). Namely, I've taken my random collection of states handling the Festival server, Sphinx, audio I/O, and "dialog management" (such as it is - currently this is just "Hello World" and repeating back what the user says), and split them up into multiple sessions. Soon I'll take a stab at packaging them up into actual components.

And I just discovered that the ALSA emulation of the OSS interface is not quite bug-compatible. In ALSA, select(2) on PCM devices (including /dev/dsp) works as expected. With the kernel OSS drivers, you have to call read(2) on /dev/dsp before you can select it for reading, and if you start writing to it, even if your sound card is capable of full-duplex, you will no longer be woken up on read. Total fucking brain damage. Sigh...
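A workaround sketch for the kernel OSS behaviour on the read side: do one read(2) up front to start capture, keep that data, and only then add the handle to the select set. `prime_for_select` is a hypothetical name and `$frag` would be the device's fragment size; this does nothing about reads no longer waking up once you start writing.

```perl
use strict;
use warnings;
use IO::Select;

# Start capture on an OSS /dev/dsp handle by reading one fragment, then
# return that data along with a selector that now watches the handle.
sub prime_for_select {
    my ($fh, $frag) = @_;
    my $n = sysread($fh, my $buf, $frag);
    die "read failed: $!" unless defined $n;
    return ($buf, IO::Select->new($fh));
}
```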

Simple math time. 10ms timeslice. 30ms frames of data. Full duplex plus out-of-band events. 480-byte buffer. 8kHz, 16-bit audio. Why did I think that signal-driven async I/O was a good idea for this phone server? Live and learn, I guess. I need to be able to deal with audio channels in my select loop anyway, and all the queueing code can simply stay, sans locking cruft.
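Spelling out the arithmetic: at 8kHz with 16-bit samples, a 30ms frame is 8000 × 2 × 0.030 = 480 bytes, so the buffer holds exactly one frame, and roughly 33 frames per second arrive in each direction, one every three 10ms timeslices. A throwaway check:

```perl
use strict;
use warnings;

# Bytes in one frame of linear PCM audio.
sub frame_bytes {
    my ($rate_hz, $bits, $frame_ms) = @_;
    return $rate_hz * ($bits / 8) * $frame_ms / 1000;
}

my $bytes   = frame_bytes(8000, 16, 30);    # 480
my $per_sec = 1000 / 30;                    # ~33.3 frames/s each way
my $slices  = 30 / 10;                      # 3 timeslices per frame
printf "%d bytes/frame, %.1f frames/s per direction, %d timeslices/frame\n",
    $bytes, $per_sec, $slices;
# prints "480 bytes/frame, 33.3 frames/s per direction, 3 timeslices/frame"
```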

In better news, as it turns out, the Telex headset Just Works. Limited to 22kHz, apparently, but otherwise very nice (it does full-duplex wonderfully, and speech recognition performance is better than any other input device I've tried).

Unfortunately, my D-Link USB hub with built-in PS/2 adaptor Just Does Not Work (though Win98 has no problems with it). Control messages sent to it appear to result in timeout/CRC errors, which means that over-current changes for its built-in devices can't be cleared, which results in a never-ending flood of console messages and no actual data from said devices getting through. The Linux-USB mailing lists have been most unresponsive. So I guess I will have to bite the bullet, read the USB spec, and play with a sniffer after all.

Interesting.

As it turns out the USB headset/microphone that I bought was missing its USB interface module. The manufacturer has very graciously sent me a new one, but the tech support guy made some very discouraging noises about the possibility of using it with Linux (for the record, it's a Telex, and their other microphones actually work quite well with the USB audio driver).

Guess I'll be getting acquainted with the wonderful world of USB sniffers.

Burf. Also hurf.

Hacking telephony stuff again, trying to get all necessary and actually working updates (as opposed to gratuitous features and API changes) from the OpenH323 ixj drivers into Linux 2.4, with a fairly good degree of success - the driver is obscenely huge but actually really simple. It seems that PSTN ring detection works a lot better in the stock kernel driver. Some other PSTN-related stuff is a bit messed up though.

My phone server works pretty well as far as I can tell - I wrote a client interface for it whose design I rather like, and which I'll probably reuse in the future. Namely:

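  use IO::Select;
  use Phone::Client;    # client library for the phone server

  my $c = Phone::Client->new({ Domain => 'UNIX',
                               Sock   => '/tmp/.phoned' })
      or die $!;
  $c->attach('/dev/phone0', 'PSTN');
  my $s = IO::Select->new($c->fh);
  while (my @s = $s->can_read) {
    foreach (@s) {
      if ($_ == $c->fh) {
        $c->read_more;    # pull in whatever the server has sent
        if ($c->inmsg_pending) {
          # drain every complete message before going back to select
          while (defined(my $msg = $c->inmsg_dequeue)) {
            handle_msg($msg);
          }
        }
      }
    }
  }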
And so on... Except of course I'm using POE instead of IO::Select. Next item on the agenda is to make a POE component (or wheel, I'm not sure - I need something that will demux the various message types and turn them into POE events) for it.
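The demux half of such a component can be a plain dispatch table keyed on message type. A sketch only: the message layout (a hashref with a `type` field) and the handler names are hypothetical, since the real Phone::Client message format isn't shown here. Inside a POE session each handler would instead `yield` the corresponding event via `$_[KERNEL]`.

```perl
use strict;
use warnings;

# Hypothetical message types; each handler receives the message hashref.
my %handlers = (
    ring     => sub { "ring on $_[0]{line}" },
    dtmf     => sub { "digit $_[0]{digit}" },
    hangup   => sub { 'hangup' },
    _unknown => sub { "ignored $_[0]{type}" },
);

# Route one dequeued message to its handler (e.g. from handle_msg()).
sub dispatch_msg {
    my ($msg) = @_;
    my $h = $handlers{ $msg->{type} } || $handlers{_unknown};
    return $h->($msg);
}
```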

The plan is to extend the phone server to be a general-purpose streaming audio server for various devices. Yes, I'm fully aware that this has been done ten billion times before ... and guess what, they all suck!

Getting phone lines at the office today. Better bring my modem...
