Recent blog entries for habes

Last week I was complaining how complicated terminal emulation is, but it turns out the situation is much worse than I thought.

Here is the state diagram required to parse VT escape sequences robustly.

Something needs to be done. We need a new text-mode interface network protocol that is much, much simpler. Emulating VT100s in 2005 is nuts.

The current suckiness of terminal technology

Text-mode interfaces rock. It's too bad that the technology surrounding them is so crufty and antiquated.

The vast, vast majority of terminals currently in use are software terminals: xterm, eterm, Terminal.app, putty, etc. And yet, terminal technology is centered around emulating and supporting the interfaces of hardware terminals long since deceased: vt100, vt102, etc.

Say it's 2005 and you are implementing a terminal emulator from scratch. You want to be conscientious and pick a standard terminal type that you can implement robustly. You're in for a rude awakening: the truth is that pretty much every software terminal emulation program in existence is a mish-mash of whatever escape sequences the authors felt like implementing.

For example, Eterm claims to be a vt102 emulator. But it certainly doesn't implement:

Invoke Confidence Test (DECTST)

ESC [ 2 ; 1 y

Power-up test. Terminal resets and performs power-up test.

And why should they? Half the escape sequences for these historical terminals make no sense for software-based terminals. And yet, you're never sure what escape sequences client applications will use.

And for all these obsolete escape sequences that no one cares about, we still don't have an escape sequence that will change the text color to an arbitrary RGB value. We're still in an 8-color world (not even 8-bit, 8 color), even though 99% of "terminals" out there today are capable of doing better.

Because there is no reasonable standard terminal type for software terminal emulators to support, authors of this software just do whatever works. Many programs put the name of the program in $TERM (for example, xterm and screen), which basically demands that everyone else add that program to their terminfo/termcap. Or they'll reuse existing bastardized term types like xterm-color (Terminal.app does this), even though they almost certainly don't support all the same escape sequences.

The story is just as bad on the client side. Your only hope for writing an interoperable text-based application is to use either terminfo or termcap, which are huge databases of historical terminals that no one owns but that the UNIX community won't let go of. Instead of hard-coding escape sequences in your program, you look up the escape sequences for abstract capabilities (like "move the cursor") and use those. That will work, as long as users actually set their TERM variable correctly, and their terminal type is actually in your termcap/terminfo file.
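To make the terminfo lookup concrete, here is a minimal sketch using Python's curses bindings. The function name cursor_to and the choice of "xterm" as the terminal type are my own assumptions for illustration. The hard-coded fallback at the end is also an assumption, there only so the sketch still runs on a machine with no terminfo database; it is exactly the kind of hard-coding terminfo is meant to avoid.

```python
import os

try:
    import curses
except ImportError:  # e.g. platforms that ship without curses
    curses = None


def cursor_to(row, col, term="xterm"):
    """Return the bytes that move the cursor to zero-based (row, col)."""
    if curses is not None:
        try:
            # setupterm wants a file descriptor; /dev/null keeps the lookup
            # independent of whatever stdout happens to be.
            fd = os.open(os.devnull, os.O_WRONLY)
            try:
                curses.setupterm(term=term, fd=fd)
            finally:
                os.close(fd)
            # "cup" is the terminfo capability for cursor addressing.
            cup = curses.tigetstr("cup")
            if cup:
                return curses.tparm(cup, row, col)
        except curses.error:
            pass  # unknown terminal type or missing database
    # Assumed fallback: the common ANSI/VT100 sequence, one-based.
    return b"\x1b[%d;%dH" % (row + 1, col + 1)
```

Calling cursor_to(4, 9) returns the sequence for row 5, column 10 as the terminal counts them. If TERM names a terminal that is not in the database, the lookup fails, which is the failure mode described above.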

In practice, most programs give up on that idea and use a library like ncurses to do that work for them. You might hope that ncurses would make your life as a terminal application author simple, but you would be wrong about that. Ncurses has 257 functions in it, most of them with names like mvaddchnstr().

Ncurses isn't horrible, but it's not particularly nice to use, and it doesn't expose the full capabilities of the underlying terminal in some cases. For example, many software terminals support an extension where you can set the foreground or background color to -1, which means "the default color" (i.e. the color you see before a program mucks with the colors). This effectively allows 81 color combinations (8 colors plus the "default", so 9 foreground times 9 background colors). But ncurses only supports 64 color pairs.
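The arithmetic is easy to check by enumerating the combinations. This is just a counting sketch; the color names and the "default" label are placeholders of mine, not ncurses identifiers.

```python
from itertools import product

COLORS = ["black", "red", "green", "yellow",
          "blue", "magenta", "cyan", "white"]
DEFAULT = "default"  # stands in for the -1 "default color" extension

# Every (foreground, background) combination, without and with the default.
plain_pairs = list(product(COLORS, repeat=2))
extended_pairs = list(product(COLORS + [DEFAULT], repeat=2))

print(len(plain_pairs), len(extended_pairs))  # prints "64 81"
```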

Then there's the whole mess of pseudo-terminals. If you are like me, you might wonder at first why pseudo-terminals are necessary. Why not just fork a shell and communicate with it over stdin/stdout? Well, the bad news there is that UNIX special-cases terminals. They're not just full-duplex byte streams, they also have to support some special system calls for things like setting the baud rate. And you might think that a change of window size would be delivered to the client program as an escape sequence. But no, it's delivered as a signal (and incidentally, sent from the terminal emulator as an ioctl()).
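Here is a small sketch of the resize path on a pseudo-terminal, assuming a Linux or similar UNIX system. The helper names are mine. The terminal emulator's side sets the size on the master with the TIOCSWINSZ ioctl, and a client on the slave side reads it back with TIOCGWINSZ (normally after being woken by SIGWINCH).

```python
import fcntl
import os
import pty
import struct
import termios


def set_window_size(fd, rows, cols):
    # struct winsize: rows, columns, x pixels, y pixels (unsigned shorts).
    fcntl.ioctl(fd, termios.TIOCSWINSZ, struct.pack("HHHH", rows, cols, 0, 0))


def get_window_size(fd):
    packed = fcntl.ioctl(fd, termios.TIOCGWINSZ, struct.pack("HHHH", 0, 0, 0, 0))
    rows, cols, _xpixel, _ypixel = struct.unpack("HHHH", packed)
    return rows, cols


def resize_demo():
    """Resize from the emulator side, read the size from the client side."""
    master, slave = pty.openpty()
    try:
        set_window_size(master, 40, 100)
        return get_window_size(slave)
    finally:
        os.close(master)
        os.close(slave)
```

Note that no bytes travel through the terminal's data stream here. The size lives in the kernel's tty state, which is why a plain pipe to a forked shell can't carry it.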

Of course, you can't send ioctls over a network, so sending a resize from a telnet client to a telnet server is done in a totally different way.
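For comparison, telnet carries the window size in-band, using the NAWS option (RFC 1073). A minimal sketch of the message a client sends after the option has been negotiated; the function name is my own.

```python
# Telnet command and option codes (RFC 854, RFC 1073).
IAC, SB, SE, NAWS = 255, 250, 240, 31


def naws_message(cols, rows):
    """Build the subnegotiation announcing a cols x rows window."""
    payload = cols.to_bytes(2, "big") + rows.to_bytes(2, "big")
    # A data byte equal to IAC (255) must be doubled inside a subnegotiation.
    payload = payload.replace(bytes([IAC]), bytes([IAC, IAC]))
    return bytes([IAC, SB, NAWS]) + payload + bytes([IAC, SE])


print(naws_message(80, 24).hex(" "))  # prints "ff fa 1f 00 50 00 18 ff f0"
```

So the same event is an ioctl plus a signal locally and a byte sequence over the wire, and the telnet server has to translate one into the other.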

As you can see, it's a big mess. This is unfortunate, because text-mode terminal interfaces can offer a lot, especially to us computer programmers.

Dreaming of Solutions

The problem is pretty clear. Solutions are a bit harder to come by. Here are some ideas.

  1. Authors of software terminal emulation programs need a reasonable standard they can target. It should be a very small set of escape sequences -- just the basics of movement and setting character attributes. Realistically, for such a thing to gain any traction at all, it would need to be a subset of either vt102 or more likely xterm. Hopefully, such a thing could be integrated into ESR's canonical termcap/terminfo databases, so that existing clients could communicate with them without sending escape sequences that they can't understand.
  2. Authors of text-mode applications could use a library that is a lot slicker than Ncurses. Perhaps it could be built on ncurses for now, with the option to leave all that cruft behind later.
  3. There needs to be a standard way of making extensions to the existing set of escape sequences. For example, there should be an escape sequence extension that supports specifying an arbitrary color in RGB. Clients should be able to ask terminals what extensions they support, instead of having a static TERM type to decide what is supported. Terminals should know how to ignore escape sequences they don't implement.
  4. Everything should be migrated to use XML.
OK, I was just kidding about that last one. :)
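The part of idea 3 about ignoring unimplemented sequences is cheap to sketch, as long as you only consider well-formed control sequences of the ECMA-48 shape (ESC [, parameters, one final byte). This is nowhere near the full state machine, and the render function and its handler table are hypothetical names of mine, not an existing API.

```python
import re

# ECMA-48 control sequence: ESC [ , parameter bytes 0x30-0x3F,
# intermediate bytes 0x20-0x2F, then one final byte 0x40-0x7E.
CSI = re.compile(r"\x1b\[([0-?]*)([ -/]*)([@-~])")


def render(data, handlers):
    """Return the plain text in data, dispatching the sequences we know.

    handlers maps a final byte (such as "m") to a callable that receives
    the list of numeric parameters. Sequences with no handler are dropped.
    """
    out = []
    pos = 0
    for match in CSI.finditer(data):
        out.append(data[pos:match.start()])
        params, _intermediates, final = match.groups()
        handler = handlers.get(final)
        if handler is not None:
            # Empty or non-numeric parameters are treated as 0 here.
            handler([int(p) if p.isdigit() else 0 for p in params.split(";")])
        pos = match.end()
    out.append(data[pos:])
    return "".join(out)


seen = []
text = render("a\x1b[1;31mb\x1b[2;1yc", {"m": seen.append})
print(text, seen)  # prints "abc [[1, 31]]"
```

The second sequence in the example is the DECTST confidence test from earlier. A terminal built this way simply skips it instead of printing garbage, which also leaves room for new final bytes to be added as extensions later.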

My Related Pet Project

I'm creating a text-mode environment that seeks to integrate all your core text-mode apps (terminal, mail reader, editor) into one uber-scriptable environment. Think Emacs, but object-oriented, and written in Ruby.

It's not ready for prime-time, but it can already multiplex terminal windows (like screen does) well enough to run VIM. It's called silkscreen.

4 Mar 2004 (updated 4 Mar 2004 at 21:10 UTC) »

CodeCon

I was really happy with our Audacity talk at CodeCon 2004. We had the last slot because I had to fly in that morning, and I feel like we ended the conference with a bang. Dominic's demo went over well; his final gesture was to record a voiceover on top of the introduction to a "demo" jazz track, adjusting the volume envelope so that he could be heard over the music. He played it all back, and I saw looks of satisfaction and amusement in the audience while they applauded. It wasn't anything revolutionary; the point was that he did something cool and effective with a minimum of effort. Audacity is genuinely useful to people.

I was really disappointed that I had to miss the first two days. I heard there were some really great presentations. I would have especially liked to see the version control software demos.

It was good to see Dominic and mbrubeck again. I also got to meet another Audacity developer, Vaughan Johnson.

CSS

It seems to me that a major reason that people are still using table-based layouts is that tables are such a natural way of arranging things. Look at any web site and you'll see how it can be naturally broken down into tables.

So why on earth does CSS ditch this model and instead work with boxes that are logically unconnected? It is absolutely ludicrous to me that a simple three-column layout, with two of the columns static and one dynamic, is considered a holy grail with CSS. The code is complex and counterintuitive (hence a holy grail to find). Two of the columns have to be absolutely placed and sized, and the middle column has to have margins big enough to not overlap the other columns (of course the three boxes don't interact at all, nor are they aware of each other's existence). With a table-based layout it is dirt simple: one row, three columns. Of course the resizing will work correctly. Of course you'll never have two boxes inadvertently overlap. That's just the way tables work, and it maps pretty well onto the way design really works.

I believe that CSS got its positioning model fundamentally wrong. What stylesheets should provide is a way to assemble block elements like <div> into tables (grids) in a flexible way. The three-column layout should be as simple as saying:

make a row consisting of (left to right) div "leftcolumn" with absolute width 200px, div "centercolumn" with variable width, and div "rightcolumn" with absolute width 200px. Done.

There's no reason browsers can't do this; this expressive power is already available with tables. It's just not possible to separate this kind of layout from the HTML as CSS is designed to do.
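To show how little computation the row model needs, here is a sketch of the width resolution for such a row. This is a hypothetical helper of mine, not anything a browser or CSS actually exposes: fixed columns keep their width and variable columns split whatever is left.

```python
def layout_row(total_width, columns):
    """Resolve a row of columns into (x offset, width) boxes.

    columns is a list of (name, width) pairs, where width is a pixel
    count for a fixed column or None for a variable-width column.
    """
    fixed = sum(width for _, width in columns if width is not None)
    flexible = [name for name, width in columns if width is None]
    # Variable columns share the leftover space equally (whole pixels).
    share = max(total_width - fixed, 0) // len(flexible) if flexible else 0
    boxes = {}
    x = 0
    for name, width in columns:
        width = share if width is None else width
        boxes[name] = (x, width)
        x += width
    return boxes


row = [("leftcolumn", 200), ("centercolumn", None), ("rightcolumn", 200)]
print(layout_row(1000, row))
# prints {'leftcolumn': (0, 200), 'centercolumn': (200, 600), 'rightcolumn': (800, 200)}
```

Because each box is placed relative to the one before it, resizing the window just means calling this again with a new total width, and overlap can't happen.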

Certification weirdness

What happened to my certification? I used to be a Journeyer, and I am certified "Journeyer" by two other Journeyers and one Master, so why am I not certified at all any more?

I intend to post an article soon. If you could take a few minutes and decide whether you think I am a Journeyer based on my work on Audacity and PortAudio, I would appreciate it. I also wrote FLAC tagging support for Rhythmbox. Thanks.

Audacity

I'm really happy with what I've managed to accomplish on Audacity in the last month or so. Everyone is starting to itch for a new stable release, so I have been working on trying to knock the audio i/o into shape in time for 1.2. A few months ago we moved to a thread-based model for audio i/o that decreased latency and improved resource utilization, but there were some kinks that it took some time to discover and diagnose.

Audacity can now build against PortAudio v19. PortAudio v19 is still unfinished, but it will have the ability to support OSS, ALSA, and JACK in a single build (all natively)! I was hoping this would be finished in time for Audacity 1.2, but since it won't be, our audio i/o is #ifdef'd to support either v18 or v19.

Looking for Summer Work

I would really like another internship this summer. Initially I placed a lot of eggs in the RealNetworks basket, since I figured my experience with Audacity would make me a prime candidate. But Real is a wall of silence: I never heard a word back from them. I need to pursue it more proactively.

I also need to think of other places where I could find work. I would like to stay in the Seattle area if possible (though I would willingly relocate like I did last summer), and hopefully I could find something having to do with audio or multimedia in some form.

Sometimes I worry that choosing work in a field similar to the work I do on free software will force me to close off part of myself and my knowledge to the free software community. I half-heartedly approached employees of Syntrillium (makers of CoolEdit) last week at a presentation and asked if they hired interns. When I said I was interested in programming they blew me off and said that wouldn't really be possible. But what if they did have internships available? It would be extremely similar work to what I do on Audacity. Would I be forced to sign NDAs and wall off the insider knowledge of Syntrillium products from the work I do freely?

It seems difficult to find employment opportunities that would utilize my free software experience.

I desperately wish I had infinite time to be able to contribute to any free software project that interested me. I am constantly exposed to really neat projects that I want to learn about and help improve. I want to do everything I can to advance the power, usability, and range of available free software.

Every time I feel this urge, I remind myself that the most productive I can possibly be is when I am working on a project I already know well. In other words, the best way to achieve my goal of furthering free software in general is to do lots of good work on Audacity and PortAudio. Meanwhile, the people who have found different niches, whether it be in the libraries that do all the grunt work or in the desktop environments that present the gateway to individual applications for the user, are simultaneously advancing the state of their pet projects also.

Not that I have the time to code much of anything these days, with school and all...

Another thing I constantly have to remind myself is that the best way to solve a problem is usually not to start with a clean slate and attempt to make an ultimate design. Though it's much less sexy, the better solution is to build on what has already been done. Every time I tell myself "we need to trash X!! It's way too hard for users to configure, the fonts suck, and you can't change resolutions," I step back and remind myself that improving X is probably the better long-term solution. My sense of aesthetics makes me want to use something like DirectFB instead, since it's clean, sleek, and new, but it would take so much to get to where X already is today.

Another place I struggle with this is all the way down to the Linux kernel. "How is my mother ever going to use Linux if configuring the network requires deciphering driver names like 'ne' and 'rtl8139'? How is Linux ever going to become robustly GUI-configurable as long as we keep using text configuration files? How far can open()/read()/ioctl()/write()/close() really take us??"

Free software is really becoming kickass in some cases. I prefer Galeon to any web browser available for Windows. Though I don't use them much, I continue to hear that Evolution and Gnumeric and Gnucash are kicking some serious ass. But how is my mother ever going to use free software if there's no free platform (meaning kernel+userspace+standard APIs) that works straight out of the box, without configuration pains?

And speaking of standard APIs, I think the lack of them is a problem on Linux. How does an application play sound on Linux? Well, it's very simple! OSS will probably work, ALSA is better if it's available, JACK is best for low-latency, inter-app work, but use aRts if you're using KDE and ESD if you're using Gnome!

Challenge 2: set the resolution to 800x600 and play a movie file fullscreen. There's no obviously standard way to do this on a basic Linux/X11 box. Stuff is written for SDL, Allegro, ggi, even svgalib. Some may work on your box, others might not. This is where cathedral-engineered OS's have a one-up on us: there are standard ways to do things like this that will always work.

What Linux needs is something like LSB, but much more far-reaching.

And perhaps what I'm looking for already exists in Gnome and KDE. These are real platforms, in that a Gnome application will use the same set of APIs that any other Gnome application will. And if everything was a Gnome app or everything was a KDE app everything would be peachy.

But they're not, so we're still stuck with a situation where some apps may play sound fine, while others pop up a dialog box saying "esd not running."

That's enough for now. I want so badly for free software to succeed on laypeople's desktops. I think we're on a good track; we just need time to fill in the gaps and polish everything.

15 Jul 2002 (updated 15 Jul 2002 at 22:36 UTC) »
Audacity is getting major publicity this week!

First off, we hit 1,000,000 downloads, which are mostly from the last year and a half. That was exciting.

Then Audacity was mentioned on the TechTV program "The Screen Savers." I got it on tape this morning: an incoming caller wanted to know a way to do noise reduction on mp3s, and the hosts recommended Audacity and brought up the web page. That was even more exciting.

Then today I noticed Audacity was reviewed in the Washington Post! That was the most exciting of all. (well, besides sharing a front page slashdot story a few weeks back).

Please accept my apologies for bragging, but these are exciting times.

Audacity 1.0 is released! This is exciting for so many reasons. The two biggies:
  • we can forget about the 1.0 branch now. The 1.1 branch is so much better that maintaining 1.0 was becoming a pain.
  • a bunch of subprojects that we were putting off for the release are now progressing full speed ahead. The most exciting one is a split of the GUI and audio engine modules: this allows us to write rigorous tests to hammer at the audio engine code, as well as opening up the opportunity to implement alternate GUIs on top of libaudacity.

I finally have internet at home, which means I can do all the things I've been planning: a PortAudio implementation for JACK and/or ALSA, a media player for PicoGUI using gstreamer in Python, and who knows what else. Now that I have weekends the sky is the limit. Unfortunately I've gotten too used to working 10 and 11 hour days, so I don't have too much time during the week...

24 Apr 2002 (updated 24 Apr 2002 at 10:12 UTC) »

This week I took time that I didn't have to code two long-overdue features for Audacity: Ogg exporting and command-line exporting. There was no good reason why I hadn't written these features yet and I got tired of Audacity not having them, so I caved and tabled all my homework (including "compose a three voice fugue") to write these new features.

Ogg exporting is working nicely, and though command-line exporting works fine for me on my little-endian Athlon, I forgot to add code for byte-swapping the WAV header. Matt wisely suggested I dump my custom WAV code and just use libsndfile like the rest of the PCM exporting code does. Though it won't be as efficient since we have to copy the data in and out of libsndfile's buffers, the robustness and reliability will be worth it.
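For reference, the header problem comes down to WAV fields always being little-endian, whatever the host CPU is. Here is a minimal sketch of a canonical 44-byte PCM header, written in Python rather than Audacity's own code. The function name and defaults are mine; the "<" in the format string forces little-endian packing, so the same code produces the same bytes on a big-endian machine.

```python
import struct


def wav_header(num_frames, channels=2, sample_rate=44100, bits=16):
    """Return a 44-byte RIFF/WAVE header for uncompressed PCM data."""
    block_align = channels * bits // 8      # bytes per sample frame
    byte_rate = sample_rate * block_align   # bytes per second
    data_size = num_frames * block_align
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",               # "<" = little-endian, no padding
        b"RIFF", 36 + data_size, b"WAVE",
        b"fmt ", 16, 1,                     # 16-byte fmt chunk, format 1 = PCM
        channels, sample_rate, byte_rate, block_align, bits,
        b"data", data_size,
    )
```

Writing the fields straight out of native integers, without a declared byte order, is what works on an Athlon and breaks elsewhere. A library like libsndfile takes care of this for you.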

The gap between Audacity's 0.9 branch (which we are releasing from these days) and the 1.1 branch (which we are hacking on) is becoming wider all the time. The 1.1 branch has so many significant improvements over the 0.9 branch that I wish we could dump the 0.9 branch completely. But then we would lose some of the freedom to make drastic changes on a regular basis, so it's probably a good thing in the end.
