Regarding the article posted recently on next-generation
GUI
design...
I've been involved in MIT's small contribution to Gnome
development, mostly as a tester and commentator. When I talk to
people outside the project (and sometimes inside as well), I often
hear comments like "ah, so it's just like Windows, then?"
On a fundamental level, this is an apt description.
Though there
is a lot of technical innovation and tremendous leapfrogging
of
Microsoft's technology underway now, that development is
what you
could roughly call "incremental." The user interface is
fundamentally
limited to a keyboard and a pointing device.
PDA interfaces seem like a radical departure from the
traditional
PC desktop interface. But really, it's just a degenerate
case that
requires an extensive reworking of the same paradigm. Shift
emphasis
to the pointer, simplify and modularize interface components
so
they'll fit on a small device. It doesn't really change the
fundamental mode of interaction with the technology - point, tap,
nudge, poke.
I'm really happy to see this incremental improvement happening,
and it does take a lot of engineering skill and effort; it makes
a big difference at the end of the day.
We need something radically different for there to be a
real
"revolution" in GUI design. Or perhaps it's more of a
revolution
out of GUIs, and the original author of the article
is really
speaking of a mini-revolution that might be useful in the
existing
paradigm.
It seems to be quite obvious what the next step up from
GUIs should
be - voice and natural language interfaces. Humans have a
built-in
capacity for extremely high-bandwidth, highly expressive
communication
using natural language, especially speech. Natural language
interfaces have the potential to be much more user-friendly,
fluid,
and efficient than conventional GUIs. Alas, it's much, much
harder to
process English commands properly than it is to respond to
pokes and
prods and taps of labelled buttons.
Unfortunately, most applications demand higher performance from
NLP (Natural Language Processing) than current production systems
can deliver. But the technology is improving every day, helped
along by advances in hardware performance, by techniques for
managing complexity, and by the sheer brute-force accumulation of
working knowledge of the problem.
This process would, ironically, be helped along a great
deal if we
had better user interfaces with which to design our NLP
systems.
Along those lines, I have two particular projects in mind.
I'm
currently studying NLP at MIT, but I don't have plans for
grad school.
Maybe I'll end up in a position to actually do serious
research in
this area, but it's more likely it will be a hobby. Maybe
I'm
completely in left field and this will take centuries of
slow progress
to realize, maybe some company will come out with products
to do this
in the next 20 years, or maybe I'll invent something nifty
in my
copious free time. In any case, even the simplest of these
systems
can do fun and amazing things.
1.) Implementing an NL command and control
interface.
The sci-fi vision of a computer that responds to
natural-language
requests - "Computer, what is the time?" or "Computer, play
some
Bach." - is extremely captivating and represents an obvious
paradigm under which to create an extremely intuitive interface.
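Even the simplest version of this idea can be sketched today. Below is a minimal, hypothetical command-and-control dispatcher in Python: regex keyword matching stands in for real natural-language understanding, and the command patterns and responses are invented for illustration, not taken from any actual system.

```python
import re
from datetime import datetime

# A toy command-and-control dispatcher: each pattern maps a
# spoken-style request to an action.  Real NLP would replace these
# regexes; the commands here are illustrative assumptions.
COMMANDS = [
    (re.compile(r"what(?:'s| is) the time", re.I),
     lambda m: "The time is " + datetime.now().strftime("%H:%M")),
    (re.compile(r"play (?:some )?(\w+)", re.I),
     lambda m: "Playing music by " + m.group(1).capitalize()),
]

def handle(utterance):
    """Dispatch a natural-language request to the first matching action."""
    for pattern, action in COMMANDS:
        m = pattern.search(utterance)
        if m:
            return action(m)
    return "Sorry, I didn't understand that."

print(handle("Computer, play some Bach"))   # Playing music by Bach
print(handle("Computer, what is the time?"))
```

The interesting (and hard) part is everything this sketch omits: resolving ambiguity, handling paraphrase, and maintaining dialogue context rather than matching isolated keywords.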
2.) Implementing an NL programming interface.
It is tempting to try to equate machine languages with
natural
languages. (When asked what languages I speak, I often tell
people I
speak Perl fluently.) Of course, they are very different --
but both
are extremely powerful. It is certainly possible, for instance,
to describe, in English, what markup and style I want to use in
my HTML document, in addition to actually dictating the content
itself. Using
a sort of batch processing approach (and the solution to
problem #1)
one could in a sense translate the description into the
desired
document. (Of course, a more efficient and user-friendly
approach
might be to convey the desired markup through an interactive
dialog,
which adds another layer of complexity.)
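To make the batch-processing idea concrete, here is a toy translator from English-like markup directives to HTML. The directive vocabulary ("heading:", "paragraph:", and so on) is an assumption invented for this sketch; a real system would have to parse free-form English rather than a fixed grammar.

```python
# A toy "batch" translator from English-like directives to HTML.
# The directive names are illustrative assumptions, not a real grammar.
RULES = {
    "heading": lambda text: "<h1>%s</h1>" % text,
    "subheading": lambda text: "<h2>%s</h2>" % text,
    "paragraph": lambda text: "<p>%s</p>" % text,
    "emphasize": lambda text: "<em>%s</em>" % text,
}

def translate(description):
    """Turn lines like 'heading: My Title' into the corresponding markup."""
    html = []
    for line in description.strip().splitlines():
        directive, _, text = line.partition(":")
        rule = RULES.get(directive.strip().lower())
        html.append(rule(text.strip()) if rule else line.strip())
    return "\n".join(html)

doc = """
heading: On GUI Design
paragraph: Voice interfaces may be the next step.
"""
print(translate(doc))
```

The gap between this and the real goal is exactly the gap between a controlled vocabulary and natural language - which is why the interactive-dialog approach mentioned above adds a whole extra layer of complexity.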
But what about arbitrary code in a Turing-complete
language? Why
can't I write a description of exactly what I want my
program to do,
and have that "translated" from English into Perl? Assuming
this
requires a semantic representation of the desired
functionality, would
this not also make it possible to generate a
performance-optimized C
version, say? Would this not speed up the software
engineering
process quite considerably? Would it reduce debugging to a
process of
refining the program's specification, perhaps in an
interactive
dialog? A prerequisite to answering those questions would
be figuring
out at what level of abstraction the English specification
would need
to be. Can the user merely describe the desired mappings from
input to output, or perhaps the screens and buttons the user
sees, along with some notion of how they interconnect and what
functionality they represent?
Or would the programmer have to speak in pseudocode, at the
level of
functions and variables? How do knowledge representation
and
reasoning interplay with NLP in this application?