The Cranky User: Could you repeat that?

Posted 4 Nov 2002 at 06:48 UTC by jbucata

An excellent article from IBM developerWorks. The abstract:

One of the most common tasks computer users face is repetition -- performing the same task more than once. Computers are fairly good at this. So why do people spend so much time repeating tasks by hand? Often the interfaces they use are resistant to automation, and other interfaces are obscure or unavailable.

This article counts in my book (and, now, in my bookmarks) as a pro-Unix advocacy piece, even though it's not explicitly trying to be--and IMHO that's the best thing about it.

Microsoft's VBA-enabled apps (mostly Office) that let you record macros record them as VBA--and Microsoft makes those macros Do The Right Thing. If you're creating an Excel chart, for instance, the generated code doesn't say "click at x,y" several times in succession, clicking on what you hope is the dialog box that creates the chart; it uses the Excel object model to do the work. You'll see something (off the top of my head) like "xlSheet.CreateChart(...)" with a large number of parameters corresponding to the settings you changed from their defaults when you went through the motions yourself.

I've never tried any sort of macro-recording feature in Free/Open Source apps, so I don't know how they stack up, but this is definitely one area where Microsoft does it right.

Link courtesy of

Do it in Qt, posted 4 Nov 2002 at 14:48 UTC by pfremy » (Journeyer)

I did that once with a Qt program. It is a piece of cake to code; Qt already provides everything you need.

My program was coded in two hours. It would trap every event and was able to replay them later, by sending the event to the right widget. It would not be very difficult to add that to KDE.
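A minimal sketch of that record-and-replay idea, written in Python rather than Qt so it stands alone. The `Event`, `Widget`, and `EventRecorder` names are hypothetical; in a real Qt program the trapping would presumably be done with an event filter installed via `QObject::installEventFilter`, and replay would post the saved events back to the right widgets.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Event:
    target: str     # name of the widget the event was delivered to
    kind: str       # e.g. "click" or "key"
    data: str = ""  # payload, such as the key that was pressed


@dataclass
class Widget:
    name: str
    log: List[tuple] = field(default_factory=list)

    def handle(self, event: Event) -> None:
        # A real widget would change its state; this one just notes the event.
        self.log.append((event.kind, event.data))


class EventRecorder:
    """Traps every event on its way to a widget and can replay them later."""

    def __init__(self, widgets: List[Widget]) -> None:
        self.widgets: Dict[str, Widget] = {w.name: w for w in widgets}
        self.recording: List[Event] = []

    def dispatch(self, event: Event) -> None:
        self.recording.append(event)              # trap it...
        self.widgets[event.target].handle(event)  # ...then deliver it as usual

    def replay(self) -> None:
        # Send each recorded event to the widget it was originally meant for.
        for event in self.recording:
            self.widgets[event.target].handle(event)


name_box, ok_button = Widget("nameField"), Widget("okButton")
recorder = EventRecorder([name_box, ok_button])
recorder.dispatch(Event("nameField", "key", "a"))
recorder.dispatch(Event("okButton", "click"))
recorder.replay()
```

Keying the recording by widget name rather than by object is what lets a replay survive a restart of the application, provided the widgets are named consistently.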

The company of Kalle (a KDE developer) also has a tool, kdRunner, that does exactly that, with the goal of recording test scenarios for a Qt application. Never tried it.

Expect, posted 4 Nov 2002 at 17:48 UTC by chalst » (Master)

In the case of command-line interfaces, Don Libes' `expect' program, which is based on Tcl, is very useful, and it gives a good picture of what a scripting language for interacting with UIs should look like. BTW, Expect was an inspiration for Olin Shivers' `scsh'.

There is expect code, called autoexpect, that spawns a shell and records your actions. The code output by autoexpect is reasonably readable and predictable, so it can be edited like normal expect code, which makes it more sophisticated than the usual `record-a-template' automation in two ways:

  1. it works well in distributed environments, and
  2. being layered upon expect, it offers the ability to mix scripting with recorded templates.

jwz on XEmacs, posted 5 Nov 2002 at 07:33 UTC by chalst » (Master)

Jamie Zawinski wrote a list of things missing from XEmacs that he thought should be there. It's worth reading for what `record-a-template' paradigm macros should do:


Throwing technology at the problem, posted 10 Nov 2002 at 07:05 UTC by orph » (Master)

So for a long time now I've batted around the idea of having a system for building complex matching rules based on user-performed actions. Ideally it would let us create a decent, useful ruleset that could pop up URLs or documents pertinent to your current task or conversation, recognize and allow replay of frequently repeated tasks, automatically correct words you tend to misspell, etc...

I plan to use gtk2's accessibility toolkit to have all basic user events forwarded to the rule-matching daemon. The events sent are basically mouse movement and clicks, key presses, text presented to the user, focus changes, and menu/toolbar selections. Using the accessibility toolkit allows all gtk2/gnome2 applications to generate these events without code changes.

Then it's a matter of determining patterns to find in this onslaught of events sent to the daemon. Simple rule examples are frequent misspellings such as typing 'rael' followed by three deletes followed by typing 'eal', or matching the last two words typed or shown with a name in your Evolution addressbook. Compound rules could then be built up based on simple rules, such as applying frequently misspelled character sequences to the last two words typed to see if they match with a name in the addressbook.
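The first of those simple rules can be sketched as a function over the stream of key events. This is an illustration only, assuming that "delete" means backspace and that the daemon sees one string per key press; `find_corrections` and `frequent_misspellings` are hypothetical names, not part of any GNOME or accessibility API.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

BACKSPACE = "\b"  # assumed representation of a delete key press


def find_corrections(keys: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (mistyped, corrected) pairs, one per word that was edited.

    The mistyped form is the word as it stood just before the first
    backspace; the corrected form is the word when a space or the end
    of the stream is reached.
    """
    pairs = []
    word, attempt = "", None
    for key in keys:
        if key == BACKSPACE:
            if attempt is None:
                attempt = word
            word = word[:-1]
        elif key == " ":
            if attempt is not None and word and word != attempt:
                pairs.append((attempt, word))
            word, attempt = "", None
        else:
            word += key
    if attempt is not None and word and word != attempt:
        pairs.append((attempt, word))
    return pairs


def frequent_misspellings(sessions: Iterable[Iterable[str]],
                          threshold: int = 2) -> Dict[str, str]:
    """Keep only the corrections seen at least `threshold` times."""
    counts = Counter(p for keys in sessions for p in find_corrections(keys))
    return {bad: good for (bad, good), n in counts.items() if n >= threshold}


# 'rael', three deletes, then 'eal' leaves the word "real".
typing_rael = list("rael") + [BACKSPACE] * 3 + list("eal")
```

Here `find_corrections(typing_rael)` yields the pair `("rael", "real")`; the threshold in `frequent_misspellings` is what turns a one-off slip into a rule worth broadcasting.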

Rule matches would be broadcast to listeners, the first being a small notification gnome-panel applet presenting interesting links or asking "Did you mean this...?". Eventually I can foresee this being added to the applications themselves to provide automatic spellchecking, dynamic document linking, automatic transfer of information between recently viewed documents, or choosing which application to run with which document after receiving an email from a certain address... who knows.

The framework behind such a system should be near trivial to create and understand, and I think if I can come up with some interesting example rules that actually work, I can spark a decent amount of hacker interest. That's my hope, anyway. Sorry to be so brief and abstract (I'll say more when I have an implementation), but what do you think?

AppleScript's been doing this since the early 90's, posted 10 Nov 2002 at 08:45 UTC by goingware » (Master)

AppleScript has been able to do this since the early 90's or so. Apple defined a protocol called "Apple Events" that basically drops protocol packets into a GUI application's event queue.

There is something called the Apple Event Object Model that allows one to write an Object Specifier that denotes some visible data in a user's document. An object specifier would be a binary coded form of "The first word of the second paragraph of the window named foo".

Most Apple Events are meant to be verbs, like Get Data and Set Data. If you send a Get Data to an Apple Event-aware Mac application, you can get back its document text, and if you send it Set Data events, you can programmatically edit its documents.
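To make the object model concrete, here is a toy resolver for specifiers like "the first word of the second paragraph of the window named foo", with Get Data and Set Data verbs built on top of it. It is a Python sketch under stated assumptions: real object specifiers are binary Apple Event descriptors, whereas here a specifier is a plain chain of hypothetical `Specifier` records, a document is a list of paragraphs, and each paragraph is a list of words.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union

Document = List[List[str]]  # paragraphs, each a list of words (an assumption)

# Which kind of object each element kind must live inside.
CONTAINER_OF = {"paragraph": "window", "word": "paragraph"}


@dataclass
class Specifier:
    kind: str                 # "window", "paragraph", or "word"
    key: Union[str, int]      # window name, or a 1-based element index
    container: Optional["Specifier"] = None


def _locate(spec: Specifier, windows: Dict[str, Document]):
    """Return (parent, index) such that parent[index] is the named object."""
    if spec.kind == "window":
        return windows, spec.key
    expected = CONTAINER_OF.get(spec.kind)
    if spec.container is None or spec.container.kind != expected:
        raise ValueError(f"a {spec.kind} cannot be addressed that way")
    parent, index = _locate(spec.container, windows)
    # "first" means index 1 in the specifier but 0 in a Python list.
    return parent[index], spec.key - 1


def get_data(spec: Specifier, windows: Dict[str, Document]):
    parent, index = _locate(spec, windows)
    return parent[index]


def set_data(spec: Specifier, windows: Dict[str, Document], value) -> None:
    parent, index = _locate(spec, windows)
    parent[index] = value


windows = {"foo": [["Hello", "world"], ["Second", "paragraph", "here"]]}
first_word = Specifier("word", 1,
                       Specifier("paragraph", 2, Specifier("window", "foo")))
```

Because both verbs share `_locate`, anything a script can read it can also address for writing, which is the property that makes Get Data and Set Data enough for most scripting.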

Apple meant this mainly for scripting, and there is a recording feature where a program can respond to a real user action by sending itself an Apple Event that is not actually acted upon but is presented for the purpose of recording. You can then clean up the recording in the script editor and play it back.

But I did quite a lot of work to make interapplication spellchecking work through Apple Events, by leading a group that defined the Word Services Suite. There is a simple introduction to Word Services that also covers the object model in my MacHack '94 Word Services paper.

Be, Inc. later defined a similar GUI-manipulation protocol that was also meant for scripting, and I created a BeOS version of Word Services.

I would like to bring it to Linux, but am left with the problem of how to programmatically access data objects in a user's document. I brought this up with Richard Stallman a while back, and he suggested I use the CORBA interfaces that are exposed by Gnome, which I think would work.

However, Gnome wouldn't strictly be required; it's just that an application would have to expose certain kinds of interfaces via CORBA for it to work, so I could get it to work with KDE or plain old X applications.

I think ultimately that CORBA would be a good thing for this as it is a cross-platform standard so there wouldn't need to be protocols that are nearly identical in function but distinctly different in implementation for each platform.

I know lots of you have been buying iBooks and TiBooks - try out the script editor. Try opening an application's AppleScript dictionary with the script editor to see all the apple events it understands.

AppleScript, posted 11 Nov 2002 at 23:56 UTC by orph » (Master)

Ya, I really like what Apple has done with AppleScript. The problem I foresee with bringing it to open source is that few will do the work to make apps expose scriptable objects or high-level events. This is why I'm interested in using the Gnome accessibility framework; it should allow lots of contextual event generation with the smallest amount of supporting code. For instance, I can tell that the user is typing into a text box that is represented by a label named "State" contained in a properties page named "Address", without having any special code in the app to fire the events.

These kinds of events are much more finely grained, and therefore more difficult to interpret. But I think this also leaves the most opportunity for interesting pattern analysis, and hopefully for more high-level event generation as well, since events are not necessarily tied to a particular application codebase.

Generic scriptability in open source platforms is probably the strongest currently in KDE with KParts and in Mozilla's app framework. For Gnome, there is no set of CORBA interfaces that apps always export, and many of the desktop programs which use CORBA or Bonobo are moving away from it.

Re: AppleScript, posted 12 Nov 2002 at 02:49 UTC by MichaelCrawford » (Master)

The problem I foresee with bringing it to open source is that few will do the work to make apps expose scriptable objects or high-level events.

Well, Apple faced the same problem when they created AppleScript, and their solution was to evangelize the developers. We could do the same thing. It did take some time for many Mac applications to become scriptable, and not all of them are even now, ten years later.

An advantage with open source is that we have the source. One thing you could do is start a development effort to add some scriptability to some applications and submit patches to the original developers. Most would welcome them, and I think to a large extent you would only have to start the process of making an application scriptable, and the original developers would carry it on.

The main advantage of the way script recording works in AppleScript is that what the recorder sees are high-level objects, like paragraphs and insertions, rather than pixels and streams of characters.

Scripting GUI Apps From the Shell?, posted 12 Nov 2002 at 07:45 UTC by jbucata » (Apprentice)

I'm used to doing the kind of shell scripting that the article mentions. This talk of scripting KDE and Gnome is interesting, but for me the ultimate touchstone of scriptability in any Unix environment is whether you can control it from the shell. I'm all for getting object-model-level scripting support for a large number of applications, but will it be possible to control them from the shell in a reasonable way once that's done?

Most Unix daemons already have some method of shell control, via special commands that typically interact with a socket or named pipe for you. It gets noticeably harder when you start dealing with GUI elements.

The closest thing I can think of along these lines is Netscape/Mozilla's handful of remote-control command-line parameters. It only works in one direction, and it's not particularly thorough. Ideally there should be an easy way to expose the browser's object models to be manipulated by outside processes. You can already create macros to run within the browser environment. You might be able to get write-only access to the browser's object models by way of -remote "openURL('javascript:...')", but the difficulty of getting information back out makes the whole approach kludgy at best.

When I think of shell scripting, CORBA isn't the first thing that comes to mind. Maybe instead of CORBA we should be dealing with SOAP. Fundamentally they might not be all that different, but in shell scripting, the things you're most likely to care about are either files (usually text) or file impostors (pipes, mostly). To a shell scripter, schlepping XML files around through std{in,out} feels a lot more natural than trying to poke around with binary protocols (however one would do that), if it comes right down to it.
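As a rough illustration of XML over std{in,out}, here is a line-oriented filter in Python. It is deliberately not SOAP: the `<get>`, `<set>`, and `<value>` elements, and the `STATE` dictionary standing in for an application's object model, are invented for this sketch.

```python
import xml.etree.ElementTree as ET
from typing import Dict, TextIO

# Hypothetical application state that a shell script might query or change.
STATE: Dict[str, str] = {"title": "Untitled", "url": "about:blank"}


def handle(request_xml: str) -> str:
    """Answer one <get name="..."/> or <set name="...">value</set> request."""
    req = ET.fromstring(request_xml)
    name = req.get("name")
    if name not in STATE:
        raise KeyError(name)
    if req.tag == "set":
        STATE[name] = req.text or ""
    elif req.tag != "get":
        raise ValueError(f"unsupported request: {req.tag}")
    reply = ET.Element("value", name=name)
    reply.text = STATE[name]
    return ET.tostring(reply, encoding="unicode")


def serve(infile: TextIO, outfile: TextIO) -> None:
    """Read one request per line and write one reply per line."""
    for line in infile:
        if line.strip():
            outfile.write(handle(line) + "\n")
```

A script wrapping this would call `serve(sys.stdin, sys.stdout)`, so that a shell user could pipe a one-line request in with `echo` and read the reply back with ordinary text tools.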

Perl, Python, and friends might be better suited for this sort of thing as far as languages and environments go, since you can write the appropriate libraries and use them easily. They bring in much-needed manipulexity, but lack in whipupitude since you can't create a Perl object and have it hang around in your shell while you work--but you can create an environment variable, or a temp file, or a named pipe with a daemon hanging off the end, or whatever, to keep things around in/for your shell. (Most people don't use Perl for their shell, but those that do presumably have much of this licked.)

I believe that if and when somebody in the libre/gratis software community comes up with a good (or good enough) method for shell users to access GUI applications from the outside, combined with the right "killer app" to make shell users want to use that method (and ultimately that's probably the biggest thing we're lacking, since good ideas and technology for the "how" are easy to come by these days), scripting will take off in a big way, more than what we've seen so far.

Comments, posted 13 Nov 2002 at 03:23 UTC by nymia » (Master)

There are many ways to implement scripting, and one of them is dropping in a grammar written in Lex & Yacc. For example:

         OPEN FILE

Then spin off the parsing routine into its own thread, making it wait for any input. Commands can be sent in text format; there's no need to use binary.

Recording can be implemented too: just write the equivalent statement out to a stream. An output stream might look like the following:

         OPEN /home/doc/sample1.txt
         INSERT TEXT "test test test test test test test test test."

On the other hand, what normally gets implemented is a message loop. I've seen lots of implementations like these, and they are very effective as well.
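A sketch of that arrangement in Python. Two substitutions are worth flagging: a small hand-written parser built on `shlex` stands in for the Lex & Yacc grammar, and a `queue.Queue` stands in for whatever channel delivers text commands to the parsing thread. `Editor`, `execute`, and `command_thread` are hypothetical names.

```python
import io
import queue
import shlex
import threading
from typing import Optional, TextIO


class Editor:
    """Hypothetical application that the script commands drive."""

    def __init__(self) -> None:
        self.path: Optional[str] = None
        self.text = ""


def execute(editor: Editor, line: str, record: TextIO) -> None:
    """Parse and run one command, then record its equivalent statement."""
    tokens = shlex.split(line)  # keeps a "quoted string" as one token
    if len(tokens) == 2 and tokens[0] == "OPEN":
        editor.path = tokens[1]
        record.write(f"OPEN {tokens[1]}\n")
    elif len(tokens) == 3 and tokens[:2] == ["INSERT", "TEXT"]:
        editor.text += tokens[2]
        # Naive re-quoting: text containing '"' would need escaping.
        record.write(f'INSERT TEXT "{tokens[2]}"\n')
    else:
        raise ValueError(f"cannot parse: {line!r}")


def command_thread(editor: Editor, commands: queue.Queue,
                   record: TextIO) -> None:
    """Wait for text commands and run them in order; None means stop."""
    while True:
        line = commands.get()
        if line is None:
            break
        execute(editor, line, record)


editor, commands, record = Editor(), queue.Queue(), io.StringIO()
worker = threading.Thread(target=command_thread,
                          args=(editor, commands, record))
worker.start()
commands.put("OPEN /home/doc/sample1.txt")
commands.put('INSERT TEXT "test test test."')
commands.put(None)
worker.join()
```

Because the recorder writes statements in the same language the parser accepts, `record.getvalue()` can be fed straight back through `execute` to replay the session.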

Binary data interchange?, posted 13 Nov 2002 at 06:03 UTC by MichaelCrawford » (Master)

If you have text mode scripting, how would you script the generation of a PNG graphic in Gimp that gets pasted into an AbiWord document?

With AppleScript, you would just send a Get Data to the Gimp, receive the graphic, and send a Set Data to AbiWord that contains the graphic as one of its parameters.

Personally, I've never understood why people like using text programmatically so much. I find binary so much easier to deal with. Parsing free-flowing text streams is so tricky, while parsing binary is not only easy but efficient.
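For the binary side of the argument, here is what packing and parsing a simple tagged, length-prefixed parameter list looks like with Python's `struct` module. The layout (a 4-byte type code, a 4-byte big-endian length, then the payload) is loosely in the spirit of Apple Event descriptors but is an assumption made for this sketch, not the actual Apple Event format; the type codes used in the usage below are illustrative.

```python
import struct
from typing import List, Tuple

HEADER = struct.Struct(">4sI")  # type code, then payload length


def pack_params(params: List[Tuple[bytes, bytes]]) -> bytes:
    """Encode (type_code, payload) pairs back to back."""
    out = bytearray()
    for code, payload in params:
        if len(code) != 4:
            raise ValueError("type codes are exactly 4 bytes")
        out += HEADER.pack(code, len(payload))
        out += payload
    return bytes(out)


def unpack_params(data: bytes) -> List[Tuple[bytes, bytes]]:
    """Decode what pack_params produced, rejecting truncated input."""
    params, offset = [], 0
    while offset < len(data):
        if offset + HEADER.size > len(data):
            raise ValueError("truncated header")
        code, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        if offset + length > len(data):
            raise ValueError("truncated payload")
        params.append((code, data[offset:offset + length]))
        offset += length
    return params


message = [(b"TEXT", b"caption"), (b"PNGf", b"\x89PNG\r\n\x1a\n")]
```

The length prefix is what makes this easy: the reader never scans for delimiters or escapes anything, so a PNG can travel through as opaque bytes alongside a text parameter.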

mechanism is not the problem, posted 13 Nov 2002 at 18:30 UTC by dan » (Master)

Better scriptability of the working environment is one thing I'd really like to see in unix desktops. For a while I thought it was going to be a guiding principle of the GNOME desktop (first through Guile, then more lately through making everything exposed through CORBA) but they seem to be busy worrying about end-users (which is, in fairness, where the money is) and not caring so much about this stuff.

Some of the previous responses have talked about mechanism. I don't think that mechanism is the real problem - whether it's CORBA, SOAP, XML-RPC, ad-hoc-language-over-unix-domain-socket, or binary mush really isn't a problem - any of them will do the job acceptably well, provided everyone agrees on it. Personally I'd have gone for CORBA, because it's reasonably mature and seems to have the discover-what-services-are-running thing worked out, but I'm open to persuasion.

The problem is policy. The application writers must be persuaded to start thinking about useful scripting interfaces. Here are some examples:

I want a CD player that can be scripted to pause the current disc when the phone rings. I want an address book that can be scripted to look up records based on the caller id. I want a time tracker to notice this event, check in the address book to see whether the caller is a client, and mark the start of a billable period. Then my editor can report the buffer that I start editing soon thereafter and add that information to the time segment so that I have some idea what I was doing when I generate the invoice.

At the end of the day, I want an irc client that can tell when my screensaver cuts in and change nick to "dan_away". The time tracker can also notice, and add a note to say that I might have left ten minutes previously.
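Those wishes boil down to applications publishing events and other applications subscribing to them. The in-process `Bus` below is a hypothetical stand-in for whichever mechanism (CORBA, SOAP, a Unix-domain socket) actually carries the events; the topic names, the phone number, and the handler functions are all invented for illustration.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class Bus:
    """Minimal publish/subscribe hub (a stand-in for a real IPC mechanism)."""

    def __init__(self) -> None:
        self.handlers: DefaultDict[str, List[Callable]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable) -> None:
        self.handlers[topic].append(handler)

    def publish(self, topic: str, **details) -> None:
        for handler in self.handlers[topic]:
            handler(**details)


# Hypothetical application state.
address_book = {"555-0100": {"name": "Acme Ltd", "client": True}}
cd_player = {"paused": False}
irc = {"nick": "dan"}
time_log: List[tuple] = []


def pause_cd(**_) -> None:
    cd_player["paused"] = True


def start_billable(caller_id: str, time: str, **_) -> None:
    entry = address_book.get(caller_id)
    if entry and entry["client"]:
        time_log.append((time, "call from " + entry["name"]))


def go_away(time: str, **_) -> None:
    irc["nick"] = "dan_away"
    time_log.append((time, "screensaver on; may have left 10 minutes ago"))


bus = Bus()
bus.subscribe("phone.ring", pause_cd)
bus.subscribe("phone.ring", start_billable)
bus.subscribe("screensaver.start", go_away)

bus.publish("phone.ring", caller_id="555-0100", time="14:02")
bus.publish("screensaver.start", time="18:10")
```

The dispatch code is the trivial part; the sketch only works because each "application" agreed to publish or accept events like `phone.ring`, which is exactly the policy question.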

I want this to be no harder than writing shell scripts or emacs lisp currently is. I don't want to have to learn more than one additional language. I will bitch about it but use it anyway if it's a really silly language (there's not much excuse for this given that most of the mechanisms being discussed are language-neutral anyway), but I will object to having to learn one silly language per application that I want to control.

We don't need a killer app. We need the interfaces that will allow "power users" to create their own killer apps from the bits that we provide them.

ARexx, anyone?

Scripting from the shell, posted 13 Nov 2002 at 19:13 UTC by mbrubeck » (Journeyer)

Systems like AppleScript are easily integrated with traditional Unix shell scripts. You just need a simple command-line utility to interface between the shell and the application scripting model.

Attila Mezei created such a utility for BeOS, called hey. It allowed you to send messages to running applications from the shell, and do things like:

$ hey Terminal set Title of Window [0] to "$TITLE"

If something similar hasn't been done for AppleScript, that should be easy to remedy.
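The shell side of such a utility is mostly argument parsing. The sketch below splits a hey-style command line into an application name, a verb, a specifier chain, and an optional value. It only covers the `verb Property of Element [index] to value` shape shown above, makes no claim about hey's full syntax, and stops before actually sending anything to the application; `parse_hey` is a hypothetical name.

```python
from typing import List, Optional, Tuple

Spec = Tuple[str, Optional[int]]


def _to_spec(tokens: List[str]) -> Spec:
    """Turn ['Window', '[0]'] into ('Window', 0) and ['Title'] into ('Title', None)."""
    if not tokens:
        raise ValueError("empty specifier")
    index = int(tokens[1].strip("[]")) if len(tokens) > 1 else None
    return tokens[0], index


def parse_hey(argv: List[str]) -> Tuple[str, str, List[Spec], Optional[str]]:
    """Parse e.g. Terminal set Title of Window [0] to "My Title"."""
    app, verb, *rest = argv
    value = None
    if "to" in rest:
        cut = rest.index("to")
        rest, value = rest[:cut], " ".join(rest[cut + 1:])
    chain: List[Spec] = []
    current: List[str] = []
    for token in rest:
        if token == "of":  # "of" separates one specifier from the next
            chain.append(_to_spec(current))
            current = []
        else:
            current.append(token)
    if current:
        chain.append(_to_spec(current))
    return app, verb, chain, value
```

Called as `parse_hey(sys.argv[1:])`, the result would still have to be translated into whatever message the platform's scripting protocol expects; the shell has already done the quoting, so `"$TITLE"` arrives as a single argument.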

Any KDE rep around?, posted 14 Nov 2002 at 07:32 UTC by nymia » (Master)

Just wondering why nobody from KDE/Qt seems to comment on this one, though. It's an interesting topic where all sides must be heard.

I'm no KDE coder, but I think KParts is one mechanism for implementing scripting.

OK, there's the bait.
