16 Aug 2003 raph   » (Master)

Clear and informative error messages

For any software which is to be considered mission-critical, one of the top priorities must be to produce clear and informative error messages when something goes wrong. It might be helpful to consider this the primary goal, with production of the correct result a pleasant side effect of the special case of no errors.

Of course, as maintainer of Ghostscript, I bear a great deal of responsibility for violating this principle myself. So, at the risk of the pot calling the kettle black, I humbly present criticisms of some existing free software projects, and suggestions about how to improve matters.

My most recent bad experience with cryptic error messages was a simple permissions problem in Subversion. A log file had 644 permissions, where 664 was needed. However, the actual error report looked something like this:

svn: Couldn't find a repository
svn: No repository found in 'svn+ssh://svn.ghostscript.com/home/subversion/fitz'

Trying to track the problem down, I ran svn locally on the machine hosting the repository, resulting in this error:

svn: Couldn't open a repository.
svn: Unable to open an ra_local session to URL
svn: Unable to open repository 'file:///home/subversion/fitz'
svn: Berkeley DB error
svn: Berkeley DB error while opening environment for filesystem /home/subversion/fitz/db:
DB_RUNRECOVERY: Fatal error, run database recovery

I ended up diagnosing the problem using strace, which did print out a clear and informative error message, once I found it:

open("/home/subversion/fitz/db/log.0000000002", O_RDWR|O_CREAT|O_LARGEFILE, 0666) = -1 EACCES (Permission denied)

How did Subversion manage to transform such a clear error condition into such a confusing (and alarming) report? I think the most likely culprit is the use of abstractions that do not support the error-reporting goal stated above. If you have a tower of abstractions, then every abstraction in the tower must support that goal; a single layer that throws away the underlying cause is enough to lose it for everyone above.
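
To make the "tower" point concrete, here is a minimal sketch in C (the language of both Ghostscript and Subversion) of layers that wrap a lower-level error with their own context instead of replacing it. The err_t type and the functions open_log, open_db_env and open_repository are hypothetical, invented for illustration; this is not Subversion's actual error API, and the log file name is simply borrowed from the strace output above.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical chained error: each layer adds its own context but keeps
   a pointer to the lower-level error that caused it. */
typedef struct err {
    int os_errno;        /* errno from the failing system call, or 0 */
    char msg[256];       /* what this layer was trying to do */
    struct err *cause;   /* lower-level error, or NULL at the root */
} err_t;

static err_t *err_new(int os_errno, err_t *cause, const char *fmt, ...) {
    va_list ap;
    err_t *e = malloc(sizeof *e);
    if (e == NULL)
        abort();         /* out of memory: nothing sensible left to report */
    e->os_errno = os_errno;
    e->cause = cause;
    va_start(ap, fmt);
    vsnprintf(e->msg, sizeof e->msg, fmt, ap);
    va_end(ap);
    return e;
}

/* Bottom layer: open a file, keeping both errno and the path. */
static err_t *open_log(const char *path, int *fd_out) {
    int fd = open(path, O_RDWR | O_CREAT, 0666);
    if (fd < 0)
        return err_new(errno, NULL, "cannot open log file '%s'", path);
    *fd_out = fd;
    return NULL;
}

/* Middle layer: wrap the cause instead of replacing it.
   "log.0000000002" is a hypothetical, hard-coded log file name. */
static err_t *open_db_env(const char *dir, int *fd_out) {
    char path[512];
    err_t *e;
    snprintf(path, sizeof path, "%s/log.0000000002", dir);
    e = open_log(path, fd_out);
    return e ? err_new(0, e, "cannot open database environment in '%s'", dir) : NULL;
}

/* Top layer: same pattern, one more level of context. */
static err_t *open_repository(const char *repo, int *fd_out) {
    char dir[512];
    err_t *e;
    snprintf(dir, sizeof dir, "%s/db", repo);
    e = open_db_env(dir, fd_out);
    return e ? err_new(0, e, "cannot open repository '%s'", repo) : NULL;
}

/* Print from the outermost goal down to the root cause. */
static void err_print(FILE *out, const err_t *e) {
    for (; e != NULL; e = e->cause) {
        fprintf(out, "error: %s", e->msg);
        if (e->os_errno != 0)
            fprintf(out, ": %s", strerror(e->os_errno));
        fputc('\n', out);
    }
}

/* Walk to the innermost error and return its errno. */
static int err_root_errno(const err_t *e) {
    while (e->cause != NULL)
        e = e->cause;
    return e->os_errno;
}

/* Free every link in the chain. */
static void err_free(err_t *e) {
    while (e != NULL) {
        err_t *next = e->cause;
        free(e);
        e = next;
    }
}
```

Called as open_repository("/home/subversion/fitz", &fd) in the permissions scenario described above, err_print would report the repository, then the database environment, and finally the log file path followed by strerror(EACCES), which is "Permission denied" on typical systems: the same fact strace revealed, but without having to go looking for it. The design choice is simply that no layer is allowed to discard the error it received.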

Of course, aside from Ghostscript, one of the absolute worst offenders for error reporting is the auto* toolchain. A simple problem such as a missing library often results in cryptic error messages, usually the fallout from incorrect macro substitution.

Macro substitution, while an appealingly powerful abstraction, is absolutely hopeless when it comes to mission-critical error recovery. In a typical scenario, you'd use macro expansion to rewrite your goal (create a suitable configuration file for building a program) into subgoals (such as testing whether certain compiler flags work), and so on. However, when something goes unexpectedly wrong in one of the subgoal steps, it's all but impossible to trace that back up to the original goal; the only thing that remains is the expansion. Using procedures to break a goal into subgoals works in much the same way as macro expansion, but doesn't suffer from this inherent problem: when something goes wrong, the caller can look at the error returned by the callee. Of course, it's still the responsibility of the coder to actually check the return code and do something appropriate with it, a responsibility that is all too often ignored.
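
As a rough illustration of the procedural alternative, here is a minimal C sketch of a configure-style goal split into subgoals by ordinary function calls. Everything in it is hypothetical: struct subgoal, file_readable, configure and the libfoo path are invented for the example, and checking a file with access() is a crude stand-in for a real configuration test, which would typically compile or link a test program. The point is only that, because each subgoal returns to its caller, the failure can still be reported in terms of the goal that was being pursued.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* One subgoal of the configuration goal (hypothetical structure).
   The check returns 0 on success or an errno-style code on failure. */
struct subgoal {
    const char *description;          /* what this step is trying to do */
    int (*check)(const char *arg);
    const char *arg;
};

/* Hypothetical check: is this file present and readable? */
static int file_readable(const char *path) {
    return access(path, R_OK) == 0 ? 0 : errno;
}

/* The goal, broken into subgoals by procedure calls. On the first
   failure, report which subgoal failed, on what, and why. */
static int configure(const struct subgoal *steps, size_t n,
                     char *report, size_t report_len) {
    for (size_t i = 0; i < n; i++) {
        int rc = steps[i].check(steps[i].arg);
        if (rc != 0) {                /* the check that is so often skipped */
            snprintf(report, report_len,
                     "configure: cannot %s ('%s'): %s",
                     steps[i].description, steps[i].arg, strerror(rc));
            return rc;
        }
    }
    snprintf(report, report_len, "configure: all %zu checks passed", n);
    return 0;
}
```

A caller would build an array of subgoals, call configure, and print the report on a nonzero return; for a missing library the message names the step, the path, and the system's reason (for example "No such file or directory"), rather than leaving only the wreckage of an expansion.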
