Unix Errors are Stupid

Posted 21 Mar 2000 at 19:07 UTC by Ankh

The Unix system-level error reporting mechanism (errno and friends) leads to programs with poor error recovery and misleading feedback to users, and helps to perpetuate a culture of system-based rather than task-based interaction.

For those of us who remember "?" as being the editor's only error, yes, this is an improvement, but...

It doesn't have to be this way.

Today, a C program that runs on Unix interacts with the world through system calls. All input and output, memory allocation, file access, starting and stopping, ultimately boil down to system calls.

System calls either work or fail. If they fail, they set a thread-local variable, errno, to indicate the error.

It is possible to map errno into a human-readable string that gives some indication of the possible problem. It is not so easy for the program itself to interpret the problem. Ad-hoc system-dependent code must be written to handle each case.

Here are some examples to illustrate this.

ENOENT
A file or directory did not exist ("No such file or directory"). A call to open() a file that does not exist would generally produce this error, but the same value would be returned on an attempt to open a file in a directory that did not exist. open("/tmp/not-there/boy", O_RDONLY) is indistinguishable from open("/tmp/not-there.txt", O_RDONLY).

Recovery in the first case might be to create the directory, or to check that software has been installed correctly.

Recovery in the second case might be to create the file; but trying to create a file in a directory that isn't there will lead to exactly the same error.

The error message is too generic.

ENOTTY
The isatty(fd) function uses the ioctl() system call to check whether the file descriptor fd refers to a terminal or a regular file. The standard I/O package, stdio, uses isatty() to determine whether to line-buffer standard output.

As a result, the first time you call printf() to a non-terminal file, errno is set to ENOTTY.

The following code will always print "not a typewriter" on such systems, if the user tries to save the error messages to a file:

void fatalError(char *message)
{
    extern char *progname;
    if (progname) {
        fprintf(stderr, "%s: ", progname);
    }
    fprintf(stderr, "fatal error: ");
    perror(message);
    exit(1);
}

The error message mechanism decouples the cause of an error from the reporting.

ENOTDIR
The shell command mkdir /bin/sh/x produces:
  mkdir: cannot make directory "/bin/sh/x": Not a directory
Obviously I know it's not a directory; that's why I'm trying to create it with mkdir in the first place. The real problem is that /bin/sh is a regular file, not a directory.

The error message is at the wrong level of detail, and does not relate at all to the user's task.

Solutions

A few years ago (1988) I wrote a small library of wrapper functions for Unix system calls such as unlink() and open(). The idea was to attempt to diagnose common problems and produce more helpful error messages than perror(3).

An example error message was:
  etest: could not open file "/bin/sh/boy.c" for reading
  etest: -- "/bin/sh" is a file, not a directory, and cannot contain "boy.c"

This library no longer exists in a useful form (I can mail what I have to anyone who wants it, but there's not much left. Maybe I was testing the unlink wrapper too much :-), but in any case it predates widespread implementation of ANSI C.

The library had to be hand-crafted for each platform, which was a pain.

Much of the information is split between the manual page and the source code for the system call or library function concerned.

Maybe if the manuals used XML to document the error returns, I could write something to process man pages and automatically build the library. As it is, the inconsistent use of troff macros makes that difficult.

Maybe if that XML was available at runtime, I could do something even smarter, and check it on the fly.

But this is a lunatic idea!
I'm talking about interpreting the online documentation on the fly so that a program can understand an error return that's clearly inadequate.

Why not fix the API instead?

error_t e = Eopen(
    &result_fd,
    "filename",
    "short description",   /* e.g. "configuration file" */
    O_RDONLY,              /* or whatever */
    [mode]
);

This might be improved by passing down a task context stack, so that an error message can relate the user-level task (run web browser), the application's task (load user interface definition files) and the system error (/proc/xml not found).

Note that exception mechanisms may make this problem worse, not better, by decoupling the cause of the error (where it's easiest to generate a more precise report) from the task (where the information is needed).

 

When I've offered to donate code in the past, I now see that I've offered it to the wrong people - people for whom ed's famous "?" error message was satisfactory. But that was over ten years ago. Maybe times have changed? Is it time to build something better?


exceptions, posted 21 Mar 2000 at 19:47 UTC by graydon » (Master)

I do not agree that exceptions are worse; they are completely different animals from error codes. an exception decouples the error's recovery procedure from its detection phase. You are more than welcome to write a wrapper library around the standard C libraries which diagnoses errors in a detailed way, but it is still useful in many cases to throw such well-detected and well-described error states up the call stack.

it is especially useful in library code, where the library author doesn't even know how the library is being used, and how the user wants to recover.

one of the biggest problems with exceptions is that java and C++ both implement them in a broken way. java checks exceptions at compile time but does not provide any programmer control over resource release, so there's no way to be sure that leaving a stackframe via an exception will clean up all the things acquired in that frame. while C++ does provide the means to release things properly, it fails to properly enforce the compile-time checking and also fails to enforce the cleanup by permitting non-smart pointers to unreclaimed memory. proper use of C++ exceptions requires control over all the source you depend on and all the programming practises of your contributors: not likely.

the "errno" solution is popular, not because it is pretty or well thought out, but because most people code programs optimistically anyway. same reason unchecked casting is a popular solution to generic programming, even though there are safe ways of doing it. enforcing safety is never going to be popular. it's like enforcing documentation. imagine a language which wouldn't compile a function, class or module without docstrings!

the concept of a "task context stack" is identical to the caller's stack of exception handlers, or rather set of exception handlers embedded in the call stack.

The Why of the API, posted 21 Mar 2000 at 20:06 UTC by idcmp » (Journeyer)

Basic API calls cover What/Where you are trying to do, and the documentation discusses How they go about it (Who is an environmental factor), but in many cases "Why" isn't covered anywhere.

Why are you allocating this memory? Why are you opening this file? Why are you binding to this socket? Why are you locking this fd?

I would imagine the only reason having an API call that can be told "Why" you are doing things hasn't happened is that it's a recipe for over-engineering, and that normal design-by-committee issues occur and eventually clobber such projects.

Sometimes I wish apps would just sleep() for a bit if their malloc() failed. Sometimes I really wish the OS would just tell me it's out of memory. I really don't want it killing /bin/bash to make room for a temporary buffer of another app.

Correction and rebuttal, posted 22 Mar 2000 at 00:31 UTC by chbm » (Journeyer)

From man 2 open:

ENOTDIR A component used as a directory in pathname is not, in fact, a directory, or O_DIRECTORY was specified and pathname was not a directory.
There are a great many possible errors. The fact that libcs don't implement them correctly doesn't invalidate the API.
Anyway, not all applications want detailed errors (in fact, i believe most don't) and that is rational enough for not rolling all kinds of fancy checks into the libc. If you want to do it, fine, your app can do it on its own. If you want to write a wrapper lib to do it, go ahead. But please don't imagine bloating *all* applications for the benefit of a given number of apps.

About Java exceptions, posted 23 Mar 2000 at 03:03 UTC by mbp » (Master)

I don't understand why graydon says that ``java checks exceptions at compile time but does not provide any programmer control over resource release, so there's no way to be sure that leaving a stackframe via an exception will clean up all the things aquired in that frame.'' Java is not perfect, but I think exceptions are generally pretty well designed and do not have this shortfall at all.

The most obvious and common resource acquired by a program is the creation of new objects. Since Java requires garbage collection, these objects will always be cleaned up properly: this code will not leak:

public void foo(boolean damage) {
    Object a = new int[10];
    if (damage)
        throw new RuntimeException();
}

(Advogato's lack of <pre> tags sucks)

The second most common and important resource acquired is synchronization monitors. Again, because these are always tied to lexical scope they will be cleaned up without requiring any programmer intervention. This method will also never return without releasing the monitor:

public void foo(boolean damage) {
    synchronized (this) {
        this.a++;
        if (damage)
            throw new RuntimeException();
    }
}

Well over 90% of Java classes will clean up in a completely satisfactory way in the presence of exceptions using these techniques. In general code can respond to exceptions it didn't expect in a reasonable way.

Explicit programmer management of resource cleanup is sometimes required, usually because the resources in question are not directly controlled by the JVM. Consider for example wanting to make sure that a database transaction is rolled back immediately in the case of an error:

public void foo(Database db, boolean damage) {
    Transaction tx = null;
    try {
        tx = db.newTransaction();
        tx.doSomething();
        tx.doOtherThing();
        tx.commit();
        tx = null;
    } finally {
        if (tx != null)
            tx.rollback();
    }
}

The finally clause is always called when exiting the block whether normally or abnormally. (There is a formal and clear explanation of this statement in the Java Language Specification.)

Another great feature of Java exceptions is that because they're just objects we can for example define chained exceptions that contain information about underlying causes. For example "couldn't save record" because "database transaction failed" because "IO error on hda1". We can also write for example a general-purpose logger or error dialog that interrogates the exception object for information.

One problem with this, of course, is that creating an Exception object every time an operation fails is pretty expensive, whereas the Unix kernel can just return an integer value which is much cheaper.

Creating all these objects can cause performance problems in Java programs even on modern hardware, so I imagine it was completely infeasible on original Unix systems. People might be hesitant to put any additional cost into the kernel where it has to be paid by every single program. Remember that in many cases error codes are harmless and will not be reported to the user, and so it would be a waste of time to generate detailed messages: look at how many times -ENOENT is returned while libc starts up.

I'd be interested to see the code for your error-reporting library, and I think it could be a very good thing. It seems nearly impossible to implement your check for "/bin/sh/boy.c" in userspace without introducing race conditions: it's no good to go back and check one component at a time if the kernel fails the call, because the situation may have changed in the interim. Perhaps we could augment errno with a more detailed explanation that was filled out in the kernel at the moment the error is detected.

Microsoft has experimented with several different exception-handling schemes in the Windows API. Perhaps other people can comment: last time I looked, three incompatible systems were used in different parts of the Win32/AFC code.

Java's lack of resource release upon stack-frame exit, posted 23 Mar 2000 at 13:27 UTC by sneakums » (Journeyer)

I believe that graydon is referring to the fact that a Java class does not have a destructor as such, and that in Java there is no such thing as a stack-allocated object (all class instances (objects) in Java are allocated in the free store and are subject to garbage collection), so the programmer cannot define classes whose instances release resources upon destruction, as one can in C++.

As far as I recall, there is in Java a finalize method that is run when the object is deallocated, but since this is run at some indeterminate point in the future, it is not equivalent to a C++ destructor, since we always "know" when a C++ destructor is run. Not only that: we rely on it.

The use of stack-allocated objects in C++ to acquire and release resources is a very useful idiom. It interacts very well with exception handling and C++ exception handling would be far less useful without it.

Finalizers in Java, posted 23 Mar 2000 at 19:10 UTC by mbp » (Master)

You're correct that Java finalizers are not run at a strictly defined time as C++ destructors are, but in practice this is not usually a problem. It just requires a slightly different idiom to what one is used to in C++, and in any case graydon said `any control', not `control through object destruction'.

To my mind

public void foo() {
    OutputStream os = new FileOutputStream("/tmp/a");
    try {
        os.write(arry);
    } finally {
        os.close();
    }
}

is sufficiently straightforward. Not allowing objects on the stack trades off flexibility for simplicity, but it certainly doesn't disallow deterministic cleanup.

finalizers and finally, posted 23 Mar 2000 at 22:38 UTC by graydon » (Master)

"finally" isn't quite right though. all it guarantees is that the finally block is "run" on exit -- it does not guarantee that the finally block will complete. suppose I allocate 10 sensitive objects which need to be finalized on exit. my finalizer block can do something like

finally {
    frob.finalize();
    snerk.finalize();
    tweedle.finalize();
    ...
}


but if each of those finalizers might throw a different exception, I need to nest the handlers and finalizers:

finally {
    try {
        frob.finalize();
    } finally {
        try {
            snerk.finalize();
        } finally {
            try {
                tweedle.finalize();
            }
            ...
        }
    }
}

and honestly, if something is this awkward idiomatically, it's being done wrong. if you have a program with a lot of different failure modes, it can kill program comprehension to have a lot of noise like this to handle things.

furthermore, since the VM itself will (might) call finalize() on the objects when it GC's them, you really need to set a "clean" flag inside the object which the finalizer checks to ensure it doesn't run twice (and thus, in some cases, hurt the underlying system even worse).

compare this approach with stack objects (or even immutables, like in sather) and I think it's clear that stack objects win.

Simplicity floats, posted 23 Mar 2000 at 22:48 UTC by sneakums » (Journeyer)

Not allowing objects on the stack trades off flexibility for simplicity, but it certainly doesn't disallow deterministic cleanup.

But it does disallow deterministic automatic cleanup.

The problem with the finally { ... } idiom is that it places the responsibility for releasing the resources with the programmer. I feel that this task is better handled by the object itself; the object is in a unique position to know what it has acquired and thus to safely and completely release it.

The wonderful thing about the "resource acquisition is initialization" idiom is that classes that use it "just work". That is where simplicity is gained.

Stack objects, posted 24 Mar 2000 at 00:43 UTC by mbp » (Master)

Sure, but in C++ having objects on the stack allows all kinds of interesting damage to do with object slicing, keeping references to dead objects, and so on. On the other hand scoping objects in this way can be very clean. This is appropriate and necessary in the no-guard-rails style of C++, but would go against the OH&S design criteria of Java. Java's pretty keen on there only being a single right way to do things.

I've had to use the nested finalizers idiom occasionally to get correct cleanup, but my point is that the general case is automatic and the complicated case (of holding externally controlled resources) is at least possible.

Personally I think the Java design of monitors is worse than finalizers: associating a lock with every object is inefficient; making monitors public breaks encapsulation; and making primitive monitors re-entrant is questionable.

Java exceptions, posted 24 Mar 2000 at 19:19 UTC by jwz » (Master)

My two biggest complaints with Java exceptions are how static they are, and that there is no way to register a handler for an exception that will decide to continue, rather than throwing out.

I find that as I'm writing code, I constantly have to go back and modify multiple files as I realize down the line that some routine calls some other routine that might sometimes throw some new exception. So I have to go all the way back up the potential call stack and add those exceptions to the list of all callers. This is bogus and non-objecty.

It's bogus that I can't register a handler that would do something analogous to handling floating-point underflow by returning 0. (I say ``analogous to'' because in that particular case, there are performance reasons not to use exceptions for that kind of thing, but the general class of problems still exists.)

Sneakums is right that putting code in object finalization methods is generally better (cleaner, safer) than using `finally' clauses.

I found the exception mechanisms used in Flavors and CLOS to be a lot easier to deal with than Java's.

I wish Java had some notion of stack-allocated objects, but only for performance reasons, not because of the programming idioms it allows. Generally, any time you care deeply about when an object is actually destroyed/finalized, it's because you're doing manual storage management, and that kind of misses the point of working in a GCed environment. Because the thing is, some day your assumption about when an object is really dead is going to be wrong, whereas if you just let GC do its job, you would never be wrong.

Of course the real bitch about Java is that it's impossible to define new syntactic elements. For example, Common Lisp has `open' and `close' functions for files, but you pretty much never use them, instead you use (with-open-file (fd "name" ...) ...body...) where with-open-file is a macro that does the equivalent of `try/finally' for you. Since neither Java nor C have a sensible macro mechanism like this, you push the effort off to each and every programmer (each consumer of your APIs) to manage their finallys by hand, rather than just providing them with a `with-frobbing-foo' form that scopes things properly.

It's much easier to get people to write

(with-open-file (a ...)
  (with-open-file (b ...)
    (with-open-file (c ...)
      ...body...
      )))

than the equivalent

try {
    a = open(...);
    try {
        b = open(...);
        try {
            c = open(...);
            ...body...
        } finally {
            close(c);
        }
    } finally {
        close(b);
    }
} finally {
    close(a);
}

(My god, the HTML parsing that Advogato does is complete shit! Every time I do `preview' it doubles the number of <P> tags, and adds more newlines. Now I'm editing text where some sentences have each word on their own line!)

hmmm, posted 24 Mar 2000 at 21:50 UTC by Ankh » (Master)

The comment I threw in (incoherently) about exceptions seems to have been more controversial than anything else, interestingly. But perhaps not surprisingly.

The reason I said that exceptions make error reporting worse, apart from being a blatant effort to stir people up :-), was that they invite a kind of programming that focuses on what worked, not on what failed.

I'd like to go back a bit; when I mentioned a task context, I was thinking not of the call frame and a thread context, but of a human user task context. You might have several function calls (going back to C) that are all in support of saving a configuration file, and that might be done because you changed your Garment Colour Preference to Purple as part of ordering a pair of socks.

In that example, if the config file save failed (out of disk space, say), I want to know whether my order failed, or if I'll be charged money for purple socks that will never arrive. If the program is running locally, or if I choose "more details", or in a logfile (erp, but we're out of space!) I want to see that the code was trying to save a conf file, that it was because my preferences had changed, and that this particular file system had filled up, and the conf file was (or was not) trashed, and the backup was (was not) restored.

All of this is possible with exceptions and having the top-level module report the error, as long as your language supports modifying the exception as it filters up, to add sub-task information.

It's also possible by passing the information down the stack, or maintaining a User Task Model separately from, or as part of, your Data Model (if you use M-V-C). In this case, the code generating the error message has access to all the details it might need about the exact local problem (which disk is full), so it might be easier to write better errors, but the code is more likely to be generic and shared, so it is less likely to want to do so.

The knee-jerk reaction of many programmers to someone who wants to make software accessible to less technical people, or to provide enough information about problems that you don't need to be a macho boot-wearing mountaineer (OK, I'm not macho, I admit it darlings) is to say that the result will be "bloated" (scroll up, someone said it already). Someone told me the other day he wasn't interested in using XML for data files (for bind) because "XML is a bloated library". Ignorance is everywhere. Keep your laser handy.

I tried to connect to an IRC server yesterday and spelt the name wrong. BitchX said: can't connect to server xxx: No such file or directory. Good one. A traceroute and a ping and a telnet later, I eventually worked out the problem. OK, so I'm slow, and I don't wear shoes.

I'd like to see the environments I use have better error handling. Is that bad?

Oh, for those who wanted to see the code I have, I did go through an 11-year-old backup tape and found something broken, so I salvaged a tiny part of it at www.holoweb.net/~liam/elib0.01 but I'm not sure it's worth looking at. I'll see if I can find the code that tried to diagnose problems, as it's more interesting, but Clyde has my Zip drive and FreeBSD can't read SPARC SunOS SCSI disks...

I suggest a set of wrappers for open() and friends that can live in a shared library and can help with errors. This can make command-line applications smaller (no need to test, print error and exit all over the place) and can help GUI applications recover more gracefully. I don't have time now (alas!) to devote to writing such a library, or even to managing it as an open source project, so I've thrown it out in case someone else does. Open Source Ideas :-)

fatalError poorly written, posted 29 Mar 2000 at 18:38 UTC by jmg » (Master)

If you read the errno specification, it makes NO guarantees as to the value of errno when you call another library function. This is because the library function may make a syscall which will modify errno. The correct implementation of fatalError is:

void fatalError(char *message)
{
    extern char *progname;
    int olderrno;

    olderrno = errno;
    if (progname) {
        fprintf(stderr, "%s: ", progname);
    }
    fprintf(stderr, "fatal error: ");
    errno = olderrno;
    perror(message);
    exit(1);
}

Which will give you the results you expect. As someone pointed out, if you actually read the documentation, errnos are a perfectly acceptable error reporting mechanism.

What I'd rather complain about is all those people who assume that any negative return value from a syscall, rather than exactly -1, is an error!

-1 error returns, etc., posted 6 Apr 2000 at 05:36 UTC by green » (Journeyer)

There aren't really many cases where checking for -1 and checking for errno are both necessary. There are a couple cases that come to mind:

If you lseek() and end up with a return value of -1, it could mean one of two things: it could be an error, or it could be that you are now at offset -1 in the fd you seeked upon.

If you call getpriority() to get the niceness of a process, it can return -1 for the same two reasons: the priority of the process could be -1, or there could have been an error.

There are similar issues with the strto*() functions: a legitimate result could be LONG_MIN or LONG_MAX, so that case must be handled as well as the case where LONG_MIN/LONG_MAX mean that an underflow/overflow has occurred.

Luckily, there aren't very many places where these can be problematic. For an overwhelming majority of calls, -1 is the standard error return (or NULL, MAP_FAILED, whatever is defined in the API of the called function). A good programmer should know to set errno to 0 and check errno after calls to those functions.

I do agree it can be confusing, but the only thing that will help is experience. I don't feel that errno is a terrible API, but I do of course sometimes get irked at weirdness in APIs with regard to error returns. Some APIs are just badly designed, for example, char *fgets(char *str, int size, FILE *stream); should return something useful, such as an int of the length read, rather than the incredibly obtuse NULL for failure. NULL should be for failure of something which would be returning allocated memory...

Perhaps there should be a write-up somewhere which will help acquaint programmers with these kind of quirks. I wouldn't mind contributing to a "Common Unix programming pitfalls" page, or something of the sort.

re: `exceptions' by graydon, posted 8 Apr 2000 at 19:18 UTC by kaig » (Journeyer)

One other issue which bothers me about (Java) exceptions is that they make it real hard to do information hiding right.

Suppose you store an inventory, and at first you do it using ascii files. There will be a method to open the inventory, and it is natural that this method throw FileNotFoundException when the file is missing. Now the project progresses and you decide to switch to an SQL database. Of course, now the natural exception to throw is an SQLException of some kind.

There are two separate issues to think about. The first issue is the signature of the method -- the kinds of exceptions it throws are part of the signature since they are listed in the `throws' clause. Trying to deal with this problem by declaring all methods to throw Throwables doesn't deal with the second issue, though.

The second issue is the catch clauses. Before, you called openInventory() and caught FileNotFoundExceptions; now you call openInventory() and want to catch SQLExceptions.

The only way around this that I can think of is to change the class of the exceptions, which is highly tedious and not satisfactory at all. That is, you design a class InventoryException together with some subclasses, and have the openInventory() method throw one of those. But this means that the exception changes class every time it goes up the call chain (and crosses module boundaries).

But I thought the promise of exceptions was that they just propagate up the call chain, and are dealt with at the point in the call chain where it is most appropriate! And now you find that you have to deal with every exception at (almost) every point in the call chain.

Is there a silver bullet? Maybe Java exceptions are braindead and other programming languages did it right? But which languages?

errno as a non-int, posted 10 Apr 2000 at 15:44 UTC by hpa » (Master)

Although I think errno has to be an int in the current C standard (I don't have it handy), it would be really nice if it wasn't. If errno instead was a structure pointer, it could contain a lot more information, and it would be much easier to add error codes appropriate to specific libraries, since errno values could now be defined in other places than <errno.h> and the error messages don't have to be organized in struct tables.

Symbols like ENOENT would then be macros of the form:

typedef const struct __error_info *errno_t;
extern errno_t errno;
/* ... */
extern const struct __error_info __ENOENT_struct;
#define ENOENT	(&__ENOENT_struct)

... or even ...

extern const struct __error_info __ENOENT_struct;
const errno_t ENOENT = &__ENOENT_struct;	/* No macros! */

If a library adds its own errnos, the final link will make them unique by sheer virtue of having it be a different structure at a different address. Now strerror() for example becomes simply:

char *strerror(errno_t error)
{
	/* Insert localization stuff here */
	return error->message;
}

This obviously applies to user space. A table lookup would be needed to convert the indices from the kernel into the pointers used in user space, but that's an utter no-brainer.
