10 Oct 2004 robertc   » (Master)

I finished my last blog entry with 'In my next blog entry, look for dead chickens, premature optimisation, and awkward calling styles.' Well its about time I dug into that. As usual, this isn't a reviewed paper or anything formal, so please do give me any feedback - good or bad. Jdub gave me a things he'd like to see in this, so I think he should share some of the blame :). One of the specific things he asked for was ABI and API compatible migration tips for the points I make here.

I've attempted here to document the things that were confusing, annoying, problematic or otherwise as I learnt the VFS api.

<h2>A Dead Chicken</h2>

A dead chicken (not waving a dead chicken which is futile activity) is a repeated piece of program code that one has to insert to make a program work, particularly one that the system should know to do for you.Not quite the same, but not unrelated is Rusty's interface level number 1 - the compiler won't let you get it wrong. The use of a dead chicken is a guarantee you can never reach that level of interface ease of use.

The most obvious dead chicken (and perhaps the only..) in Gnome VFS (link in my last blog entry) is these three functions:

gboolean gnome_vfs_init        (void);
gboolean gnome_vfs_initialized (void);
void     gnome_vfs_shutdown    (void);
These functions are dead chickens because:
  • VFS is able to tell when init should be called. (In any entry point, it can check at extremely low cost, and the number of entry points is limited),
  • VFS is able to have shutdown called just-in-time. (It could hook into the C at_exit, or I'm told there is a similar thing in Glib/GTK).
  • VFS users do not care if VFS has been initialised or not. (If someone is extending VFS, VFS is initialised before they are called. If they are not extending VFS, they just want to be able to use it.)

Why should such a trivial thing matter? Its unneeded complexity. There are approximately 50 entry points into VFS (I haven't done a careful analysis, thus the handwaving... forgive me. I suspect the actual number of entry points that can be called cold is much lower.) If we move gnome_vfs_init() from user programs into each of those entry points, and there are only 50 user programs using VFS, then the total amount of code doesn't change. Jdub gave me a rough estimate of 1000's of gnome programs - a clear saving in overall code, in the number of things programmers have to do - and thus everything gets a little simpler.

<h4>API ABI Transition</h4>

  1. Move the body of gnome_vfs_init to gnome_vfs_private_init, and so on for initialized and shutdown. Call the private functions from the public ones.
  2. At every public entry point to VFS, call gnome_vfs_private_init. (gnome_vfs_read is not an entry point, because gnome_vfs_open must be called first).
  3. Now make gnome_vfs_init etc do nothing, just returning TRUE or nothing as appropriate.
  4. Update the documentation document that gnome_vfs_init etc should not be called any longer,
  5. At the next API break, remove the prototypes from the VFS headers.
  6. At the next ABI break, remove the symbols from your library completely, </ul>

<h2>Premature Optimisation</h2>

Premature optimisation - the root of all evil. I've since changed my mind about the bits I was going to accuse of being prematurely optimised.. rather I think they are part of the awkward calling style problem...

<h2>Callling style</h2>

From an API perspective, VFS suffers from a quite awkard calling style - by this I mean both the method signatures in some cases, and the names of methods in others....I haven't done an audit of VFS, which would be needed to make a comprehensive statement - this list is thus incomplete.

  1. Returns to pointers with inconsistent ordering
  2. Inconsistent method naming (_read vs _truncate vs _truncate_handle
  3. Returning data to pointers is ugly.
  4. No streaming interface

<h3>inconsistent ordering</h3>

Some detail is probably in order.. consider

GnomeVFSResult gnome_vfs_open               (GnomeVFSHandle **handle,
                                             const gchar *text_uri,
                                             GnomeVFSOpenMode open_mode);
GnomeVFSResult gnome_vfs_read               (GnomeVFSHandle *handle,
                                             gpointer buffer,
                                             GnomeVFSFileSize bytes,
                                             GnomeVFSFileSize *bytes_read);
, In the former call, the first parameter is an out-via-pointer parameter, in the second, the last parameter is an out-via-pointer parameter. Admitedly having the handle first is consistency of another form and also very important. What would be better would be to be consistent in both fashions... <h4>API ABI Transition</h4>

  1. Figure out what the new api should look like. The function names will need to change to allow API compatability.
  2. Write the new API's functions by moving code from the old ones, and have them call the new functions.
  3. At the next API break remove the prototypes from the old functions from the headers.
  4. At the next ABI break remove the symbols from the library.
<h3>inconsistent method naming</h3>

If we consider the second issue -

GnomeVFSResult	 gnome_vfs_read			(GnomeVFSHandle *handle,
						 gpointer buffer,
						 GnomeVFSFileSize bytes,
						 GnomeVFSFileSize *bytes_read);
GnomeVFSResult   gnome_vfs_truncate             (const gchar *text_uri,
                                                 GnomeVFSFileSize length);
GnomeVFSResult   gnome_vfs_truncate_handle      (GnomeVFSHandle *handle,
                                                 GnomeVFSFileSize length);
Here we have three VFS api's. The first takes a handle, and has no postfix to the method name. The second takes a string representation of a URI, and also has no postfix. The third takes a handle and has a postfix to the method name. This sort of inconsistency makes it much harder for folk to remember an API. A better way for this particular example would be
GnomeVFSResult	 gnome_vfs_read			(GnomeVFSHandle *handle,
						 gpointer buffer,
						 GnomeVFSFileSize bytes,
						 GnomeVFSFileSize *bytes_read);
GnomeVFSResult   gnome_vfs_truncate             (GnomeVFSHandle *handle,
                                                 GnomeVFSFileSize length);
GnomeVFSResult   gnome_vfs_truncate_text_uri      (const gchar *text_uri,
                                                 GnomeVFSFileSize length);
Note that there is no reason you cannot have gnome_vfs_read_text_uri - yes there is missing context, but for a 'just read a file' interface, thats what I often end up writing within my applications. Because its useful! <h4>API ABI Transition</h4>

This is probably looking somewhat familiar by now.. from here on in I'll skip the last two lines of this as being identical across the board...

  1. Figure out the function names you would like, that will be more consistent.
  2. Move all the code into new functions with those names, leaving the old ones to call the new ones.
  3. At the next API break remove the prototypes from the old functions from the headers.
  4. At the next ABI break remove the symbols from the library.
<h3>Returning data to pointers</h3>

Returning data to pointers is ugly.
What! Heresy!
Compare these two snippets of code:

        GnomeVFSHandle   *handle;
        GnomeVFSResult    result;
        result = gnome_vfs_open (&handle, uri, GNOME_VFS_OPEN_READ);
        if (result != GNOME_VFS_OK) {
           ... something based on the meaning of result ...
         GnomeVFSHandle *handle;
         handle = gnome_vfs_open(uri, GNOME_VFS_OPEN_READ);
         if (errno) {
             GnomeVFSError error
The former returns the gnome vfs specific error code immediately, at the cost of making the user always have a error-managing variable around. The later hides the details of gnome_vfs_error descriptions until they are needed. This allows the common requirement of returning data to be used in normal C fashion - as the return value on the stack - rather than as an out variable, which are intrinsically less visible to someone reading the code. An additional benefit of setting errno is its a 'downlevel' indicator of errors, that reduces the work required to provide at least minimal error detection for folk working with vfs. <h4>API ABI Transition</h4>

  1. Set errno internally in all the functions whos behaviour you wish to change. Document this as an extra feature of the current API (it should be forwards compatible)
  2. In either the object or in TLS store the last result (different trade offs, I'd lean heavily to in the object, or even both), and provide a call to access that vfs error code.
  3. Implement the new api that requires querying for the error.. and make the old api call the new one to do its work.

<h3>A stream interface</h3>

A streaming interface is a separate concept from being able to read or write from a VFS Handle. A streaming object is an object that can be passed to any streaming interface and still do the right thing. It is probably possible (depending on the GObject smarts - which I don't know enough to make assertions on) to have a VFS handle pun as a stream handle - but its probably not a good thing. An advantage of a separate streaming interface is you can have multiple streams open on a single object at once. That said, it may be that VFS handles represent a single stream already... and thus are best kept that way. The primay thing though is to have stream calls as an interface that any appropriate object can implement, so that generic stream routines can be used to access all objects that deal with the outside world. I.e. disk<->pixbuffer, textbuffer<->disk, disk<->xml etc. This probably needs some prepatory work to get right - but wouldn't this be nice (note that the abort_if_error calls are intended as a replacement for exceptions, having the methods themselves abort would be a better interface, but trickier to make flexible in C):

source = gnome_vfs_open (uri, GNOME_VFS_OPEN_READ);
target =  gdk_pixbuf_loader_new ();
gnome_vfs_stream_copy(source, target);
pixbuf = gdk_pixbuf_loader_get_pixbuf (loader);
return gtk_image_new_from_pixbuf (pixbuf);
Of note here - I'm using a consistent error detection method across all calls, <h4>API ABI Transition</h4>

  1. Define the methods required to present on all streams. (I.e. read, write and close might be all)
  2. Do whatever GObject stuff is required to make an Interface that can be implemented by such different obejcts.
  3. if GObject allows, add that interface to the existing VFS objects,
  4. ...if they don't, create a new object type that is a full GObject, and implements the stream interface, and have the current VFS objects hold a reference to the new object, so you don'tduplicate code.
  5. If a new object type was needed, provide api calls to construct these objects.

<h3>Directory interface</h3>

The VFS directory support is a little strange. The first thing that is strange is that their appears to be no async interface to directories. As I'd expect the entire core to be written async, with sync operations a veneer layered on top (as that is a lot easier than two implementations, and easier than putting async facilities around a sync core)... it stood out,

Secondly, the visit interface is somewhat confused. Typical visitor interfaces treat the thing being visited as a Composite. That is, that there is no difference between part of a tree and the whole tree. Secondly. the visitee is called back with a different method for each type of object in the tree. I.e. visit_file and visit_directory. This can be done in C as well, typically by a struct with function pointers, or, and probably simpler in the long run, by a GObject interface.

The beauty of a full visitor interface is the ease for doing custom traversals. As a strawman, suppose we define four classes for visiting - file, directory (extends file), volume (extends directory), link (extends file)

Our base visitor would do a no-op on all of the visit calls. If it returns False, visiting stop. To do a following-links visit, we could just overide the the visit-link method, and have it resolve the link and call ...visit on the target.

Thats a trivial example of course - but the point is to reduce the number of hard-coded permutations, and increase the number of dynamic variations we can trivially do. I'm running out of steam here, so I'm going to skip reading up on the GObject stuff enough to propose a specific interface.


So what does all of this boil down too ? Gnome has language bindings to python, perl, c++, C# and many more languages. I'll wager that most if not all of those languages are much more programmer efficient than C, for reasons discussed elsewhere. There are many lessons on program architecture in those languages, and no reason for the C users of the GNOME libriaries not to benefit. Every improvement in the design of the core will also reduce the amount of work the language bindings need to do to provide clean interfaces in their idiom. (Because a clean interface is almost infinitely adaptable, trivially)

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!