Older blog entries for robertc (starting at number 30)

Cute hack of the day: checkout arch/bazaar projects via ls and cp.

You need fuse installed, bazaar, and pybaz.

for fuse, if you are using debian or ubuntu, install fuse-source, and this patch for recent kernels fuse.diff. You need to build the module for your current kernel. I did this by installing the kernel source, copying the config for it to .config, then running: "make-kpkg --revision 2.6.11-0.2 --append-to-version "-1-686" --rootcmd fakeroot modules_image". Hopefully fuse will be in stock kernels soon :).

For bazaar visit bazaar.canonical.com - binaries for debian/ubuntu & redhat.

For pybaz, see ddaa.net

bazaar-fuser itself it in arch at robert.collins@canonical.com--general/bazaar-fuser--0, robert.collins@canonical.com--general is at robert.collins@canonical.com--general.

Its at very early days now, just a couple of hours fooling around, but you can checkout any branch your usercode knows about by mounting in (say) ~/arch/ then doing 'ls ~/arch/archives/archivename/branchname/latest/'

Cute eh :)

Had a great Sunday, it was a friends birthday earlier in the week, so a few of us got together (10) and had beer, yum cha, then more beer.

One of the funniest things was running into a guy who is basically a down-under clone of Daniel Silverstone - same sense of humour, similar turn of phrase, etc.

3 Jan 2005 (updated 3 Jan 2005 at 03:20 UTC) »

IRC and utf-8 with irssi screen and xterm.

Its doable - heres a checklist.

first of all ensure you have a utf-8 locale: $LANG should be something like en_AU.UTF-8 (on debian or derived systems you can do this with dpkg-reconfigure locales). This should be done on every system involved. I.e. if you run screen, then ssh to another system, then screen, then irssi, you'll need to do this twice. You'll also need to login again to have LANG set by the system - you can just export it if you like.

now run uxterm, not xterm to start your terminal. (If you use a more primitive terminal, you may need to use a gui menu, or some other means to get it going)

Thats the terminal. Now, for screen (repeat for however many screens are in your chain) - in your ~/.screenrc (if you don't have one, copy /etc/screenrc) put

defutf8 on
. To convert existing screens without restarting...,
Ctrl-A : defutf8 on
, and
Ctrl-A : utf8 on on
(yes on is repeated) should do the trick.

Ok, now for irssi. Just run it and it should work. If it doesn't, check /set term_type - that should be utf8 (or at least, thats what mine is and it works :).

Thanks to the patience of #debian-devel, who suffered many repeated ‽ to get this going..

Well the squid sprint is over.

We've covered another heap of ground...

squid stategic directions

introduction of a formal RequestManager, and a rough sketch at the api for using it.

more bug fixes...

Persian for lunch, and steak at a pub for dinner.

The second days of the squid sprint is over.

We've closed about 6 bugs in bugzilla by fixing them, triaged about 50 more bugs, and merged in the long pending Disk IO modularisation - so the UFS disk layout can run with AIO/Threads/DiskD/Blocking... and COSS can now use Threads/Diskd or Blocking which it couldn't before..

We also discussed some long term goals for squid, which will be taken into consideration in all design and planning. They are:

HTTP/1.1

More 'extension by module' facilities - i.e. the oft requested overnight re-verification of objects should be doable by a module.

The cache should have granular control - i.e. to deactive a cache dir when a disk has failed - this implies some sort of control channel.

Connection pinning - essentially becoming a TCP Passthrough - and support for SPNEGO.

A comprehensive test suite.

Dinner was at a nice italian place, I had Biscetti & chips :).

The squid sprint has started!

Henrik, Guido, Francesco and I all wandered around Stockholm yesterday, seeing the old city, the change of guards at the palace, and doing the tourist thing.

While doing that we found time to talk about how the Store interface in squid should look, so we can do all the different things we each want to do, in a sane way. Also in progress is an interface definition for the request manager, and store objects, again to increse flexability.

So, the Canonical/Ubuntu Mataró rocked. I...

made huge improvements to Bazaar

Had my AM report finished by Pasc!

Migrated up to HoaryHedgehog

And am generally so wasted from the intense pace, I'm pretty weird company right now.

Next ... comes a 4 day squid sprint : ROCK.

14 Nov 2004 (updated 14 Nov 2004 at 22:26 UTC) »

I finished my last blog entry with 'In my next blog entry, look for dead chickens, premature optimisation, and awkward calling styles.' Well its about time I dug into that. As usual, this isn't a reviewed paper or anything formal, so please do give me any feedback - good or bad. Jdub gave me a things he'd like to see in this, so I think he should share some of the blame :). One of the specific things he asked for was ABI and API compatible migration tips for the points I make here.

I've attempted here to document the things that were confusing, annoying, problematic or otherwise as I learnt the VFS api.

<h2>A Dead Chicken</h2>

A dead chicken (not waving a dead chicken which is futile activity) is a repeated piece of program code that one has to insert to make a program work, particularly one that the system should know to do for you.Not quite the same, but not unrelated is Rusty's interface level number 1 - the compiler won't let you get it wrong. The use of a dead chicken is a guarantee you can never reach that level of interface ease of use.

The most obvious dead chicken (and perhaps the only..) in Gnome VFS (link in my last blog entry) is these three functions:

gboolean gnome_vfs_init        (void);
gboolean gnome_vfs_initialized (void);
void     gnome_vfs_shutdown    (void);
These functions are dead chickens because:
  • VFS is able to tell when init should be called. (In any entry point, it can check at extremely low cost, and the number of entry points is limited),
  • VFS is able to have shutdown called just-in-time. (It could hook into the C at_exit, or I'm told there is a similar thing in Glib/GTK).
  • VFS users do not care if VFS has been initialised or not. (If someone is extending VFS, VFS is initialised before they are called. If they are not extending VFS, they just want to be able to use it.)

Why should such a trivial thing matter? Its unneeded complexity. There are approximately 50 entry points into VFS (I haven't done a careful analysis, thus the handwaving... forgive me. I suspect the actual number of entry points that can be called cold is much lower.) If we move gnome_vfs_init() from user programs into each of those entry points, and there are only 50 user programs using VFS, then the total amount of code doesn't change. Jdub gave me a rough estimate of 1000's of gnome programs - a clear saving in overall code, in the number of things programmers have to do - and thus everything gets a little simpler.

<h4>API ABI Transition</h4>

  1. Move the body of gnome_vfs_init to gnome_vfs_private_init, and so on for initialized and shutdown. Call the private functions from the public ones.
  2. At every public entry point to VFS, call gnome_vfs_private_init. (gnome_vfs_read is not an entry point, because gnome_vfs_open must be called first).
  3. Now make gnome_vfs_init etc do nothing, just returning TRUE or nothing as appropriate.
  4. Update the documentation document that gnome_vfs_init etc should not be called any longer,
  5. At the next API break, remove the prototypes from the VFS headers.
  6. At the next ABI break, remove the symbols from your library completely, </ul>

<h2>Premature Optimisation</h2>

Premature optimisation - the root of all evil. I've since changed my mind about the bits I was going to accuse of being prematurely optimised.. rather I think they are part of the awkward calling style problem...

<h2>Callling style</h2>

From an API perspective, VFS suffers from a quite awkard calling style - by this I mean both the method signatures in some cases, and the names of methods in others....I haven't done an audit of VFS, which would be needed to make a comprehensive statement - this list is thus incomplete.

  1. Returns to pointers with inconsistent ordering
  2. Inconsistent method naming (_read vs _truncate vs _truncate_handle
  3. Returning data to pointers is ugly.
  4. No streaming interface

<h3>inconsistent ordering</h3>

Some detail is probably in order.. consider

GnomeVFSResult gnome_vfs_open               (GnomeVFSHandle **handle,
                                             const gchar *text_uri,
                                             GnomeVFSOpenMode open_mode);
and
GnomeVFSResult gnome_vfs_read               (GnomeVFSHandle *handle,
                                             gpointer buffer,
                                             GnomeVFSFileSize bytes,
                                             GnomeVFSFileSize *bytes_read);
, In the former call, the first parameter is an out-via-pointer parameter, in the second, the last parameter is an out-via-pointer parameter. Admitedly having the handle first is consistency of another form and also very important. What would be better would be to be consistent in both fashions... <h4>API ABI Transition</h4>

  1. Figure out what the new api should look like. The function names will need to change to allow API compatability.
  2. Write the new API's functions by moving code from the old ones, and have them call the new functions.
  3. At the next API break remove the prototypes from the old functions from the headers.
  4. At the next ABI break remove the symbols from the library.
<h3>inconsistent method naming</h3>

If we consider the second issue -

GnomeVFSResult	 gnome_vfs_read			(GnomeVFSHandle *handle,
						 gpointer buffer,
						 GnomeVFSFileSize bytes,
						 GnomeVFSFileSize *bytes_read);
GnomeVFSResult   gnome_vfs_truncate             (const gchar *text_uri,
                                                 GnomeVFSFileSize length);
GnomeVFSResult   gnome_vfs_truncate_handle      (GnomeVFSHandle *handle,
                                                 GnomeVFSFileSize length);
Here we have three VFS api's. The first takes a handle, and has no postfix to the method name. The second takes a string representation of a URI, and also has no postfix. The third takes a handle and has a postfix to the method name. This sort of inconsistency makes it much harder for folk to remember an API. A better way for this particular example would be
GnomeVFSResult	 gnome_vfs_read			(GnomeVFSHandle *handle,
						 gpointer buffer,
						 GnomeVFSFileSize bytes,
						 GnomeVFSFileSize *bytes_read);
GnomeVFSResult   gnome_vfs_truncate             (GnomeVFSHandle *handle,
                                                 GnomeVFSFileSize length);
GnomeVFSResult   gnome_vfs_truncate_text_uri      (const gchar *text_uri,
                                                 GnomeVFSFileSize length);
Note that there is no reason you cannot have gnome_vfs_read_text_uri - yes there is missing context, but for a 'just read a file' interface, thats what I often end up writing within my applications. Because its useful! <h4>API ABI Transition</h4>

This is probably looking somewhat familiar by now.. from here on in I'll skip the last two lines of this as being identical across the board...

  1. Figure out the function names you would like, that will be more consistent.
  2. Move all the code into new functions with those names, leaving the old ones to call the new ones.
  3. At the next API break remove the prototypes from the old functions from the headers.
  4. At the next ABI break remove the symbols from the library.
<h3>Returning data to pointers</h3>

Returning data to pointers is ugly.
What! Heresy!
Compare these two snippets of code:

        GnomeVFSHandle   *handle;
        GnomeVFSResult    result;
        result = gnome_vfs_open (&handle, uri, GNOME_VFS_OPEN_READ);
        if (result != GNOME_VFS_OK) {
           ... something based on the meaning of result ...
         }
and
         GnomeVFSHandle *handle;
         handle = gnome_vfs_open(uri, GNOME_VFS_OPEN_READ);
         if (errno) {
             GnomeVFSError error
             error=gnome_vfs_get_error(handle);
         }
The former returns the gnome vfs specific error code immediately, at the cost of making the user always have a error-managing variable around. The later hides the details of gnome_vfs_error descriptions until they are needed. This allows the common requirement of returning data to be used in normal C fashion - as the return value on the stack - rather than as an out variable, which are intrinsically less visible to someone reading the code. An additional benefit of setting errno is its a 'downlevel' indicator of errors, that reduces the work required to provide at least minimal error detection for folk working with vfs. <h4>API ABI Transition</h4>

  1. Set errno internally in all the functions whos behaviour you wish to change. Document this as an extra feature of the current API (it should be forwards compatible)
  2. In either the object or in TLS store the last result (different trade offs, I'd lean heavily to in the object, or even both), and provide a call to access that vfs error code.
  3. Implement the new api that requires querying for the error.. and make the old api call the new one to do its work.

<h3>A stream interface</h3>

A streaming interface is a separate concept from being able to read or write from a VFS Handle. A streaming object is an object that can be passed to any streaming interface and still do the right thing. It is probably possible (depending on the GObject smarts - which I don't know enough to make assertions on) to have a VFS handle pun as a stream handle - but its probably not a good thing. An advantage of a separate streaming interface is you can have multiple streams open on a single object at once. That said, it may be that VFS handles represent a single stream already... and thus are best kept that way. The primay thing though is to have stream calls as an interface that any appropriate object can implement, so that generic stream routines can be used to access all objects that deal with the outside world. I.e. disk<->pixbuffer, textbuffer<->disk, disk<->xml etc. This probably needs some prepatory work to get right - but wouldn't this be nice (note that the abort_if_error calls are intended as a replacement for exceptions, having the methods themselves abort would be a better interface, but trickier to make flexible in C):

source = gnome_vfs_open (uri, GNOME_VFS_OPEN_READ);
abort_if_error(errno);
target =  gdk_pixbuf_loader_new ();
abort_if_error(errno);
gnome_vfs_stream_copy(source, target);
abort_if_error(errno);        
pixbuf = gdk_pixbuf_loader_get_pixbuf (loader);
abort_if_error(errno);
return gtk_image_new_from_pixbuf (pixbuf);
Of note here - I'm using a consistent error detection method across all calls, <h4>API ABI Transition</h4>

  1. Define the methods required to present on all streams. (I.e. read, write and close might be all)
  2. Do whatever GObject stuff is required to make an Interface that can be implemented by such different obejcts.
  3. if GObject allows, add that interface to the existing VFS objects,
  4. ...if they don't, create a new object type that is a full GObject, and implements the stream interface, and have the current VFS objects hold a reference to the new object, so you don'tduplicate code.
  5. If a new object type was needed, provide api calls to construct these objects.

<h3>Directory interface</h3>

The VFS directory support is a little strange. The first thing that is strange is that their appears to be no async interface to directories. As I'd expect the entire core to be written async, with sync operations a veneer layered on top (as that is a lot easier than two implementations, and easier than putting async facilities around a sync core)... it stood out,

Secondly, the visit interface is somewhat confused. Typical visitor interfaces treat the thing being visited as a Composite. That is, that there is no difference between part of a tree and the whole tree. Secondly. the visitee is called back with a different method for each type of object in the tree. I.e. visit_file and visit_directory. This can be done in C as well, typically by a struct with function pointers, or, and probably simpler in the long run, by a GObject interface.

The beauty of a full visitor interface is the ease for doing custom traversals. As a strawman, suppose we define four classes for visiting - file, directory (extends file), volume (extends directory), link (extends file)

Our base visitor would do a no-op on all of the visit calls. If it returns False, visiting stop. To do a following-links visit, we could just overide the the visit-link method, and have it resolve the link and call ...visit on the target.

Thats a trivial example of course - but the point is to reduce the number of hard-coded permutations, and increase the number of dynamic variations we can trivially do. I'm running out of steam here, so I'm going to skip reading up on the GObject stuff enough to propose a specific interface.

<h3>Wrapup</h3>

So what does all of this boil down too ? Gnome has language bindings to python, perl, c++, C# and many more languages. I'll wager that most if not all of those languages are much more programmer efficient than C, for reasons discussed elsewhere. There are many lessons on program architecture in those languages, and no reason for the C users of the GNOME libriaries not to benefit. Every improvement in the design of the core will also reduce the amount of work the language bindings need to do to provide clean interfaces in their idiom. (Because a clean interface is almost infinitely adaptable, trivially)

I recently hacked up bindings to Gnome for Gnu-smalltalk. I was giving Jeff Waugh some anecdotes about why binding to Gnome VFS was harder than binding to GTK+, and he suggested I blog it...

Theres actually quite a lot of stuff there when I started to think it through, so I'm going to do two (maybe more) entries on it.

Today, I'm going to write about the style of the C headers, and their ability to be trivially machine processed, plus a little more <h3>C header file consistency</h3>

In GTK, there are idioms that made doing bindings from the installed header files very easy.

  • All the structs are defined via
    typedef struct _GnomeFoo GnomeFoo;
    struct _GnomeFoo 
    {
    ...
    };
    
    Which means that a simple Awk script can detect all the object types.
  • The C functions are all laid out consistently:
    returns  gnome_object_name_method_name (
    	parameter...
            parameter);

In Gnome VFS, neither of these are true, instead we see

  • anonymous structs:
    typedef struct {
      ...
    } GnomeVFSFoo;

    which prevents trivial analysis.
  • and broken-line function definitions:
    return gnome_vfs_type_method
                                 (parameter
    			      ...
    			      parameter);

    which means that trivial pattern matching is not enough to kick into analyse a method mode in awk - you need a full lexer & parser at this point.

The point of this is that, without the use of .defs (which AFAICT are not installed by gnome), gnu smalltalk allows a user to build & improve their language bindings with just the standard development-headers & dynamic libraries installed. The closer the automatic generation makes the bindings, the less override & correction work is needed. The GTK+ headers were fantastic, the Gnome VFS present extra challenges (that I haven't worked around yet, in point of fact). <h3>Handles and opaque types</h3>

A handle is usually an opaque identifier that one is given by a library, and can pass back to the library upon demand. It might be an index into a table, or a pointer, or even change across minor library upgrades.

IME handles usually have the following properties:

  • Can copy by value
  • the library will provide generic reference and dereference routines for handles
  • Handles are generic across the library
  • Handles are dynamically typed

GnomeVFS has a fairly typical handles concept, FWICT. (I'm not an expert here, as I've only looked at the headers, not the full source. Corrections welcomed). There are some unusual aspects....

  • Strange typedefs..
     typedef struct GnomeVFSDirectoryHandle GnomeVFSDirectoryHandle; 
    This is strange because the C function to allocate one then accepts
    GnomeVFSDirectoryHandle **handle

    So, we need to create a pointer to a ponter to a struct whose defintion we never see. Why is this strange? Well, consider:
     typedef struct GnomeVFSDirectory * GnomeVFSDirectoryHandle;
    .. GnomeVFSDirectoryHandle *handle

    Which will create the same code, but lets us put a handle on the stack without creating our own typedef. We can still hide the definition of GnomeVFSDirectory.
  • There is no library wide Handle reference-dereference methods that I could see. So each handle type is on its own, having potentially difference lifetime semantics. Consistency is good, it allows more predictable behaviour, which encourages programmer productivity (less time finding bugs because handle foo is reference counted, and bar is not.
  • Motivation? GObject is great, it allows dynamic introspection by any language on any object runing with gnome. I can, in smalltalk create an object which knows that it is a Dialog, and pass that to a python-library, which expects anything implementing Widget, interrogates my object and finds it is a Dialog.. which is fantastic. GnomeVFS does not appear to have leveraged this at all. If hiding implementation details was the goal, that is inconsistent (other GnomeVFS structs are defined - but still not using GObject), and if its not to hide implementation details, why not expose the public & private aspects the way that GTK objects do, which seems very well designed (for a C program :)).

In my next blog entry, look for dead chickens, premature optimisation, and awkward calling styles.

21 older entries...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!