This article is a follow-up to the "Desktop Linux: Choices for simplicity" article: a technical description of the components needed to be able to pull USB drives out with impunity. The components are awkward, to say the least, and cut across the whole stack, and they are necessary because of a bug in the 2.6 kernel where "umount -l" does not work properly.
The goal is to make it possible for ordinary users to rip USB cables out, remove USB media cards, press the button that ejects the CD and generally do what THEY want, not what YOU, the software writer, dictate that they SHOULD do.
User convenience is the priority - not technical superiority, speed or performance. With that in mind, I feel quite comfortable in justifying, rather horrifyingly, the installation of the "fuse" example program (fusexmp) on a production system, as part of the components.
... but why??? Why install a userspace filesystem program, and why is it necessary? I'll answer that in a minute, after first describing the components:
- fusexmp - File system in userspace "example" program.
- HAL - Hardware Abstraction Layer (modified), with a modified version of fstab-sync which writes AutoFS config files.
- AutoFS - with a modification in the kernel header file reducing the "negative timeout" from 60 to 4 seconds (very important!).
- KVM - KDE Volume Manager (from kdenonbeta cvs), a program that reacts to D-Bus commands from HAL, and responds to changes in media.
- SE/Linux 220.127.116.11 kernel - the SE/Linux bit is irrelevant here, but it meant that some extensions to fuse were needed to get it to work, so it's mentioned for completeness.
The implications of this set of... awfulness - a stack of non-standard modifications across the board - are startling to contemplate: WHY, WHY do this???
Well, it began with a quite serious kernel bug. The symptom is that if you rip out a USB drive or USB card without unmounting it, that drive or card can never be seen again unless you shut down all applications and then insert and remove it twice; even then you're not guaranteed to see it, and you may have to shut the machine down.
... and yes, I do have to emphasise, this IS Linux we're talking about.
The point is that an umount -l of the USB card by the HAL daemon, issued just after it gets the hotplug event telling it that the card has gone, doesn't work as expected: an ioctl on the scsi device which tells it to re-read the partition table fails with "Device or resource busy".
Until you close all applications that were using that drive, so that all the directory handles are closed, the scsi device cannot be released, hotplug events are not generated, and HAL has no hotplug event to respond to in order to remove the kernel module.
Using umount -lf actually makes the problem worse, not better.
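As a rough way of poking at the "Device or resource busy" symptom by hand, here is a minimal sketch that asks the kernel to re-read a partition table. The article does not name the ioctl; this assumes it is the standard block-layer BLKRRPART request, and the function name is made up for illustration. It needs root and a real block device to do anything useful.

```python
import errno
import fcntl
import os

# Assumption: the "re-read the partition table" ioctl is BLKRRPART,
# i.e. _IO(0x12, 95) from <linux/fs.h>. Python's fcntl does not export it.
BLKRRPART = 0x125F

def reread_partition_table(device):
    """Hypothetical helper: return 0 on success, else the errno value."""
    fd = os.open(device, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, BLKRRPART)
        return 0
    except OSError as exc:
        return exc.errno
    finally:
        os.close(fd)

def describe(rc):
    """Turn the return code into a short human-readable string."""
    return "ok" if rc == 0 else errno.errorcode.get(rc, str(rc))
```

If some process still holds the device open through stale handles, one would expect describe() to report EBUSY, which matches the message quoted above.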
So, rather than fix _that_ bug, I introduced the possibility of a whole boat-load more.
So, why fuse? Well, fusexmp provides a proxy view of your filesystem - the entire filesystem. It's the equivalent of running NFS onto your own filesystem over 127.0.0.1, of running samba and smbfs - you get the picture.
Importantly, fusexmp provides a number of key benefits:
- stateless file access (no file or directory handles left open)
- independent inode numbering irrespective of the underlying fs
- it doesn't give a monkey's about what it's proxying
The first of these is crucial to being able to pull USB drives and other media out without warning or unmount. For a file open, a stat is performed. For a file read, an open, a pread and a close are performed. For a directory open, an opendir, a readdir of the entire directory and a closedir are performed. For a directory read, the results cached at open time are returned. So it goes on: at no time are any handles kept open for significant periods of time.
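The open/pread/close pattern just described can be sketched in a few lines. This is an illustration of the idea only, not fusexmp's actual code (which is C); the function names are invented for the example.

```python
import os

def stateless_read(path, size, offset):
    """Open, pread and close in one call, so no descriptor outlives the read."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, size, offset)
    finally:
        os.close(fd)

def snapshot_dir(path):
    """Directory open: read every entry, close the handle, keep the list.

    Later directory reads would be served from this cached list.
    """
    with os.scandir(path) as entries:
        return sorted(entry.name for entry in entries)
```

The cost is an extra open and close per operation, which is exactly the trade of performance for convenience the article argues for.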
The second is crucial for being able to reinsert media: fusexmp receives the full path name of anything it is opening or accessing and creates its own internal inode numbering, and those inode numbers are what KDE's "directory notification", and presumably FAM as well, rely on.
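To make the "own internal inode numbering" idea concrete, here is a toy table that hands out inode numbers keyed purely on path name. It is a sketch of the principle, not how fuse really implements its inode cache; the class name and the choice of 1 for the root are assumptions made for the example.

```python
import itertools

class PathInodeTable:
    """Hypothetical path -> inode map, independent of the underlying fs."""

    def __init__(self):
        self._by_path = {"/": 1}          # root gets 1 in this sketch
        self._next = itertools.count(2)   # every other path gets the next free number

    def inode_for(self, path):
        # The same path always maps to the same number, no matter which
        # physical medium currently sits behind that path.
        if path not in self._by_path:
            self._by_path[path] = next(self._next)
        return self._by_path[path]
```

Because the number depends only on the path, a watcher that remembers "inode 7 is /media/usb/report.txt" still sees the same number after the card has been pulled and put back.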
Some people would view being able to remove media and insert it again as a distinct disadvantage (I don't), in particular where different media could be inserted and a file save operation performed by a running application lands on a completely different disk: it's entirely up to the user to deal with that. A technical solution is to name the mount point after the volume serial number, or the volume label in the case of a DVD - this _can_ be done by placing an appropriate label-aware fstab program in /etc/hal/devices.d/.
Anyway - it works, that's the main thing. There are a few things that could be done better, and there are things that might not need to be done (using autofs for example).
What could be done better? Well, implementing fusexmp as a kernel module, called proxyfs, for a start. I've made a start on this, but the fact that fuse implements an inode cache in userspace (!) has me a bit stumped: I am presently examining smbfs and the kernel module it is based on (ncpfs) as an alternative "starting" point. The important thing about ncpfs and smbfs is that the filesystems they access don't support inodes, so both these kernel modules need to "invent" inode numbers, in exactly the same way that fuse does [but fuse does it in userspace].
Implementing proxyfs is quite straightforward: all it consists of is remapping the VFS calls to sys_XXXX calls! For example, vfs_proxyfs_rename() consists of calling sys_rename, but first allocating some userspace memory, determining the full path name (using d_path), prepending the proxy mount point to that path name, and copying that path name (which will have been created in kernel memory) into the userspace memory. The messy bit is ensuring that the dcache entries for inodes remain up-to-date with unique inode numbers (which is where cut/paste of code from ncpfs comes in handy).
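The path remapping is the heart of it, so here is a userspace analogue of the rename case. This is only a sketch of the prepend-and-forward step; the real thing would be kernel C using d_path and sys_rename, and the function and parameter names below are hypothetical.

```python
import os

def proxied_path(prefix, path_in_proxy):
    """Prepend the prefix to a path as seen inside the proxy view."""
    return os.path.join(prefix, path_in_proxy.lstrip("/"))

def proxy_rename(prefix, old, new):
    """Remap both names, then hand off to the ordinary rename call."""
    os.rename(proxied_path(prefix, old), proxied_path(prefix, new))
```

For instance, proxied_path("/home/user", "/docs/x") gives "/home/user/docs/x", and every other VFS call would follow the same remap-then-forward shape.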
Even if the bug in 2.6 is fixed, I still don't believe that fusexmp (or proxyfs) will be superseded: if you pull a filesystem out from under a user program such as konqueror, by using umount -l, how do you get it back???
Only by following the age-old computing adage "Got a problem? Add another layer of indirection" can the required decoupling be achieved, and I doubt very much whether _any_ Linux kernel, let alone 2.6, is ever going to have "another layer of indirection" integrated seamlessly behind the scenes.
If anyone has ever successfully achieved the same goals as above (with NFS over localhost, with samba plus smbfs or cifsfs, or other) I would love to hear about it.
As for how this compares with supermount, the differences are subtle and interesting:
- supermount mounts and takes care of access to a "sub"-filesystem, such that you can simply prepend "none supermount ..." and a few options to your fstab entries and expect it to work without changing anything else (esp. where your home directory is!)
- fusexmp gives a "second" view onto an existing filesystem, starting from "/" including, rather dangerously, itself [but not in the modified version I'm using, which proxies the user's home directory to /Documents].
fusexmp therefore potentially accesses multiple mountpoints whereas supermount manages only one [per supermount mount point, if that makes any sense].
- supermount matches every VFS call with an access to the inode functions of the underlying "sub"-filesystem it is managing... but first it does a check to see if the "sub"-filesystem is still mounted.
- fusexmp's VFS read function, by contrast, does an open, read and close: so does write, and so does readdir.
The difference is significant: I'm not yet certain how supermount expires (validates) its inodes properly, whereas fusexmp doesn't have to - where needed it does a getattr (a stat) and revalidates the inode.
- supermount relies on the inodes of the underlying "sub" filesystem.
- fusexmp relies on the pathnames.
The difference here is that with fuse, you can rip out one disk and replace it with another (with a different directory structure) and then rip it out again, and put the original one back... and a file save will work!
I really couldn't tell you if the same thing would work under supermount!
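The pathname-versus-handle distinction can be shown with a small simulation. Renaming a directory away and creating a fresh one at the same path is used here as a stand-in for swapping media; that is an assumption of the sketch, and real hardware removal would give I/O errors on the held handle rather than stale data.

```python
import os
import shutil
import tempfile

def swap_media_demo():
    """Return (what a held handle sees, what path-based access sees)."""
    base = tempfile.mkdtemp()
    mount = os.path.join(base, "media")
    os.mkdir(mount)
    doc = os.path.join(mount, "doc.txt")
    with open(doc, "w") as f:
        f.write("disk A")

    held = open(doc)  # handle-based access: stays bound to disk A's file

    # "Rip out" disk A and "insert" an empty disk B at the same mount point.
    os.rename(mount, os.path.join(base, "disk_a_removed"))
    os.mkdir(mount)

    # Path-based save: lands on whatever is at that path right now.
    with open(doc, "w") as f:
        f.write("saved to disk B")

    seen_by_handle = held.read()
    held.close()
    with open(doc) as f:
        seen_by_path = f.read()
    shutil.rmtree(base)
    return seen_by_handle, seen_by_path
```

The held handle keeps following the old file while the path-based save succeeds on the new medium, which is why a proxy that only ever works from pathnames never notices the swap.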