Making software path-relocatable

Posted 14 Aug 2000 at 20:51 UTC by matt

How many pieces of software on your system would need to be recompiled from source if you woke up tomorrow and said, "hmm, I think these packages should live in /foo, not /bar"? What can we do about it?

It was actually fairly reasonable a few years ago to have software with compiled-in paths. The packages on my Linux-based system beyond what was considered "the OS itself" numbered below ten, and I really had no pressing need to move anything anywhere else. Said packages were also relatively tiny and could be taken care of with a shell script run overnight. No problem.

Today there are hundreds of packages in a typical Linux-based system (in fact, most are made up entirely of packages) as well as those installed as ports on the *BSDs. Most come in non-path-relocatable binary form. The whole kit and caboodle would probably take a day or three to recompile on my hardware, and I'd have to worry about those packages that wanted paths to be passed to them in ways other than --prefix; if I wanted to adopt an /opt/packagename-type scheme, the whole thing would be a long, involved process.

When I realized this, I thought to myself, "gee, someone must have thought about this already, isn't there a solution out there?"... but I haven't found any current projects in this vein; and even if one exists, it certainly isn't in wide use.

I've been mulling it over, and I thought perhaps a very lightweight shared library could be dlopen()'d by a software package; that library could then be queried for the paths where the package could find its support files, as well as the support files and libraries of other packages. If the dlopen() failed, the package could assume that the system it was running on didn't have this support, and fall back to compiled-in paths.

On the back end, the system administrator could declare in a short configuration file that he wanted everything in "/local/packagename/bin", "/local/packagename/etc", and so on. If a package was installed somewhere else, it could use the API defined by this library to register an exception to the rules (as root, it could register a system-wide exception; as a user, a user-only one; possibly root could register a user-only exception too, given a flag).

Obviously there are other issues, such as users' PATHs (though this could be handled nicely by the shell asking the library for a list of every package's binary path). Making this library would probably be the easy part. Patching the large body of free software we currently have, and convincing everyone to use the library in new and existing software, would be a large undertaking.

Opinions?


Symbolic links, depot, etc., posted 14 Aug 2000 at 21:20 UTC by egnor » (Journeyer)

Almost any experienced large-scale Unix system administrator has encountered this problem and solved it with symbolic links. Basically, every package is built to "think" it lives in a standard path like /pkg/packagename/version (e.g. /pkg/emacs/19.20). When you decide that you really want to move emacs to /foo (for disk space or some other reason), you just create a symlink from /pkg/emacs to /foo/emacs or whatever. To deal with PATH, there are symlinks from /usr/local/bin (or some other place) to everything in /pkg/*/*/bin. (Similar link forests work for man pages, libraries, header files, and other "central repository" issues.)

This can be done totally by hand, but there are a variety of software packages out there to handle aspects of this configuration style (installing programs, maintaining the "link trees", etc). All in all, it works quite well.

Personally, I think paths shouldn't be a compile-time option -- they should be fixed in the source. --prefix is just a pain. There should be a single, agreed-upon, unique path for every version of every software package, and it should go there, period. Obviously this shouldn't be tied to physical storage, and it's often desirable to have multiple different builds of the same package version around, and for users to be able to build things in their home directory -- but I'd much rather see this solved through an elegant general solution like Plan 9's local namespaces.

Above all, I don't want yet another naming/directory/location service to think about when I'm configuring my machine. System administrators already juggle LDAP and NIS and DNS and nsswitch.conf and PAM and a zillion little package-local tables of what goes where and how it works and who's allowed to use it -- please, please don't add another one to keep track of (and build into maintenance scripts and audit for security and figure out how to deal with when you don't have root and fix when it breaks and...).

Use logical pathnames ala Common Lisp, posted 14 Aug 2000 at 21:29 UTC by craigbro » (Journeyer)

Common Lisp solved this problem by introducing logical pathnames. To summarize, a logical pathname consists of a host, a device, and a path. A logical pathname is translated into a regular path according to a set of translations, which can be changed at runtime.

A host doesn't mean a separate machine; it's just a top-level identifier. Often the host is the name of your application, or a module it uses. A device is also not a device in the common Unix sense of a piece of hardware. It's often used as something like a top-level directory, such as etc or bin. The rest of the path is then a unique name for a file on that device.

A concrete example would be wco:config;wco-config.lisp. This means the file wco-config.lisp on the config device of the wco host. To tell Lisp where to find this file, we set up a translation for this logical pathname:

(setf (logical-pathname-translations "wco")
      '(("config;*.*.*" "/etc/wco/")
        ("src;*.*.*"    "/usr/local/src/wco/")
        ("data;*.*.*"   "/var/lib/wco/")
        ("**;*.*.*"     "/usr/lib/wco/**/")))

This sets up a set of translations that would work like this:

wco:config;foo.lisp          =>  /etc/wco/foo.lisp
wco:src;bar/baz.lisp         =>  /usr/local/src/wco/bar/baz.lisp
wco:data;baz/foo/bar.lisp    =>  /var/lib/wco/baz/foo/bar.lisp
wco:ANYDEVICE;somepath.lisp  =>  /usr/lib/wco/ANYDEVICE/somepath.lisp

Since you can redefine the pathname translations at runtime, not only can you change the paths the application uses without recompiling, but you can do it in the middle of running the application (obviously open files and such would have to be re-opened).

Relocatable binaries, posted 14 Aug 2000 at 23:14 UTC by djm » (Master)

Scripts are fairly easy to relocate (sed), binaries present more of a challenge.

I think the binary problem could be fairly easily solved using ELF objects. Define a 'standard' format for paths etc and stuff them into an ELF section. Wrap access to the data up in a neat library so developers need never see the gory details.

dpkg, rpm or commandline tools could modify the ELF section at install time.

It would make digitally signing packages more difficult, and obviously wouldn't work on COFF executables.

Not quite as easy as it sounds, posted 15 Aug 2000 at 00:15 UTC by Adrian » (Master)

Symlinks can help a lot, but there are cases where compiled-in paths just break if you move things, especially if relative paths are used.

It's not that uncommon to find apps that hardcode their config directories, or even their own location.

One of the more bizarre tricks I've seen to solve this problem was to compile the apps with a fake install path padded out to the maximum path length. Then, when the apps were relocated, a binary search-and-replace would swap in the real path and pad out the rest. Evil.

I've played the symlink game too, posted 15 Aug 2000 at 03:01 UTC by matt » (Journeyer)

I've played the symlink game myself; in fact I used to advocate such a scheme as an alternative to the /usr/local madness that could be found in *BSD ports. Today it's not as big a deal with things like OpenBSD's faking support, which makes sure all ports install and uninstall cleanly, no matter what; but when I try to nicely package up things at work for our Solaris boxes I end up back in that old game again. In fact, we have two boxes -- each set up by a different administrator, each with its own naming scheme. Even with symlinks I still have to set up the package twice, although I don't have to compile it twice (whew!).

By contrast, if a generically compiled package could query the system as to what it was supposed to do, then the whole world could use one package regardless of local policy. As for it being a hassle... well, I obviously can't guarantee it won't be, but it seems to me that my theoretical system could be installed with sensible defaults -- or not installed at all -- and not cause headaches for the administrator.

Solved on Macintosh, posted 15 Aug 2000 at 03:04 UTC by ftobin » (Journeyer)

While I am not a Macintosh user, this problem has supposedly been solved for a long time on Macs. When a Macintosh program wants to run Netscape, for example, it looks up a file identifier the first time; this identifier keeps referencing the file even if the file's path changes. The program no longer needs to know where Netscape is; it simply refers to it via this identifier.

This sort of situation allows Macs to not need things like hard-coded /etc and /bin; users can rename directories at will and everything still works.

Logical pathnames, posted 15 Aug 2000 at 03:14 UTC by matt » (Journeyer)

Logical pathnames sound very similar to what I was thinking of for the configuration component, with the following differences:

  • Different naming components (I was thinking along the lines of author/vendor, package name, version; though others may well be added)
  • Not continually evaluated during runtime, although of course if the program author was into heavy pain he could implement this

The key is that to discover a path, a package would have to supply certain pieces of info. This would enable the /opt/packagename/whatever scheme if the admin so chose. It could also (at compile-time, run-time, or both) find dependencies by asking the library where they might be hiding out.

Standard package pathnames, posted 15 Aug 2000 at 03:29 UTC by matt » (Journeyer)

The standard package pathnames idea has some merit too, but it is going to require a bit of symlink spaghetti. It does have the advantage of running on systems without shared libraries.

The ELF solution, posted 15 Aug 2000 at 10:53 UTC by matt » (Journeyer)

Using ELF was another idea I'd thought about too. Here's why I decided that it might not be such a grand idea:

  • The checksum issue (as alluded to in the earlier post); tripwires would go off all over the place. However, it need not be a problem for digitally signing packages, as they can ship "pathless". You just can't redistribute them once you've changed their parameters.
  • Probably fewer systems right now support ELF than support shared libraries, and in a pinch a shared library can be linked in statically if the system doesn't support shared libs (obviously not an ideal situation, but workable).
  • If ELF alone is the basis for your system, you can't define your path policy up-front for new packages to query. It might be useful for registering exceptions to local policy, although all binaries in the package would need to know where all the other binaries were (and what if the package had no binaries, only libraries?).
It's a pretty cool idea, though. Very clean if it suits your needs fully.

Installing software is relocating, too., posted 15 Aug 2000 at 22:12 UTC by gord » (Master)

I've been thinking about these problems for some time now. I've used Depot, GNU Stow, and come up with one idea for solving it via a prefix search environment variable (PPATH). Now I'm convinced that the root of the problem is in our idea of what it means to install software.

When I was working on Libtool, we had to solve the problem of being able to run a program after you compiled it, but before you installed the shared libraries it depended on. The only reasonably sane solution seemed to be to create tiny wrapper scripts that correctly set up some environment variables for the dynamic linker, and even that didn't work for all systems (such as HP-UX and AIX).

I think the ideal system would be one in which the structure of the ``source'' package is identical to the structure of the ``installed'' package. Then, there should be some standard way of passing a handle to programs that tell them where to find themselves (and thus their data and friends) on the filesystem.

Installing would then be no more than setting up some links (symbolic or hard) to the packages that the new package depends upon, and setting some links from the /bin or whatever directory in order to export the services of the new package.

This turns programs into a kind of closure, where they carry implicit data (in this case, links to files they need).

Anyway, I've glossed over many details of this system, but I'm at work implementing it as a part of Figure. Please feel free to contact me if you're interested.

AmigaOS assigns, posted 16 Aug 2000 at 23:59 UTC by matt » (Journeyer)

David Golden <goldens at iol dot de> wrote to me with the following and asked me to add it to this discussion:

What you're looking for sounds very like AmigaOS Assigns, a feature I sorely miss.

These were system-wide toplevel namespaces or logical volumes provided by AmigaDOS (which was originally based on Cambridge TriPOS). They acted somewhat like symbolic links to directories from a virtual root (when you installed the gnu system emulator ixemul.library, it represented them as branches off its emulated / directory). Note that AmigaOS had symbolic and hard links as well, but they weren't used very often (mainly since they didn't work very well until OS 3.1 or so).

Every application used to have its own assign -- e.g. Personal Paint, a popular Amiga art package, lived in PPAINT:. This encapsulated the application and its data, and it didn't matter to it where it really was in the filesystem.

The SYS:Utilities/installer program (actually a specialised lisp interpreter) placed the correct definition into S:User-Startup (Amiga equivalent of initscripts) as required.

However, this mechanism was generalised. By installing the proper device drivers in DEVS:DOSDrivers, you could, for example, open a new xterm-like shell using CON:, a logical volume that represented a window; with TCP: you could copy to raw sockets, LHA: let you cd into lha archives, FTP: let you cd into ftp sites, etc., etc. -- kinda like a tidy, orthogonal-looking amalgamation of lvm, vold, userfs, devfs, procfs, and union and loopback mounts on linux. (Phew! It really seems to me that linux could benefit from the addition of this extra layer of abstraction...)

This extra layer of abstraction was a godsend when dealing with removable media, too. DF0: represented the physical floppy drive, but when a volume was inserted into a drive, a logical volume name equal to the volume's label was also created. This let applications ask for a file "Myapp_Disk3:Foo.dat" and the OS, not the application, would query the user for a volume of that name. To the application, it didn't matter if "Myapp_Disk3:" was a floppy disk, a cdrom, a subdirectory of a hard drive, an ftp site, or whatever. The OS would sort it out.

The AmigaOS had a load of other neat features too, far too many for me to list here. Linux still lags behind it in certain respects. (in others, it's light-years ahead - AmigaOS has no true memory protection, for example). BeOS is the closest modern OS to it. Since hhh:path/to/file is already used in UNIX for the hostname, perhaps AAA::hhh:path/to/file would be a good choice for a hypothetical linux extension syntax. This also has the advantage of being reminiscent of C++/Perl's Namespace syntax... Although, of course, it clashes with IPv6... Sigh... Maybe the % character? I dunno.

For more Amiga information, check out

http://www.amiga.com - The Amiga trademark owners, producers of a language-independent virtual machine, produced in conjunction with www.tao-group.com. Note that this has little to do with the "classic" Amiga which I'm talking about, other than the name. It's still pretty cool. Their website does include information on the "classic" Amiga, too.

http://www.amiga.org - Amiga news and information.

http://www.aros.org - an open-source AmigaOS clone, ported to x86 hardware. Quite cool (they've already ported doom and quake :-). A merge of the best bits of this and linux/BSD would be a wonderful OS.

He also included the manual page from "ASSIGN". It's kinda long so if anyone's interested please mail me, I'll keep it around for at least a little bit.

ELF section - good idea, posted 17 Aug 2000 at 16:47 UTC by jmason » (Master)

I like the "ELF section" solution best.

Sure, it'll break signing and hashes; but binary files' signatures only matter to me when I'm downloading files to install them -- i.e. I'll be changing the paths at that point anyway.

Hashes, also, would be used to guarantee file integrity (tripwire) or notice changes that require re-synchronisation (rsync); both of these should notice a change anyway if the binary moves.

Regarding the symlink trick -- Sun used to have a paper by Caspar Dik on setting up the automounter to do this. I thoroughly confused the engineering staff of one company I adminned for, by setting up one of these ;)

For small-scale installations, though, it's overkill and not a good way to solve the relocation problem, especially if you have to resolve a symlink (or two) over NFS to get to some binary -- that'll impose a performance hit.

So who's going to write the ELF section relocator? ;)
