It's All in the Packaging [DRAFT]

Posted 17 Sep 2000 at 09:00 UTC by ncm

Following is a draft of an article I have been working on, about what it takes to get reliable software packaging. It's not really finished, but I don't know what more it needs. I'm hoping Advogato's readers can help discover what is still missing.

                    *  -  *  -  *

People considering their first Linux or BSD installation are offered a bewildering variety of distributions, and no rational way to choose among them. Reviews are invariably superficial. Most beginners end up choosing a distribution at random -- one that is heavily promoted, or is at the local bookstore, or that an acquaintance uses. By the published evidence, you might reasonably conclude that it doesn't matter which distribution you choose -- they're all Unix, aren't they?

The published reviews' shallowness masks deep differences among distributions. These differences lead to profoundly different experiences for users.

Distributions

What makes one distribution different from another? Published reviews seem always to focus exclusively on the initial installation, even though this may be the least important distinction. All major distributions have good installation tools now. Differences are mainly cosmetic (text vs. graphics), and in how much you must read before or while installing. You run the installer for half an hour, or three, and it's done.

A reviewer erases the installation soon afterward. The rest of us live with the result for months, or years. The differences in what it takes to keep everything configured, up-to-date, secure, and serving your needs (if indeed these are all ever achieved) soon overshadow any momentary (in)conveniences of getting the system installed. An unfortunate choice of distribution can be very costly, in time and in a thousand ways.

In the beginning, the job of a distribution was just to put ready-built programs on a stack of floppy disks (later, on a CD), along with instructions to help get it all installed. Naturally, fearful beginners -- and reviewers -- demanded a more-automated installation, and the installation tools included in distributions have become ever more automated and flashy.

This emphasis by beginners and reviewers on the installer, while understandable, distracts the keepers of a distribution from a more important, if less obviously urgent, job: building and installing the programs correctly. Done right, this is an enormous job. Done poorly, it results in myriad problems for users. Such problems are usually not obvious, at first, but surface over time, and can be devastating.

For example, in a recent release of Red Hat Linux (5.0?), the mail delivery program was built to assume one file-locking policy while the mail readers assumed another. The result was that any mail delivered while you read your mail was lost. This build error wasn't noticed officially for a full year. During that year many thousands of messages to thousands of people disappeared into the void, unnoticed.
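
The failure mode is worth seeing concretely. The sketch below is purely illustrative (it is not the actual Red Hat code, and the file names are invented): when the delivery agent and the reader honor *different* locking conventions, each can "successfully" take its own lock at the same time, so neither excludes the other and concurrent writes silently lose mail.

```python
# Illustrative sketch of a locking-convention mismatch. Assumption:
# the "delivery agent" uses dotlock files, the "reader" uses flock().
import fcntl
import os
import tempfile

def dotlock_acquire(mailbox):
    """Delivery agent's convention: exclusively create mailbox.lock."""
    fd = os.open(mailbox + ".lock", os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    os.close(fd)

def flock_acquire(mailbox):
    """Reader's convention: take an advisory flock() on the mailbox."""
    fd = os.open(mailbox, os.O_RDWR)
    # Succeeds immediately: nobody else has flock()ed this file.
    fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    return fd

mailbox = os.path.join(tempfile.mkdtemp(), "inbox")
open(mailbox, "w").close()

dotlock_acquire(mailbox)            # "delivery" believes it owns the mailbox
reader_fd = flock_acquire(mailbox)  # "reader" believes the same -- no conflict seen
print("delivery dotlock held:", os.path.exists(mailbox + ".lock"))
print("reader flock held too:", reader_fd >= 0)
```

Both lock acquisitions succeed, which is exactly the bug: each program checks only its own convention, so the two never contend.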

[need another example.]

Building

This example reveals a neglected fact: all significant programs offer a variety of "compile-time" options for how the program is to be built, and configuration options that determine how it runs. These options include placement of configuration and data files, how much and what kind of security to use, and which non-essential features (typically, those that depend on other programs the user might or might not have, or want) are included.

Often the build defaults are reasonable for developers maintaining the program, but clearly wrong for production use. Sometimes a special version or configuration of the tools or the libraries used to build the program is needed for a reliable result. Very often, the default result of building the program is insecure, or its files end up in the wrong place. A conscientious maintainer cannot just choose from a menu of options, but often must patch the code. For some programs, such as Emacs or X, keeping up on all these details is (quite literally) a full-time job.

Popular programs are almost continually evolving, with bugs being fixed, new features added, and new bugs introduced. Somebody must keep track of the current bug list, and the apparent "stability" (the expectation of few remaining bugs) of each version, to choose wisely which version and configuration to include in the distribution. For some programs this can be another full-time job.

A general-purpose distribution bundles hundreds (or thousands!) of programs. How can they all be combined without inviting chaos?

Some distribution maintainers simply embrace chaos. This is relaxing for them, and lets their users feel macho each time they get something working. Most make some effort to keep things together, balancing a limited payroll against the tolerance of their users. Very few have established clear policies, checklists, a bug-tracking system, and schedules, and distributed the work among an army of maintainers, each dedicated to one or a few programs, so that each program is built to fit as well as possible with the other bundled programs.

Proprietary software vendors have proven that a software company can take enormous margins by marketing mainly to the most naive users, while ignoring bugs, security holes, and pervasive misdesign. Commercial distributions that market mainly to beginners may thus have little business incentive to spend more on reliability than is needed to make a good first impression on reviewers. Investments in promotion, distribution channels, and flashy installers simply pay off better. Reliability and security slip easily down the priority list.

Packaging

How programs are bundled into a distribution has changed since the earliest days. In the beginning, programs were just packed into "tar archive" files, and unpacked in place. Any configuration was done by hand-editing configuration files according to instructions in a README file.

Some distributions still work this way, but most today deliver programs in "packages". A package includes a program's files along with annotations and scripts. The scripts are used to install and configure the program as automatically as possible, and to remove it when it's not needed any more. The annotations record notes about the program and its files.

The most interesting annotations are the program version and its dependencies. The version identifies which snapshot in the lineage of a program's evolution is packaged. (Old versions have known bugs; new versions have new features and unknown bugs.) The dependencies note what other packages the program depends on, and (perhaps) which versions of those packages will do. With this information, a package manager program can determine which other packages must be obtained or updated as part of installing the package for a program you want.

It might seem that this is all you need to make everything work. That would be true in the same sense that people and goods are all you need for an economy, or musicians and instruments for an orchestra. What's missing are contract law and banks, or a score and conductor. What a distribution still needs are sound policies and sound decision-making.

Policies

Packages interact in many more ways than simply depending on one another; the mail-delivery and mail-reader conflict mentioned above is just one example. A windowing menu system needs to know the names and icon image files of the user-level programs installed, so each new package must put that information where it can be found. The on-line documentation system needs to know about the programs installed. The system startup scripts need to know how -- and in what order -- to start up services. A program that needs a generic service (such as mail delivery) needs to know how to invoke it.

The only way that maintainers of hundreds (or thousands) of packages can agree on these details is a set of clear policies, a way to decide issues not covered by the policies, and a way to evolve those policies. The best way to enforce most policies is by encoding them in scripts that implement them directly. Where that is not possible, scripts might verify that they are obeyed. Many policies can be enforced only by inspection and auditing.
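
"Encoding policies in scripts" can be sketched in miniature, in the spirit of checkers like Debian's Lintian. The rules and file names below are hypothetical, chosen only to show the shape of the idea: each policy becomes a mechanical check over a package's file list, so violations are caught by a tool rather than by convention alone.

```python
# Minimal policy-checker sketch. Assumptions: the two rules and the
# example file list are invented for illustration.
POLICY_CHECKS = [
    ("no-files-in-usr-local",
     lambda path: not path.startswith("/usr/local/")),
    ("config-belongs-in-etc",
     lambda path: not path.endswith(".conf") or path.startswith("/etc/")),
]

def lint(file_list):
    """Return (rule, path) for each policy violation in a package."""
    return [(rule, path)
            for path in file_list
            for rule, ok in POLICY_CHECKS
            if not ok(path)]

pkg_files = ["/usr/bin/frob", "/usr/local/lib/libfrob.so", "/usr/frob.conf"]
for rule, path in lint(pkg_files):
    print("E: %s: %s" % (rule, path))
```

The payoff is that a policy amendment becomes a code change, applied uniformly to every package in the archive, instead of a memo that each maintainer may or may not have read.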

Even with carefully designed policies, large issues arise. For example, when many programs depend on one library or interpreter, some may need a particular version of it. When you add a new program to your system that also depends on a particular version, it's only by good luck that the new program expects the _same_ version. (This is most frequently a problem with proprietary, binary-only programs that cannot simply be recompiled, and patched as necessary, to use the same version.)

If your package installation tool doesn't cope well with having extra versions of the "same" library or interpreter, you must override it when you install the extra version. Each time you do this, you risk introducing subtle problems. As the configuration of your system exceeds the package tools' understanding of it, you find yourself increasingly involved in decisions that nobody has considered very carefully. The result is a fragile system: each time you add or remove a package, or update the packages already on it, you risk breaking something.

If the package tools and policies do cope well with extra versions, you never need to "force" them. Curiously, most of what is needed for this is just for libraries and interpreters with incompatible versions to have different package names (e.g. perl4 vs. perl5), and for the programs that depend on them to name the version they were built for. Despite the simplicity of this solution, none of the "rpm"-based Linux distributions have adopted it yet, although they could do so any time. To fix it is more a matter of sound policy and plenty of hard work than of tricky technology.
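
The difference between the two naming schemes can be shown with a trivial model (the version numbers here are merely illustrative): under a single shared name, installing the new version evicts the old and breaks everything built against it; under versioned names, both remain installed side by side.

```python
# Toy model of the versioned-package-name convention.
flat = {}                      # single-name scheme: name -> version
flat["perl"] = "4.036"
flat["perl"] = "5.005"         # "upgrade" silently replaces 4.036

versioned = {"perl4": "4.036", "perl5": "5.005"}   # both coexist

def satisfied(program_deps, installed):
    """A program runs iff every name it was built against is present."""
    return all(dep in installed for dep in program_deps)

legacy_script = ["perl4"]      # built against the old interpreter
print("flat scheme still has 4.036:", "4.036" in flat.values())
print("legacy script runs (versioned):", satisfied(legacy_script, versioned))
```

Nothing about this requires clever technology; it only requires that the naming policy be adopted and followed archive-wide.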

With a sufficiently thorough package system, adding, removing, and updating correctly-built packages is safe enough that it can mostly be automated. Then, it may be reasonable simply to ask that any newly available versions of packages on a system be downloaded and updated in place, without shutting down and rebooting. Incremental updates can (almost!) be part of routine maintenance. (No distribution is entirely there yet, but most are nowhere close.)

Release Cycle

Sound policies don't only affect how packages are built. When should a newly-released version of a program become part of the distribution? New releases normally have fixes for various reported and unreported bugs, and often have new features and new, unknown bugs. All too often the new bugs turn out, in time, to be worse than the old ones; they are usually fixed quickly, but once stamped onto CDs they can cause problems for a long time.

It is often prudent to remain a version or three behind the latest release of a program, and let the most eager users discover the bugs before you commit to putting it on the CD. The version chosen will have known bugs and missing features, but that may be better than the alternative, which is not-yet-discovered and possibly damaging bugs.

The effectiveness of policies, and of processes for resolving conflicts, determines the quality of a release. If the release is tied to an inflexible schedule, or insufficiently supported by package maintainers, quality suffers. If release dates slip too much, users may be obliged to bypass the release and run pre-release or unstable development versions of packages instead, subverting the purpose of the release.

Security

It is a most unsettling experience to have your work machine broken into and trashed. Most of us have been spared, so far. The odds that this can continue depend mostly on your own behavior (e.g. avoiding telnet), but even perfect care is not enough if your distribution opens back doors you don't even know about.

Security is the classic example of a quality that cannot be tested. The only way to get it is by scrupulously following sound policies and procedures, and the only way to verify it is by careful auditing. (Reviewers rarely mention it at all, maybe because auditing is suspiciously like work.) Since good security takes hard work, and often conflicts with immediate convenience, many distributions routinely neglect sound security policies.

The most basic security policy of all is: "minimize exposure". The only machines vulnerable to exploits of any particular program are those running it. Most machines have no reason to be running most services, but many distributions install them and turn them on for no better reason than to avoid having to tell users how to start them up if they really do need them.

Of course, you never know where the next hole will be found, and we all need to run at least some services on our machines. Therefore, the next policy is "keep current". Keepers of a secure distribution need to respond quickly to discovered holes, announce fixed versions of vulnerable packages, and make it easy (and safe!) to upgrade. The last point depends, again, on sound package management.

Choosing a Distribution

How can you choose a distribution that will serve your needs well?

The most important consideration, as for any software choice, is whether you can run the programs you need. Does a distribution include packages for the programs you expect to run? Packages constructed by "third parties" may cause problems by violating (or simply ignoring) distribution policies.

Some proprietary programs are only warranted to work with certain commercial distributions, and if you want support you must run a configuration they warrant. In turn, some commercial distributions "encourage" proprietary software vendors to warrant only their one distribution. This tactic backfires when, with each new release, you are prevented from upgrading. (Software meant to run only on one configuration is dangerously fragile; are you sure it is the right software for you?)

For some machines with very new hardware, it may be that only a distribution that keeps up with the latest "pre-release" software versions will include drivers for it -- if indeed any do. This may be an issue if you have not chosen your equipment conservatively, and are not prepared to download and build drivers yourself, and you don't have anybody to help with it.

If your needs are not artificially constrained, then you can consider what your long-term experience will be. In considering a distribution, you should see if it

  • has identified reliability and security as primary goals;

  • has instituted the policies and procedures necessary to achieve those goals;

  • has a big enough library of officially packaged programs that you will rarely need to install an unofficial package;

  • has a large enough staff of maintainers to ensure that policies have been observed on all the packages they maintain;

  • comes with package management tools that make it easy to keep your machine equipped with up-to-date versions of all the programs you want to run;

  • is owned and controlled for the benefit of its users, rather than for absentee stockholders or ambitious business managers;

  • is chartered to observe and uphold industry standards, rather than subverting them to help lock you into one environment.

Among Linux distributions, only the Debian Project's "Debian GNU/Linux", and products based on it, such as Stormix, satisfy many of these criteria. [How do the BSDs fare? What should be in the list, but isn't?]

No distribution is bug-free, nor will any ever be. Furthermore, even the best policies cannot cover all cases, and areas not covered surface all the time. Nobody can guarantee you a good experience, but you can markedly improve your chances by choosing a distribution built by people for whom a good experience is their main goal, and who do everything they can to ensure it.

                    *  -  *  -  *

See also:

                    *  -  *  -  *

So, what's lacking? What BSD examples would help? Should it go into politics and license issues, or steer clear? Should it talk about specific packaging advances, such as Debian Apt, and the myriad unique packaging policies Debian has adopted (such as...) that were necessary to make Apt actually useful? What key insights are entirely missing? What URLs should be in the References section?

(Thanks to those who replied to the request for comments I posted in a diary entry some time back.)

This DRAFT is © Copyright 2000 by Nathan C. Myers. I hope to put the final version under something like the GFDL.


Debian Details, posted 17 Sep 2000 at 20:31 UTC by ncm » (Master)

These maybe don't belong in the article proper, but Jordan Hubbard asked for more details about Debian policies and procedures, and maybe others will find these useful.

The central repository for detailed documentation on Debian packaging policy is the Debian developers' pages, http://www.debian.org/devel/. I don't find a separate security policy manual; security matters are distributed wherever relevant. Of particular interest are probably

http://www.debian.org/doc/packaging-manuals/developers-reference/
http://www.debian.org/doc/debian-policy/
http://www.debian.org/doc/packaging-manuals/packaging.html/
http://bugs.debian.org/cgi-bin/pkgreport.cgi?pkg=debian-policy

The "Lintian" tool tries to enforce many of the policies. The "Debconf" package implements the UI-agnostic configuration mechanism Jordan speculates about in his article.

What's the point?, posted 18 Sep 2000 at 13:52 UTC by AlanShutko » (Journeyer)

Is this just a big Debian advertisement masquerading as a critical article? If so, you might consider making a bit more mention of Debian; the reader might not take the single sentence about it on faith.

If this isn't just an advertisement, the article is missing specifics for all distributions to help the reader evaluate the different distributions and how they meet their needs. And you should be open to the possibility that Debian might not be appropriate for all users. You should try to contrast the needs different people have... for instance, I _need_ to use the most recent version of X to have proper support for my hardware. (I check cvs twice daily to see any improvements.) Your advice that I lag behind would give me a laptop without X.

Needs some more packaging tools, IMHO, posted 18 Sep 2000 at 16:01 UTC by cbbrowne » (Master)

I'll declare my bias straight out; I prefer the Debian packaging toolset, which is attempting to be reasonably complete, to the "very slightly inhibited anarchy" of RPM where, while there is a database, and there is some information concerning metadata, there isn't the toolset to manage the "bigger picture."

However, that bias shouldn't be taken too far. Debian's not perfect.

In particular, it lacks the ability to manage source packages with a sufficient degree of sophistication. What I'd like to do is something like:

    apt-get install -source gnucash

and see apt-get go out and grab the various development packages required to compile gnucash.

This isn't an outrageous expectation; the BSD Ports system does do this sort of thing...

The other thing that I would see as a significant improvement to the packaging process would be to have a more "intelligent" set of tools used to install and, on an ongoing basis, manage the software.

My wild-eyed vision: don't use shell scripts to control installation packages, but rather use something higher level like cfengine, which provides a language that knows how to do system configuration. The basic idea would be that packages would include three cfengine scripts:

  • An install script, to copy the components from the source location (say /var/cache/apt/package-extraction) to the "real" locations where they will run from (e.g. /usr/bin, /usr/lib, ...)
  • A deinstall script that would clean components out
  • A "maintenance" script that could perform ongoing maintenance, such as checking/fixing permissions, rotating logs, and such.

    Note: The "install" script should just copy files; the "maintenance" script should do most of the real work of configuring the package so that we may readily clean things up by rerunning the maintenance script.
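
[Ed.: the three-script scheme can be sketched roughly as below. This is a Python stand-in, not cfengine (which would express the same thing declaratively), and the file names and mode are invented for illustration: install just copies files, while maintenance re-asserts the desired state every time it runs.]

```python
# Sketch of install + maintenance scripts. Assumptions: hypothetical
# "sendmail" binary name, desired mode 0o755; real cfengine would
# declare this state rather than code it imperatively.
import os
import shutil
import stat
import tempfile

DESIRED_MODE = 0o755

def install(src, dest_dir):
    """Install script: just copy the component into place."""
    dest = os.path.join(dest_dir, os.path.basename(src))
    shutil.copy(src, dest)
    return dest

def maintain(path):
    """Maintenance script: re-assert policy, fixing any drift."""
    if stat.S_IMODE(os.stat(path).st_mode) != DESIRED_MODE:
        os.chmod(path, DESIRED_MODE)

staging = tempfile.mkdtemp()
src = os.path.join(staging, "sendmail")
open(src, "w").close()

binpath = install(src, tempfile.mkdtemp())
os.chmod(binpath, 0o4755)   # simulate tampering: setuid bit appears
maintain(binpath)           # the nightly run repairs the mode
print(oct(stat.S_IMODE(os.stat(binpath).st_mode)))
```

Because the maintenance script is idempotent, rerunning it is always safe, which is what makes the "run it hourly to clean up messes" idea workable.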

With a suitable "protocol for usage," the maintenance script could get pushed into a central "repository" so that not only would it be used initially to set up the package for use, but it would be regularly run (daily/weekly/hourly as appropriate) to clean up any messes that might occur.

Oops. A script kiddie broke in and changed /usr/bin/sendmail to be suid root. Oh well, cfengine runs at 1am, running /var/lib/cfengine/packages/daily/sendmail and fixes that back up.

In effect, each package potentially provides its own "white blood cells" to provide its own immunology as well as management of any fecal matter.

Re: It's All in the Packaging [DRAFT], posted 18 Sep 2000 at 19:52 UTC by uweo » (Journeyer)

People considering their first Linux or BSD installation are offered a bewildering variety of distributions, and no rational way to choose among them.
Right.
But your article doesn't change that. A single person just cannot do an in-depth comparison of the long-term value of distributions. This needs an evaluation of different distributions over a long time, by different people who have different needs. It needs to take user, developer and administrator needs into account.

the mail delivery program was built to assume one file-locking policy while the mail readers assumed another.
This is an example of a problem which could have been avoided. Mail spool file locking is a problem known for many, many years, and the solution can't be to lock the files (which may or may not work on NFS mounted spool directories), it can only be to avoid locking. [Maildir].
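
[Ed.: the Maildir idea alluded to here is worth a sketch. The outline below is a simplified, illustrative version (the unique-name recipe is abbreviated from the real specification): each message is written into tmp/ and then rename(2)-ed into new/. Since rename is atomic within one filesystem, readers and writers need no locks at all.]

```python
# Simplified Maildir-style delivery sketch; not a complete or
# spec-exact implementation.
import os
import socket
import tempfile
import time

def maildir_deliver(maildir, body, _seq=[0]):
    """Deliver one message: write into tmp/, atomically move to new/."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)
    _seq[0] += 1
    # Unique name from time, pid, a counter, and the hostname.
    name = "%d.%d_%d.%s" % (int(time.time()), os.getpid(),
                            _seq[0], socket.gethostname())
    tmp_path = os.path.join(maildir, "tmp", name)
    with open(tmp_path, "w") as f:
        f.write(body)
    # Atomic handoff: a reader sees either nothing or a whole message.
    os.rename(tmp_path, os.path.join(maildir, "new", name))
    return name

md = tempfile.mkdtemp()
maildir_deliver(md, "Subject: hi\n\nhello\n")
print(len(os.listdir(os.path.join(md, "new"))))
```
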
It's a sad truth that Linux distributors often behave like the Gates company, although I think the reason is shortsightedness, not malice. Other examples:

  1. Does installing rpms of a different rpm-based distribution work? Is there a guarantee that it either will work or fail completely, without screwing something up? Users are often enough tempted to install a .rpm, without knowing the distribution it was generated for. (and they shouldn't have to care)[RPM has a namespace problem]
  2. Are patches, even the small ones, fed back to the authors of the software? [They often enough aren't, although I have to admit that especially the Debian maintainers I dealt with did send them]
Regarding the goals:
  • reliability and security are not of necessity primary goals. The primary goal depends on the purpose of the machine. I own a machine with perhaps 20 to 40 different local and remote root security problems, and I just don't care - nobody will ever get a chance to break into that machine anyway.
    Reliability? I know people who actually don't care whether they have to boot their computers three times a day and lose some amount of work every now and then. While I can't subscribe to that view of the world: they are actually quite happy.
  • has a big enough library of officially packaged programs that you will rarely need to install an unofficial package;
    "Big enough" is a goal which cannot be reached. It's impossible.
    The right goal may be to support unofficial packages by providing a secure framework for them and giving enough help and possibly even a bit of testing for people who create those unofficial packages.

    Unofficial packages are a necessary evil. No organization can really support all applications; moreover, distributors may sometimes choose, for good reasons, to use a certain version of an application while a minority of users needs some other version.

  • has a large enough staff of maintainers to ensure that policies have been observed on all the packages they maintain;
    How does the number of maintainers change the situation? If there is a ten percent chance that a maintainer does something wrong, then the number of buggy packages will be 100 out of 1000, regardless of the number of maintainers. What's needed is an additional step of review (that's where the hard work begins).

    The smaller an organization is, the easier it is to enforce policies. On the other hand, it's not very likely that one maintainer can package 20 or 50 different software packages without sacrificing something (I got the higher number from a FreeBSD porter, but I think 20 is not an uncommon number in Debian space). There is no clear solution here.

  • is owned and controlled for the benefit of its users, rather than for absentee stockholders or ambitious business managers;
    Crap. Sorry. If stockholders get rich because of high sales, then users are quite happy, too (why would they buy the product otherwise? The Microsoft way of business will not work for smaller companies).
    Additionally: What do you expect commercial entities (customers) to think about the sentence i quoted?
In summary: You should at least rewrite the parts of the article which sound more like a debian commercial (pun intended).

cbbrowne wrote:

Oops. A script kiddie broke in and changed /usr/bin/sendmail to be suid root. Oh well, cfengine runs at 1am, running /var/lib/cfengine/packages/daily/sendmail and fixes that back up.
If cfengine is widely adopted then it will not surprise script kiddies anymore. They will make use of it (using scripts they don't understand, of course). Security means to keep them out.

Replies, posted 19 Sep 2000 at 02:19 UTC by ncm » (Master)

Thanks to those who have replied.

No, the article wasn't intended as a Debian ad. Maybe that's where I went astray. It was meant to demonstrate that packaging policy is really what determines the quality of a distribution. Debian just happens to be furthest along, that way, at the moment. That's not to say there aren't things that could be done a lot better than Debian does. More examples of things that could be done a lot better would be very helpful.

AlanShutko wrote:

...the article is missing specifics for all distributions to help the reader evaluate the different distributions and how they meet their needs...

Obviously, a full review and cross-comparison of all distributions is a big job, and would be instantly obsolete besides. My question is, what criteria should reviewers use? I demonstrated that the common practice of just comparing installers has led millions of people astray.

Your advice that I lag behind would give me a laptop without X.

The article says: "For some machines with very new hardware, it may be that only a distribution that keeps up with the latest 'pre-release' software versions will include drivers for it." Is that telling you to lag behind?

cbbrowne's suggestions are excellent, and I hope to see them all implemented in somebody's distribution.

uweo wrote:

...the mail delivery program was built to assume one file-locking policy while the mail readers assumed another.
This is an example of a problem which could have been avoided.
Yes, that's why I mentioned it in the article. All problems could be avoided, with enough care, and it's instructive when (and why) they're not.

I think most of uweo's other comments are also addressed in the article already. I wonder, though... you'd have to be pretty confused to see unreliability as a desirable feature.

  • is owned and controlled for the benefit of its users, rather than for absentee stockholders or ambitious business managers;
What do you expect commercial entities (customers) to think about [this]?

Commercial entities don't think. People think. People who work for commercial entities think about as clearly as anybody else -- i.e., generally not very. They need as much help as anybody else.

That's why we need better published evaluations of distribution quality than just the typical installer comparisons. That's why we need better distributions, too, so that even confused people end up installing reliable software.

                    *  -  *  -  *

I'm seeing that the article needs to end with a "where do we go from here", listing a lot of things that no distribution does well yet, such as cbbrowne suggests. It also needs to be clearer on who it's for. Who should it be for?

Are you *NUTS*?!?, posted 20 Sep 2000 at 01:23 UTC by Toby » (Master)

cbbrowne wrote:

With a suitable "protocol for usage," the maintenance script could get pushed into a central "repository" so that not only would it be used initially to set up the package for use, but it would be regularly run (daily/weekly/hourly as appropriate) to clean up any messes that might occur.

Oops. A script kiddie broke in and changed /usr/bin/sendmail to be suid root. Oh well, cfengine runs at 1am, running /var/lib/cfengine/packages/daily/sendmail and fixes that back up.

This is *NOT* security. Maybe I'm missing the boat here, it's been a 65+ hour working week so far. I sincerely hope that nobody implied that they hoped this would in any way improve security on any system insecure enough to fall to a script kiddie.

Quality of upgrade, posted 23 Sep 2000 at 21:43 UTC by claudio » (Master)

Just some thoughts:

One may declare that the ultimate goal of the packaging system (from the user's point of view) is to allow packages to be painlessly installed, uninstalled and upgraded, and that everything will work after the changes. (Other considerations may be relevant for packagers and distributors, such as the build process, signatures, testing, etc.)

Installing and removing files is simple, and any semi-decent packaging system deals with this task with no major hassles. Upgrading, however, is hairier: you can just replace old packages with new ones, but it's likely that you'll have a broken system after the process. Not only are configuration files not correctly upgraded, but packages that depend on the proper configuration of another package being updated don't install correctly.

As was mentioned in the article draft, programs are continually evolving, and being stuck with a years-old system is not a good deal (and it may be dangerous anyway). On the other hand, having a bleeding-edge system that breaks at each upgrade can be even worse. So a robust upgrading system, one that offers quality of upgrade (i.e. your system will still work after the upgrade), is as important to a good packaging system as the packaging system is to a good distribution.

Of course, keeping the system working properly requires a solid and well-thought-out upgrading policy. In practical terms, APT + dpkg offer a good upgrading path, and it could be achieved with APT + RPM as well with some minor adjustments (a few considerations on auto-upgrading RPM packages with APT are in my diary, if you want to check it). The article draft states "the scripts [in the package] are used to install and configure the program as automatically as possible", but the configuration system also plays a very important role in upgrading.
