Distributions
What makes one distribution different from another? Published reviews
seem always to focus exclusively on the initial installation, even
though this may be the least important distinction. All major
distributions have good installation tools now. Differences are mainly
in the cosmetics (text vs. graphics), and in how much you must read
before or while installing. You run the installer for half an hour, or
three, and it's done.
A reviewer erases the installation soon afterward. The rest of us
live with the result for months, or years. The differences in what
it takes to keep everything configured, up-to-date, secure, and
serving your needs (if indeed these are all ever achieved) soon
overshadow any momentary (in)conveniences of getting the system
installed. An unfortunate choice of distribution can be very costly,
in time and in a thousand other ways.
In the beginning, the job of a distribution was just to put
ready-built programs on a stack of floppy disks (later, on a CD),
along with instructions to help get it all installed. Naturally,
fearful beginners -- and reviewers -- demanded a more-automated
installation, and the installation tools included in distributions
have become ever more automated and flashy.
This emphasis by beginners and reviewers on the installer, while
understandable, distracts the keepers of a distribution from a more
important, if less obviously urgent, job: building and installing
the programs correctly. Done right, this is an enormous job. Done
poorly, it results in myriad problems for users. Such problems are
usually not obvious, at first, but surface over time, and can be
devastating.
For example, in a recent release of Red Hat Linux (5.0?), the mail
delivery program was built to assume one file-locking policy while
the mail readers assumed another. The result was that any mail
delivered while you read your mail was lost. This build error
wasn't noticed officially for a full year. During that year many
thousands of messages to thousands of people disappeared into the
void, unnoticed.
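Why two locking policies fail to protect each other can be shown in a
few lines. This is only a sketch: the account above does not say which
locking schemes were involved, so it assumes, purely for illustration,
that one program uses flock() and the other a "dot-lock" file. The
function names and the mailbox path are hypothetical.

```python
import fcntl
import os
import tempfile

def take_flock(mailbox):
    """Take a non-blocking flock() lock; return the fd, or None if held."""
    fd = os.open(mailbox, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        return None
    return fd

def take_dotlock(mailbox):
    """Create mailbox.lock exclusively; return True if we got the lock."""
    try:
        fd = os.open(mailbox + ".lock",
                     os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except FileExistsError:
        return False
    os.close(fd)
    return True

def demo():
    """Return (second flock refused?, dot-lock granted despite the flock?)."""
    with tempfile.TemporaryDirectory() as d:
        mailbox = os.path.join(d, "mbox")
        reader_fd = take_flock(mailbox)       # the reader locks its way
        same_policy = take_flock(mailbox)     # same policy: refused
        other_policy = take_dotlock(mailbox)  # other policy: not refused
        os.close(reader_fd)
        return same_policy is None, other_policy
```

Each scheme excludes other users of the same scheme, but neither
notices the other, so both programs believe they hold the mailbox.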
[need another example.]
Building
This example reveals a neglected fact: all significant programs
offer a variety of "compile-time" options for how the program is
to be built, and configuration options that determine how it runs.
These options include placement of configuration and data files,
how much and what kind of security to use, and which non-essential
features (typically, those that depend on other programs the user
might or might not have, or want) are included.
Often the build defaults are reasonable for developers maintaining
the program, but clearly wrong for production use. Sometimes a
special version or configuration of the tools or the libraries used
to build the program is needed for a reliable result. Very often,
the default result of building the program is insecure, or its files
end up in the wrong place. A conscientious maintainer cannot just
choose from a menu of options, but often must patch the code. For
some programs, such as Emacs or X, keeping up on all these details
is (quite literally) a full-time job.
Popular programs are almost continually evolving, with bugs being fixed,
new features added, and new bugs introduced. Somebody must keep track
of the current bug list, and the apparent "stability" (the expectation
of few remaining bugs) of each version, to choose wisely which version
and configuration to include in the distribution. For some programs
this can be another full-time job.
A general-purpose distribution bundles hundreds (or thousands!) of
programs. How can they all be combined without inviting chaos?
Some distribution maintainers simply embrace chaos. This is relaxing
for them, and lets their users feel macho each time they get something
working. Most make some effort to keep things together, balancing a
limited payroll against the tolerance of their users. Very few have
established clear policies, checklists, a bug-tracking system, and
schedules, and distributed the work among an army of maintainers, each
dedicated to one or a few programs, so that each program is built to
fit as well as possible with the other bundled programs.
Proprietary software vendors have proven that a software company can
take enormous margins by marketing mainly to the most naive users,
while ignoring bugs, security holes, and pervasive misdesign.
Commercial distributions that market mainly to beginners may thus have
little business incentive to spend more on reliability than is needed
to make a good first impression on reviewers. Investments in
promotion, distribution channels, and flashy installers simply pay off
better. Reliability and security slip easily down the priority list.
Packaging
How programs are bundled into a distribution has changed since the
earliest days. In the beginning, programs were just packed into "tar
archive" files, and unpacked in place. Any configuration was done by
hand-editing configuration files according to instructions in a
README file.
Some distributions still work this way, but most today deliver
programs in "packages". A package includes a program's files along
with annotations and scripts. The scripts are used to install and
configure the program as automatically as possible, and to remove
it when it's not needed any more. The annotations record notes
about the program and its files.
The most interesting annotations are the program version and its
dependencies. The version identifies which snapshot in the lineage
of a program's evolution is packaged. (Old versions have known bugs;
new versions have new features and unknown bugs.) The dependencies
note what other packages the program depends on, and (perhaps) which
versions of those packages will do. With this information, a package
manager program can determine which other packages must be obtained
or updated as part of installing the package for a program you want.
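What a package manager does with the dependency annotations can be
sketched roughly as follows. This is a minimal illustration, not the
algorithm of any real tool: the package names, versions, and the INDEX
table are invented, and version constraints are left out entirely.

```python
# Hypothetical package index: name -> (version, [names it depends on]).
INDEX = {
    "mailreader": ("1.2", ["libc", "mta"]),
    "mta": ("8.9", ["libc"]),
    "libc": ("2.1", []),
    "editor": ("20.5", ["libc"]),
}

def install_order(wanted, index, installed=()):
    """Return packages to install, dependencies first, skipping installed."""
    order = []
    done = set(installed)
    visiting = set()

    def visit(name):
        if name in done:
            return
        if name in visiting:
            raise ValueError(f"dependency cycle at {name}")
        if name not in index:
            raise KeyError(f"unknown package {name}")
        visiting.add(name)
        for dep in index[name][1]:
            visit(dep)
        visiting.discard(name)
        done.add(name)
        order.append(name)

    visit(wanted)
    return order

print(install_order("mailreader", INDEX))
# prints ['libc', 'mta', 'mailreader']
```

Asking for "mailreader" on a system that already has "libc" would
yield only "mta" and "mailreader".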
It might seem that this is all you need to make everything work. That
would be true in the same sense that people and goods are all you need
for an economy, or musicians and instruments for an orchestra. What's
missing are contract law and banks, or a score and conductor. In a
distribution, what's missing are sound policies and sound decision-making.
Policies
Packages interact in many more ways than simply depending on one
another. The mail delivery and mail reader example mentioned above
is just one example. A windowing menu system needs to know names and
icon image files for the user-level programs installed, so each new
package must put that information where it can be found. The on-line
documentation system needs to know about the programs installed. The
system startup scripts need to know how -- and in what order -- to
start up services. A program that needs a generic service (such as
mail delivery) needs to know how to invoke it.
The only way that maintainers of hundreds (or thousands) of packages
can agree on these details is a set of clear policies, a way to decide
issues not covered by the policies, and a way to evolve those policies.
The best way to enforce most policies is by encoding them in scripts
that implement them directly. Where that is not possible, scripts
might verify that they are obeyed. Many policies can be enforced only
by inspection and auditing.
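A script that verifies policy might look something like the sketch
below. The rules and the metadata fields ("user_program",
"menu_entry", "service", "start_order", "docs") are invented for
illustration, loosely following the interactions listed above; they
are not taken from any actual distribution's policy.

```python
# Hypothetical policy: each rule is (description, check on package metadata).
POLICY = [
    ("user programs must register a menu entry",
     lambda p: not p.get("user_program") or bool(p.get("menu_entry"))),
    ("services must declare a startup order",
     lambda p: not p.get("service") or isinstance(p.get("start_order"), int)),
    ("every package must register its documentation",
     lambda p: bool(p.get("docs"))),
]

def violations(package, policy=POLICY):
    """Return the descriptions of all policy rules the package breaks."""
    return [desc for desc, ok in policy if not ok(package)]

xmail = {"name": "xmail", "user_program": True, "docs": ["xmail.1"]}
print(violations(xmail))
# prints ['user programs must register a menu entry']
```

A check of this kind could run whenever a maintainer submits a
package, leaving inspection and auditing for what cannot be expressed
as a rule.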
Even with carefully designed policies, large issues arise. For example,
when many programs depend on one library or interpreter, some may need
a particular version of it. When you add a new program to your system
that also depends on a particular version, it's only by good luck that
the new program expects the _same_ version. (This is most frequently
a problem with proprietary, binary-only programs that cannot simply be
recompiled, and patched as necessary, to use the same version.)
If your package installation tool doesn't cope well with having extra
versions of the "same" library or interpreter, you must override it
when you install the extra version. Each time you do this, you risk
introducing subtle problems. As the configuration of your system
exceeds the package tools' understanding of it, you find yourself
increasingly involved in decisions that nobody has considered very
carefully. The result is a fragile system: each time you add or
remove a package, or update the packages already on it, you risk
breaking something.
If the package tools and policies do cope well with extra versions,
you never need to "force" them. Curiously, most of what is needed
for this is just for libraries and interpreters with incompatible
versions to have different package names (e.g. perl4 vs. perl5), and
for the programs that depend on them to name the version they were
built for. Despite the simplicity of this solution, none of the
"rpm"-based Linux distributions have adopted it yet, although they
could do so any time. To fix it is more a matter of sound policy
and plenty of hard work than of tricky technology.
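The effect of the naming policy can be seen in a toy model. The
program names "oldscript" and "newscript" are hypothetical, and real
package tools track far more than this sketch does.

```python
def conflicts(requirements):
    """Return {package: versions} for packages demanded in several versions."""
    demanded = {}
    for program, (package, version) in requirements.items():
        demanded.setdefault(package, set()).add(version)
    return {pkg: vers for pkg, vers in demanded.items() if len(vers) > 1}

# One shared name: the two programs fight over which "perl" is installed.
shared = {"oldscript": ("perl", "4"), "newscript": ("perl", "5")}

# Versioned names: each program names the package it was built for.
versioned = {"oldscript": ("perl4", "4"), "newscript": ("perl5", "5")}

print(conflicts(shared))     # prints {'perl': {'4', '5'}} (set order may vary)
print(conflicts(versioned))  # prints {}
```

With versioned names the two interpreters are simply different
packages, so nothing needs to be forced.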
With a sufficiently thorough package system, adding, removing, and
updating correctly-built packages is safe enough that it can mostly
be automated. Then, it may be reasonable simply to ask that any newly
available versions of packages on a system be downloaded and updated
in place, without shutting down and rebooting. Incremental updates
can (almost!) be part of routine maintenance. (No distribution is
entirely there yet, but most are nowhere close.)
Release Cycle
Sound policies don't only affect how packages are built. When should
a newly-released version of a program become part of the distribution?
New releases normally have fixes for various reported and unreported
bugs, and often have new features and new, unknown bugs. All too often
new bugs turn out, in time, to be worse than the old ones, and although
those are usually fixed quickly, if stamped onto CDs they can cause
problems for a long time.
It is often prudent to remain a version or three behind the latest
release of a program, and let the most eager users discover the bugs
before you commit to putting it on the CD. The version chosen will
have known bugs and missing features, but that may be better than the
alternative, which is not-yet-discovered and possibly damaging bugs.
The effectiveness of policies, and of processes for resolving conflicts,
determines the quality of a release. If the release is tied to an
inflexible schedule, or insufficiently supported by package maintainers,
quality suffers. If release dates slip too much, users may be obliged
to bypass the release and run pre-release or unstable development
versions of packages instead, subverting the purpose of the release.
Security
It is a most unsettling experience to have your work machine broken
into and trashed. Most of us have been spared, so far. The odds
that this can continue depend mostly on your own behavior (e.g.
avoiding telnet), but even perfect care is not enough if your
distribution opens back doors you don't even know about.
Security is the classic example of a quality that cannot be tested.
The only way to get it is by scrupulously following sound policies
and procedures, and the only way to verify it is by careful auditing.
(Reviewers rarely mention it at all, maybe because auditing is
suspiciously like work.) Since good security takes hard work, and
often conflicts with immediate convenience, many distributions
routinely neglect sound security policies.
The most basic security policy of all is: "minimize exposure". The
only machines vulnerable to exploits of any particular program are
those running it. Most machines have no reason to be running most
services, but many distributions install them and turn them on for
no better reason than to avoid having to tell users how to start them
up if they really do need them.
Of course, you never know where the next hole will be found, and we
all need to run at least some services on our machines. Therefore,
the next policy is "keep current". Keepers of a secure distribution
need to respond quickly to discovered holes, announce fixed versions
of vulnerable packages, and make it easy (and safe!) to upgrade. The
last point depends, again, on sound package management.
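The "keep current" check amounts to comparing what is installed
against the announced fixes. The sketch below assumes plain dotted
numeric versions and invented package names; real version strings are
messier, and real tools compare them with more care.

```python
def parse(version):
    """Turn a dotted numeric version such as '8.9.3' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))

def needs_upgrade(installed, advisories):
    """Return {package: fixed_version} for installed packages older than the fix."""
    return {
        name: fixed
        for name, fixed in advisories.items()
        if name in installed and parse(installed[name]) < parse(fixed)
    }

# Hypothetical data: what is installed, and the first fixed version announced.
installed = {"mta": "8.9.1", "ftpd": "2.6.0", "editor": "20.5"}
advisories = {"mta": "8.9.3", "ftpd": "2.6.0", "httpd": "1.3.12"}

print(needs_upgrade(installed, advisories))
# prints {'mta': '8.9.3'}
```

Packages that are not installed ("httpd" here) need no action, which
is one more argument for minimizing exposure.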
Choosing a Distribution
How can you choose a distribution that will serve your needs well?
The most important consideration, as for any software choice, is
whether you can run the programs you need. Does a distribution
include packages for the programs you expect to run? Packages
constructed by "third parties" may cause problems by violating
(or simply ignoring) distribution policies.
Some proprietary programs are only warranted to work with certain
commercial distributions, and if you want support you must run a
configuration they warrant. In turn, some commercial distributions
"encourage" proprietary software vendors to warrant only their one
distribution. This tactic backfires when, with each new release,
you are prevented from upgrading. (Software meant to run only on
one configuration is dangerously fragile; are you sure it is the
right software for you?)
For some machines with very new hardware, it may be that only a
distribution that keeps up with the latest "pre-release" software
versions will include drivers for it -- if indeed any do. This may
be an issue if you have not chosen your equipment conservatively,
and are not prepared to download and build drivers yourself, and
you don't have anybody to help with it.
If your needs are not artificially constrained, then you can consider
what your long-term experience will be. In considering a distribution,
you should see if it
- has identified reliability and security as primary goals;
- has instituted the policies and procedures necessary to
achieve those goals;
- has a big enough library of officially packaged programs that
you will rarely need to install an unofficial package;
- has a large enough staff of maintainers to ensure that
policies have been observed on all the packages they
maintain;
- comes with package management tools that make it easy to keep
your machine equipped with up-to-date versions of all the
programs you want to run;
- is owned and controlled for the benefit of its users, rather
than for absentee stockholders or ambitious business managers;
- is chartered to observe and uphold industry standards, rather
than subverting them to help lock you into one environment.
Among Linux distributions, only the Debian Project's "Debian GNU/Linux",
and products based on it, such as Stormix, satisfy many of these
criteria.
[How do the BSDs fare? What should be in the list, but isn't?]
No distribution is bug-free, nor will any ever be. Furthermore, even
the best policies cannot cover all cases, and areas not covered surface
all the time. Nobody can guarantee you a good experience, but you can
markedly improve your chances by choosing a distribution built by
people for whom a good experience is their main goal, and who do
everything they can to ensure it.
* - * - *
So, what's lacking? What BSD examples would help? Should it
go into politics and license issues, or steer clear? Should it talk
about specific packaging advances, such as Debian Apt, and the myriad
unique packaging policies Debian has adopted (such as...) that were
necessary to make Apt actually useful? What key insights are entirely
missing? What URLs should be in the References section?
(Thanks to those who replied to the request for comments I posted
in a diary entry some time back.)
This DRAFT is © Copyright 2000 by Nathan C. Myers. I hope to
put the final version under something like the GFDL.