Proofs
I got some very nice email from Konrad Slind and Ken Friis Larsen
about how to make portable proofs, and about HOL. HOL stands for Higher Order
Logic, which basically means you can quantify over functions and
predicates. You can't do this straightforwardly in first-order logics.
Given this power, you can make nice abstract proofs that don't depend
on any particular model. Basically, your theorem is of the form "for
all systems of numbers that satisfying these axioms: ...". If you want
to apply the theorem to a specific model, you just plug it in, and
prove each of the axioms as a theorem. The general technique is called
"theory interpretation". In its most general form, it's a
transformation of abstract proofs into concrete proofs expressed in
terms of a particular model, but in a nice system you can do the
deduction directly, by taking one instance of the (polymorphic)
universal quantification. Theory interpretation is one of the ideas in
QED.
HOL is based on ML, which is really a very nice language for this sort
of thing. I'm not yet convinced that the whole business of "tactics"
and "tacticals" are essential in a proof interchange format, but they
do seem to be a good way to partially automate the process of
producing proofs.
Document formats
jfleck took my bait. Ok, I will explain why I think
structural markup is not always the best approach.
In the simplest model, an author creates a document in a document
format, then transmits the file (now a sequence of bytes) to a
recipient. The recipient is in possession of a "viewer", which is a
transform from the document format into a human-readable
presentation. The usual viewer displays the document on the screen
or prints it out on the printer, but it doesn't have to be that
way. In particular, blind people will generally want the text read
aloud.
So now we have the issue of variability of the viewer transform. Some
document formats (PostScript, say) nail down the presentation quite
precisely, while others (HTML, say) might appear very different in
different viewers. Which is better?
Many people take a religious position on this issue. But I prefer to
look at it in the framework of presentation quality. You have a
viewing context: paper vs screen vs audio, window size (or screen
size), color vs monochrome, etc. These are all legitimate sources of
variability. You also have gratuitous variability, very commonly
different sets of available fonts. You can solve this variability a
number of ways: restricting the allowable fonts in a document, using
freely licensed fonts, or allowing "embedding" of the font's glyphs
(but not the complete font) in the document. The latter option is
interesting because it restricts editability.
Now you can ask the question: over the space of viewing contexts, what
is the quality of the presentation. As always, there are many aspects
to quality: aesthetic beauty, ease of reading text, ease of finding
something in the text, consistency, etc. The art of graphic design is
all about making a high quality presentation. A typical rule of thumb
is to place a graphic either on the same page or the facing page as
the text which describes it. But the rules of thumb often conflict,
and it's a judgement call to decide which ones are more important.
And now we can address the question of structural markup vs a document
format that emphasizes presentation. Especially for documents that fit
the "structured" model, the former can produce reasonably good
presentations across a wide variety of viewing contexts. PostScript,
by contrast, can represent a stunningly beautiful presentation, but
only in a very narrow context.
There are, of course, many other factors that affect quality. Of
popular document formats, HTML is particularly bad when printed. It's
good enough on the screen, though. In fact, it could be better
for screen viewing than PostScript or PDF, because it could use
screen-optimized fonts, while the latter often forces
printer-optimized fonts onto the screen. In practice, though, Web
browsers haven't spent as much attention on rendering quality as, say,
Adobe Acrobat, so the advantage is only potential.
Thus, I hope I've conveyed why I don't think that pure structural
markup is always better than pure presentation, or vice versa. The
main trend seems to be toward hybrid approaches. For example, PDF 1.4
allows you to specify one layout with great precision, but also
includes parallel more-or-less structural markup so that it can
reflow the text for small windows, etc. This adds considerable
complexity, but one can make the argument that it serves readers of
the document well. Similarly, the purity of structural markup is
often bent, incorporating directives and hints with the intent of
improving the presentation (style sheets are one such technique).
However, these hints almost always fall short of pinning down the
exact layout of text on the page, so you really can't do as well in
print as a presentation-centric approach.
word2pdf
The discussion of document formats reminds me of a wishlist item I
haven't mentioned here before: a batch converter from Microsoft Word
format(s) directly to PostScript or PDF, faithfully reproducing
the original layout. Obviously, wvware goes part of the way, but the
print output goes through TeX, so the formatting gets pretty badly
mangled.
Doing it as a batch converter is a lot easier than a full GUI word
processor. That makes it realistic to focus on faithfulness. Over
time, the existence of such a batch renderer would be helpful for
building high quality GUI editors, but in the near term, a lot of the
time people just want to view or print the documents anyway.
I'm pretty sure that we (Artifex) could sell such a tool, as it fits
in with the Ghostscript business, and we've already seen some customer
interest. So, if anyone out there is seriously interested in working
on something like this, get in touch.
(also see wvWare2
design document)