1 Jun 2002 raph   » (Master)

Printing

I had lunch with, then spent the afternoon with Brian Stell of Mozilla. He is on a mission to make Mozilla printing work well on Linux, especially the i18n bits. Already, he's provided us with fixes and test cases for incremental Type 11 fonts.

There are a lot of tricky issues, and the GNU/Linux community has historically done a poor job dealing with printing. It's an area where cooperation between many diverse pieces is needed, but there's nothing that's really motivating a solution except people's frustration. Brian is trying to solve the problem the Right Way for Mozilla, with the hope that others might follow.

Among other things, Brian is trying to figure out which versions of PostScript and PDF to target. For compatibility with existing printers, you want to generate lowest common denominator PostScript. But there are also advantages to generating more recent versions. For example, you can drop alpha-transparent PNG's into a PDF 1.4 file and they'll render correctly. That's not possible with any version of PostScript, or with earlier versions of PDF. On the font side, many (most?) existing printers can't handle incremental Type 11 fonts, even though they're in PostScript LanguageLevel 3 and also many Adobe interpreters before that (2015 and later).

A good solution would be to generate the latest stuff, and have converters that downshift it so it works with older printers. Alas, no good solution exists now. Ghostscript can rasterize without a problem, but sending huge bitmap rasters to PostScript printers is slow and generally not a good idea. pdftops can preserve the higher level structure (fonts, Beziers, etc.), but is limited in many other ways, among other reasons because it doesn't contain an imaging engine. So, at least for the time being, it seems like the best compromise is to have a codebase that generates various levels of PostScript and PDF.

A chronic problem in for GNU/Linux is a mechanism for users to install fonts, and applications to find them. At least five major application platforms need fonts: Gnome, KDE, Mozilla, OpenOffice, and Java. You also have a number of important traditional (I don't really want to say "legacy") applications that use fonts: TeX, troff, and Ghostscript among them. Plenty of other applications need fonts, including all the vector graphics editors, Gimp, and so on. I suppose I should mention X font servers too.

Most applications that need fonts have a "fontmap" file of some kind. This file is essentially an associative array from font name to pathname in the local file system where the font can be found. Actually, you want a lot more information than just that, including encoding, glyph coverage, and enough metadata at least to group the fonts into families. In some cases, you'll want language tags, in particular for CJK. Unicode has a unified CJK area, so a Japanese, a Simplified Chinese and a Traditional Chinese font can all cover the same code point, but actually represent different glyphs. If you're browsing a Web page that has correct language tagging, ideally you want the right font to show up. Unfortunately, people don't generally do language tagging. In fact, this is one area where you get more useful information out of non-Unicode charsets than from the Unicode way (a big part of the reason why CJK people hate Unicode, I think). If the document (or font) is in Shift-JIS encoding, then it's a very good bet that it's Japanese and not Chinese.

This is why, for example, the gs-cjk team created a new fontmap format (CIDFnmap) for Ghostscript. In addition to the info in the classic Ghostscript Fontmap, the CIDFnmap contains a TTC font index (for .ttc font files which contain multiple fonts), and a mapping from the character set encoding to CID's, for example /Adobe-CNS1 or /Adobe-GB1 for Simplified and Traditional Chinese, respectively.

To make matters even more complicated, as of 7.20, we have yet another fontmap format, the xlatmap. The goals are similar to the CIDFnmap, but with different engineering tradeoffs. One of my tasks is to figure out what to do to unify these two branches.

In any case, there are really three places where you need to access fonts, and hence fontmaps. First, the app needs to be able to choose a font and format text in that font. That latter task requires font metrics information, including kerning and ligature information for Latin fonts, and potentially very sophisticated rules for combining characters in complex scripts. These rules are sometimes embedded in the font, particularly OpenType formats, but more often not for older formats such as the Type1 family. Interestingly, you don't need the glyphs for the formatting step.

The second place where you need the font is to display it on the screen. Historically, these fonts have lived on the X server. But the new way is for the client to manage the font. The XRender extension supports this approach well, as it supports server-side compositing of glyphs supplied by the client. Even without XRender, it makes sense to do the glyph rendering and compositing client-side, and just send it to the X server as an image. Maybe 15 years ago, the performance tradeoff would not have been acceptable, but fortunately CPU power has increased a bit since then.

The Xft library is one possible way to do font rendering, but I'm not very impressed so far. Among other things, it doesn't do subpixel positioning, so rendering quality will resemble Windows 95 rather than OS X.

The third place where you need the font is when you're printing the document. In most cases today, it's a good tradeoff for the app to embed the font in the file you're sending to the printer. That way, you don't have to worry about whether the printer has the font, or has a different, non-matching version. If you do that, then Ghostscript doesn't actually need to rely on fontmaps at all; it just gets the fonts from the file. However, a lot of people don't embed fonts, so in that case, Ghostscript has to fudge.

So how do you install a font into all these fontmaps? Currently, it's a mess. There are various hackish scripts that try to update multiple fontmaps, but nothing systematic.

One way out of the mess would be to have a standard fontmap file (or possibly API), and have all interested apps check that file. Keith Packard's fontconfig package is an attempt to design such a file format, but so far I'm not happy with it. For one, it's not telling me all the information I need to do substitution well (the main motivation in Ghostscript for the CIDFnmap and xlatmap file formats). Another matter of taste is that it's an XML format file, so we'd need to link in an XML parser just to figure out what fonts are installed. I'd really prefer not to have to do this.

I realize that the Right Thing is to provide enough feedback to KeithP so that he can upgrade the file format, and we can happily use it in Ghostscript. But right now, I don't feel up to it. The issues are complex and I barely feel I understand them myself. Also, I'm concerned that even fixing fontconfig for Ghostscript still won't solve the problems for other apps. After all, Ghostscript doesn't really need the font metrics, just the glyphs. Even thoroughly obsolete apps like troff need to get .afm files for Type1 fonts (much less font metrics from TrueType fonts). GUI apps on GNU/Linux haven't really caught up to the sophistication of mid-'80s desktop publishing apps on the Mac. As far as I can tell, fontconfig currently has no good story for metrics or language info.

What would really impress me in a standard for fontmap files is a working patch to get TeX to use the fonts. But perhaps this is an overly ambitious goal.

In any case, I really enjoyed meeting Brian in person today, and commend him for having the courage to attack this huge problem starting from the Mozilla corner.

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!