Recent blog entries for pjrm

Currently working on reducing differences between three recent related codebases to do with text layout:

  • Table layouter: choose column widths & row heights that make a table fairly compact, which tends to make the table look nicer and be easier to read.

  • Text-in-shape: text that flows around figures, even non-rectangular ones. Uses whole-document optimal line-breaking. Allows making the text fit (almost) exactly into the available space (as you want in a newspaper or magazine) by finding the value of some “size” parameter that controls how big the text is relative to the available space: changing it might enlarge the figures, the font size or the leading, move some of the figures further into the text, or do whatever else the designer wants. (We haven't written a GUI for specifying complex parameterizations, just the software for finding the parameter value for a given text.)

  • Optimal float placement in multi-column documents, i.e. placing each float (figure, table, sidebar or the like) near the text that refers to it, while satisfying some layout constraints, notably “doesn't overlap other floats”. (The requirement that floats and text can't overlap each other is what makes the problem computationally difficult.)
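
For the text-in-shape case, finding the “size” value can be done by bisection, provided one assumes fitting is monotone: text that fits at size s also fits at any smaller size. The Python sketch below makes that assumption and is not necessarily what our code does. `fits` stands in for a full layout run; the toy model (fixed character width of 0.5×size, leading of 1.2×size) and the names `find_fit_size` and `make_toy_fits` are hypothetical.

```python
import math
from typing import Callable


def find_fit_size(fits: Callable[[float], bool], lo: float, hi: float,
                  tol: float = 1e-3) -> float:
    """Return (approximately) the largest size in [lo, hi] at which the text fits.

    Assumes fits() is monotone: if the text fits at size s, it fits at any s' < s.
    """
    if not fits(lo):
        raise ValueError("text does not fit even at the smallest size")
    if fits(hi):
        return hi
    # Invariant: fits(lo) is True and fits(hi) is False.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fits(mid):
            lo = mid
        else:
            hi = mid
    return lo


def make_toy_fits(n_chars: int, width: float, height: float) -> Callable[[float], bool]:
    """Hypothetical stand-in for a real layout run: one rectangular region, no figures."""
    def fits(size: float) -> bool:
        chars_per_line = max(1, math.floor(width / (0.5 * size)))  # glyph width 0.5*size
        n_lines = math.ceil(n_chars / chars_per_line)
        return n_lines * 1.2 * size <= height  # leading 1.2*size
    return fits


fits = make_toy_fits(n_chars=2000, width=300.0, height=400.0)
size = find_fit_size(fits, lo=4.0, hi=40.0)
print(f"largest fitting size is about {size:.3f}")
```

In a real layouter each `fits` call is a whole-document line-breaking pass, so the number of iterations (about log2((hi - lo) / tol)) matters more than in this toy.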

Each of these uses some subset of XHTML as its input format, but with different capabilities depending on what was important or difficult for each project:

  • The table code has very limited facilities for styling text, and the input document must consist solely of a single table.
  • Neither the text-in-shape code nor the multicolumn float-placement code handles tables. (Nor does the table code handle cells containing a nested table.)
  • The multicolumn float-placement code recently acquired support for a CSS stylesheet (using libcroco) and simple support for bullets.

So I'm now reducing the differences between these three codebases: it would be nice if all of them supported CSS, tables, paragraph shaping such as bullets, and so on.

I'm currently working with Nathan Hurst on improving table layout: how to choose column widths & row heights that minimize wasted space (or get more readability from a given amount of space).
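
Purely to illustrate the kind of search involved, here is a toy model in Python. It is not necessarily how our layouter works. It assumes each cell's content is just a character count that may wrap at any character, so a cell of length L in a column w characters wide takes ceil(L/w) lines. A row is as tall as its tallest cell, and the sketch brute-forces every split of a fixed total width among the columns, keeping the split with the smallest total height. The names `table_height`, `compositions` and `best_widths` are hypothetical.

```python
import itertools
import math
from typing import Iterator, Sequence, Tuple


def table_height(cells: Sequence[Sequence[int]], widths: Sequence[int]) -> int:
    """Total height in lines: each row is as tall as its tallest cell."""
    return sum(
        max(max(1, math.ceil(length / w)) for length, w in zip(row, widths))
        for row in cells
    )


def compositions(total: int, parts: int) -> Iterator[Tuple[int, ...]]:
    """Yield every tuple of `parts` positive integers summing to `total`."""
    # Choosing parts-1 distinct cut points in 1..total-1 gives one split.
    for cuts in itertools.combinations(range(1, total), parts - 1):
        bounds = (0,) + cuts + (total,)
        yield tuple(b - a for a, b in zip(bounds, bounds[1:]))


def best_widths(cells: Sequence[Sequence[int]],
                total_width: int) -> Tuple[Tuple[int, ...], int]:
    """Exhaustively pick the column widths that minimize total table height."""
    n_cols = len(cells[0])
    if total_width < n_cols:
        raise ValueError("need at least one unit of width per column")
    best = min(compositions(total_width, n_cols),
               key=lambda ws: table_height(cells, ws))
    return best, table_height(cells, best)


# Character counts per cell: 3 rows x 3 columns.
cells = [[40, 5, 10], [30, 5, 60], [10, 2, 20]]
widths, height = best_widths(cells, total_width=30)
print("equal widths:", table_height(cells, (10, 10, 10)), "lines")
print("best widths:", widths, "->", height, "lines")
```

Real cells wrap at word boundaries and have minimum widths (the longest word), and exhaustive search only scales to small tables, so a practical layouter needs a smarter optimizer than this.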

The HTML standard includes one specification of how this is to be done, though its results are in many cases displeasing.

It may be difficult to get improvements into a newer HTML standard (given the number of browsers that would need updating), but we may well be able to use the improved approach in software that isn't bound by a standard for table layout, e.g. word processors and spreadsheets. ("Word processors" could even include HTML editors: the editor could emit hinted row/column sizes as percentages if the user was unhappy with the default assignment given by the HTML algorithm.)

I'm quite happy with the improvements we've made, but I'd like more ideas about how to have these improvements used.

I've recently been looking at using libcroco from within Inkscape. I'll need some help hooking it up properly (e.g. ensuring that the picture updates whenever stylesheets in the document are modified), but at least I can do the main work of interfacing with libcroco.

While looking at libcroco I've noticed a few bugs in it, so I'm working on fixing those and updating the test suite so that it catches them.

I've also noticed that there are a number of things about style that aren't specified in the SVG spec, e.g. the priority of style="..." attributes compared to other stylesheet information. My guess is that it's considered more specific than any separate stylesheet rule.
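
For comparison, CSS 2.1 (section 6.4.3) gives a declaration in a style attribute the specificity component a = 1, so it beats any selector-based rule (ignoring !important). The Python sketch below assumes SVG behaves the same way, which is exactly the unspecified point. It is not libcroco's API. It handles only type, #id and .class selectors joined by descendant combinators, ignores !important and origin, and uses hypothetical names (`selector_specificity`, `cascade`, `STYLE_ATTRIBUTE`).

```python
import re
from typing import List, Tuple

Specificity = Tuple[int, int, int, int]  # CSS 2.1 (a, b, c, d)

# Assumption: a style="..." declaration counts as a = 1, as in CSS 2.1.
STYLE_ATTRIBUTE: Specificity = (1, 0, 0, 0)


def selector_specificity(selector: str) -> Specificity:
    """Specificity of a simple selector such as 'svg rect#r1.big'.

    Only type names, #ids and .classes are counted; attribute selectors,
    pseudo-classes etc. are out of scope for this sketch.
    """
    b = len(re.findall(r"#[\w-]+", selector))   # ID selectors
    c = len(re.findall(r"\.[\w-]+", selector))  # class selectors
    # Element names: an identifier at the start of each compound selector.
    d = sum(1 for part in selector.split() if re.match(r"[A-Za-z][\w-]*", part))
    return (0, b, c, d)


def cascade(declarations: List[Tuple[Specificity, str]]) -> str:
    """Pick the winning value from (specificity, value) pairs in document order.

    Higher specificity wins; on a tie the later declaration wins.
    """
    best = max(range(len(declarations)), key=lambda i: (declarations[i][0], i))
    return declarations[best][1]


decls = [
    (selector_specificity("#r1"), "red"),
    (selector_specificity("rect.big"), "green"),
    (STYLE_ATTRIBUTE, "blue"),  # from style="fill: blue"
]
print(cascade(decls))      # the style attribute wins
print(cascade(decls[:2]))  # without it, the #id rule beats rect.big
```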
