Obfuscated code
I'm continuing to struggle with integrating Well Tempered Screening
into Ghostscript, although my recent progress is encouraging. A large
part of the problem is the difficulty of the Ghostscript runtime.
Believe me, trying to do objects-in-C in the Gtk+ framework is
positively straightforward compared to Ghostscript.
Take "device colors", for example. The typedef and struct are in
gxdht.h, the allocation is done in gsstate.c (for persistent
allocation; temporary, stack allocation is done in gshtscr.c), the
constructor is split into two parts, one in gsht1.c, one in gsht.c,
and you have functions all over the place (gxht.c, gxcht.c, gxhtbit.c,
gxdevndi.c) accessing the fields directly, rather than going through
virtual methods. Then, of course, you have the marshal and unmarshal
methods in gxclrast.c.
Part of the reason for this complexity is that the code is trying to
optimize the amount of allocation that gets done. Since the halftone
is allocated as part of the graphics state rather than separately, you
can install a new halftone without actually allocating anything. Of
course, it's more complicated than that because of gsave and grestore;
when you restore the graphics state, you get the old halftone back, so
the first time you set a halftone after a gsave, you do have to
allocate.
I personally consider this a massive instance of premature
optimization. A malloc()/free() roundtrip is about 250 cycles. Saving
an allocation every pixel is worthwhile, but installing a new halftone
is relatively rare (likely on the order of once per job), so the
actual time saving is immeasurable.
Even if you did want to optimize the allocation away, the pattern of
separating allocation and construction is not the best way to do it.
Far better is to have a free pool, so that in the common case an
allocation is a handful of cycles, but you still get a clean "new"
API for creating the object, rather than having the client responsible
for allocation.
At some point after 8.0, I'm going to write up a set of new guidelines
for better runtime design, with a focus on code clarity. I am now
convinced that it's possible to improve code clarity massively
without sacrificing performance at all. One important tool will be
patterns with roughly similar performance but with a cleaner
interface. Free pool vs. exquisitely hand-tuned allocation is a
perfect example. Also, there are good tools such as inline functions
that weren't available when Ghostscript was started.
A move to C++ is also on the table, but I think it's possible to get
most of the benefits by careful design of the runtime in C. C++ would
open lots of questions, including whether or not to use the STL,
whether reference counting is to be done with smart pointers, and so
on.
In any case, I think the opacity of Ghostscript's code is a big part
of the reason why you see new PDF renderers popping up. Ghostscript
actually solves the problem rather completely (including funky color
spaces and all that). But it's always fun to start a new project.
Async thoughts
The online discussion of asynchronous programming continues, with new
posts by David
McCusker and Jeff
Darcy.
Let's take a big giant step back. Asynchronous programming adds a lot
of complexity. Thus, there better be damn good reasons for it, or we
wouldn't bother. What are those reasons?
It depends on the application. In many cases, the answer is purely
quantitative. You want to usefully use time that would have otherwise
been spent waiting for synchronous requests to complete. If you bail
and do it the synchronous way, the system will have same basic
functionality, but performance will start to suck.
In other cases, the problem demands asynchrony. Many of these problems
have a simulation flavor. You can't just process all the state changes
for one object and them go on to the next. The objects interact, so
you need to interleave them somehow. A good example is video games.
Somewhere in between, you have problems that in theory don't need
asynchrony, but in practice the performance difference is so large as
to be qualitative. For example, these days it is a basic requirement
of web servers to be able to serve multiple connections concurrently.
(Note that some very early
web servers only did one at a time).
I think you see different styles of providing asynchrony because of
the different needs. One style is focussed on performance. In
particular, you really need threads to take advantage of SMP machines
(which, in turn, are quite rare in contexts where performance is not
essential). Here, you're willing to take the hit of more complex code
if performance increases enough. In particular, you may be willing to
give up simplicitly in the mapping between the problem and the code to
solve it. Squid is probably
one of the best known exemplars of this style.
Other times, though, programming with asynchrony is the clearest
way to encode the solution to the problem. To me, a clear example of
this approach is CSP and the languages inspired by it. Of these, Limbo is
the most fully developed. It's worth noting that the Limbo
implementation is essentially single-threaded (multiple threads can
individually block on I/O, but only one interpreter loop runs at a
time), and is a slowish interpreter, so you can't really make the case
that performance is the driving goal.
A more pedestrian example is the ubiquitous event loop found in video
games and many GUI's. It evolved because it was the simplest, most
direct way to express asynchronous behavior.
Composition of async modules
Jeff Darcy's overview
of the tradeoffs between event-based and multithreaded approaches to
asynchrony touches on one point which I feel is key: the ability to
compose a larger system from smaller modules. Composition techniques
we take for granted in synchronous systems, such good old-fashioned
function calls, run aground in asynchronous systems.
In a pure event-based system, you can't call a procedure which might
block. This limitation can be crippling. Often, you relax a bit and
allow some operations to be done as blocking procedures. In classic
Unix, the open() call is only available in blocking flavor, so when
you need to open a file, you just call it and wait. Similarly, the
classic gethostbyname() function has been a terrible source of
blocking in Web browsers.
If what you're doing is mostly procedure calls, then threads can be a
win. Go ahead and call a function which might block. If some other
thread can make progress, it will. However, there are still a lot of
gotchas. For one, you have to get your locking discipline right. It's
easy to introduce subtle race conditions, which are a total pain to
debug. Even more fundamentally, you have to write code in a
thread-safe style. The classic Unix API's again fall flat, because of
such things as errno being a global variable. These days, most
people try to write thread-safe code, though.
Another thread gotcha is scaling. If your architecture forces you to
have an operating system thread per connection, or a thread per
simulated object, or whatever, you lose. Why did you choose the
asynchronous route in the first place? If it's solely performance,
then running up against scaling limits is not what you wanted.
Thus, a lot of the interesting work on asynchronous programming is
about hybrids of the pure event and thread models. Matt Welsh's SEDA is one
such, with a refreshingly clean quantitative focus. But it's
interesting for another reason. The "stages" are relatively loosely
coupled, which means that you might compose stages from different
modules into one big app. In fact, the Haboob web server already has a
mechanism for mixing in Jython scripts.
My feeling is that you could do something very similar with CPython.
For performance, you'd want to write the core async runtime in C (to
avoid Python's global interpreter lock), but then you could more or
less freely intermix stages written in C and Python. The C ones could
take care of your low-level, performance intensive stuff like the
network wire protocols, crypto, and caches, and the Python ones could
take care of your higher level application logic. I find this the most
compelling architecture for performant P2P applications. Most
importantly, you could start by writing everything in Python, then
migrate to C as needed.
It might even be possible to provide a uniform Python API for
both the Java-based and C-based SEDA implementations. Such a thing
would be a fairly dramatic demonstration of cross-runtime programming.
more: Dirt and Noise, Trust, Jokes