3 Jul 2002 raph   » (Master)

Obfuscated code

I'm continuing to struggle with integrating Well Tempered Screening into Ghostscript, although my recent progress is encouraging. A large part of the problem is the difficulty of the Ghostscript runtime. Believe me, trying to do objects-in-C in the Gtk+ framework is positively straightforward compared to Ghostscript.

Take "device colors", for example. The typedef and struct are in gxdht.h, the allocation is done in gsstate.c (for persistent allocation; temporary, stack allocation is done in gshtscr.c), the constructor is split into two parts, one in gsht1.c, one in gsht.c, and you have functions all over the place (gxht.c, gxcht.c, gxhtbit.c, gxdevndi.c) accessing the fields directly, rather than going through virtual methods. Then, of course, you have the marshal and unmarshal methods in gxclrast.c.

Part of the reason for this complexity is that the code is trying to optimize the amount of allocation that gets done. Since the halftone is allocated as part of the graphics state rather than separately, you can install a new halftone without actually allocating anything. Of course, it's more complicated than that because of gsave and grestore; when you restore the graphics state, you get the old halftone back, so the first time you set a halftone after a gsave, you do have to allocate.

I personally consider this a massive instance of premature optimization. A malloc()/free() roundtrip is about 250 cycles. Saving an allocation every pixel is worthwhile, but installing a new halftone is relatively rare (likely on the order of once per job), so the actual time saving is immeasurable.

Even if you did want to optimize the allocation away, the pattern of separating allocation and construction is not the best way to do it. Far better is to have a free pool, so that in the common case an allocation is a handful of cycles, but you still get a clean "new" API for creating the object, rather than having the client responsible for allocation.

At some point after 8.0, I'm going to write up a set of new guidelines for better runtime design, with a focus on code clarity. I am now convinced that it's possible to improve code clarity massively without sacrificing performance at all. One important tool will be patterns with roughly similar performance but with a cleaner interface. Free pool vs. exquisitely hand-tuned allocation is a perfect example. Also, there are good tools such as inline functions that weren't available when Ghostscript was started.

A move to C++ is also on the table, but I think it's possible to get most of the benefits by careful design of the runtime in C. C++ would open lots of questions, including whether or not to use the STL, whether reference counting is to be done with smart pointers, and so on.

In any case, I think the opacity of Ghostscript's code is a big part of the reason why you see new PDF renderers popping up. Ghostscript actually solves the problem rather completely (including funky color spaces and all that). But it's always fun to start a new project.

Async thoughts

The online discussion of asynchronous programming continues, with new posts by David McCusker and Jeff Darcy.

Let's take a big giant step back. Asynchronous programming adds a lot of complexity. Thus, there better be damn good reasons for it, or we wouldn't bother. What are those reasons?

It depends on the application. In many cases, the answer is purely quantitative: you want to make use of time that would otherwise be spent waiting for synchronous requests to complete. If you bail and do it the synchronous way, the system will have the same basic functionality, but performance will start to suck.

In other cases, the problem demands asynchrony. Many of these problems have a simulation flavor. You can't just process all the state changes for one object and then go on to the next. The objects interact, so you need to interleave them somehow. A good example is video games.

Somewhere in between, you have problems that in theory don't need asynchrony, but in practice the performance difference is so large as to be qualitative. For example, these days it is a basic requirement of web servers to be able to serve multiple connections concurrently. (Note that some very early web servers only handled one at a time.)

I think you see different styles of providing asynchrony because of the different needs. One style is focussed on performance. In particular, you really need threads to take advantage of SMP machines (which, in turn, are quite rare in contexts where performance is not essential). Here, you're willing to take the hit of more complex code if performance increases enough. In particular, you may be willing to give up simplicity in the mapping between the problem and the code to solve it. Squid is probably one of the best known exemplars of this style.

Other times, though, programming with asynchrony is the clearest way to encode the solution to the problem. To me, a clear example of this approach is CSP and the languages inspired by it. Of these, Limbo is the most fully developed. It's worth noting that the Limbo implementation is essentially single-threaded (multiple threads can individually block on I/O, but only one interpreter loop runs at a time), and is a slowish interpreter, so you can't really make the case that performance is the driving goal.

A more pedestrian example is the ubiquitous event loop found in video games and many GUI's. It evolved because it was the simplest, most direct way to express asynchronous behavior.

Composition of async modules

Jeff Darcy's overview of the tradeoffs between event-based and multithreaded approaches to asynchrony touches on one point which I feel is key: the ability to compose a larger system from smaller modules. Composition techniques we take for granted in synchronous systems, such as good old-fashioned function calls, run aground in asynchronous systems.

In a pure event-based system, you can't call a procedure which might block. This limitation can be crippling. Often, you relax a bit and allow some operations to be done as blocking procedures. In classic Unix, the open() call is only available in blocking flavor, so when you need to open a file, you just call it and wait. Similarly, the classic gethostbyname() function has been a terrible source of blocking in Web browsers.

If what you're doing is mostly procedure calls, then threads can be a win. Go ahead and call a function which might block. If some other thread can make progress, it will. However, there are still a lot of gotchas. For one, you have to get your locking discipline right. It's easy to introduce subtle race conditions, which are a total pain to debug. Even more fundamentally, you have to write code in a thread-safe style. The classic Unix APIs again fall flat, because of such things as errno being a global variable. These days, most people try to write thread-safe code, though.

Another thread gotcha is scaling. If your architecture forces you to have an operating system thread per connection, or a thread per simulated object, or whatever, you lose. Why did you choose the asynchronous route in the first place? If it's solely performance, then running up against scaling limits is not what you wanted.

Thus, a lot of the interesting work on asynchronous programming is about hybrids of the pure event and thread models. Matt Welsh's SEDA is one such, with a refreshingly clean quantitative focus. But it's interesting for another reason. The "stages" are relatively loosely coupled, which means that you might compose stages from different modules into one big app. In fact, the Haboob web server already has a mechanism for mixing in Jython scripts.

My feeling is that you could do something very similar with CPython. For performance, you'd want to write the core async runtime in C (to avoid Python's global interpreter lock), but then you could more or less freely intermix stages written in C and Python. The C ones could take care of your low-level, performance intensive stuff like the network wire protocols, crypto, and caches, and the Python ones could take care of your higher level application logic. I find this the most compelling architecture for performant P2P applications. Most importantly, you could start by writing everything in Python, then migrate to C as needed.

It might even be possible to provide a uniform Python API for both the Java-based and C-based SEDA implementations. Such a thing would be a fairly dramatic demonstration of cross-runtime programming.
