24 Dec 2002 mbp   » (Master)

Threads

The other thing I should have said earlier is that sometimes ugly performance hacks are the only way to get the job done with the tools available. For example, Apache using threads on NT is a necessary concession to the poor fork implementation on that platform.

What most recently got me thinking about this was the internal Microsoft whitepaper on MSSecrets, in which they admit that implementing IIS as shared-everything threads was an enormous mistake.

I fairly often attach gdb to a single Apache process to see what's going on. Since the process handling a single TCP connection is pretty much isolated from all the rest, this is quite straightforward and it doesn't interfere with anything else on the machine. The writer complains that this is impossible on IIS, because it would jam up all other threads in the process.

Similarly, if a particular process dies because of a bug it doesn't necessarily affect anything else.
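To make the isolation point concrete, here is a minimal sketch in Python rather than anything Apache actually does. It assumes a POSIX system with fork; run_isolated, buggy_handler and good_handler are hypothetical names invented for the illustration. A child that aborts is reported to the parent as a signal number, and the parent carries on.

```python
import os
import signal


def run_isolated(task):
    """Run task() in a forked child and return (exit_code, signal_number).

    Exactly one of the two fields is None. Assumes POSIX fork semantics.
    """
    pid = os.fork()
    if pid == 0:
        # Child: do the work, then leave without running the parent's cleanup.
        try:
            task()
        except BaseException:
            os._exit(1)
        os._exit(0)
    # Parent: a crash in the child shows up only as a wait status.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return (None, os.WTERMSIG(status))
    return (os.WEXITSTATUS(status), None)


def buggy_handler():
    # Hypothetical request handler that hits a fatal bug (raises SIGABRT).
    os.abort()


def good_handler():
    # Hypothetical request handler that finishes normally.
    pass


if __name__ == "__main__":
    print(run_isolated(buggy_handler))  # child died from a signal
    print(run_isolated(good_handler))   # parent is unaffected and keeps serving
```

With threads, the equivalent of os.abort() in one handler would take every other request in the process down with it.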

pphaneuf, I had the impression that Ulrich might have said that in private conversation with bje, but I will check later. (Unless one of them responds here. :-)

MichaelCrawford, the thing about "using SMP" is that nobody really wants to just "use SMP" unless they're a "how about a beowulf cluster of those" slashdot weenie. People want to get a task done more quickly. We have to ask first of all, is the task parallelizable, and how? For example, if the system wants to handle incoming network requests, then you can do that using either threads or isolated processes. Or if you have a lot of data to digest you can divide it up and work in parallel.

"What I'm asking about is how a user program can do SMP via state machines without the use of threads. Saying to run two state machines in different processes isn't the right answer. That's the same as using two threads and presents all the same difficulties."

Well, I would say that it presents many fewer difficulties: the processes are isolated and so don't affect each other if they crash, they can be debugged separately, etc. As pphaneuf points out, shared-everything threads will possibly cause more SMP contention than processes that use special mechanisms to share only what is necessary.

I think things like tridge's tdbs that provide a simple safe abstraction on top of shared memory are an advance in this direction. So too are rusty's futexes (fast user-space mutexes): they give you mutual exclusion and rescheduling *faster* (IIRC) than most thread implementations, even if you're using processes. (Incidentally, rusty and tridge will both be at linux.conf.au.)

If the only way to represent your problem is as a single tightly integrated state machine then that suggests that perhaps it is not parallelizable at all.

lukeg, I think what Alan was getting at is that there is no getting away from the fact that mainstream CPUs *are* state machines. (They have registers, a PC, etc.)

"Since consensus is no fun, let me suggest that both threads and state machines have advantages and disadvantages, ..."

I didn't mean so much to structure programs explicitly as state machines, but rather to suggest that data should be private by default and shared where there is a good reason, rather than the shared-everything model used by threads in C. I think often only a few data structures will need to be shared to get an appropriate degree of parallelism.
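A rough sketch of what "private by default, shared on purpose" can look like, again in Python and assuming a POSIX system. The names sum_in_parallel, private_total and SLOT are made up for the example. Each forked worker gets its own copy of every ordinary variable; the only thing deliberately shared is a small anonymous memory map with one result slot per worker, so no locking is needed.

```python
import mmap
import os
import struct

SLOT = struct.Struct("q")   # one signed 64-bit result slot per worker
private_total = 0           # ordinary global: every process has its own copy


def sum_in_parallel(chunks):
    """Sum each chunk in its own process; return (per_chunk_sums, parent_private_total)."""
    global private_total
    # The one explicitly shared structure: an anonymous shared mapping.
    shared = mmap.mmap(-1, SLOT.size * len(chunks))
    pids = []
    for i, chunk in enumerate(chunks):
        pid = os.fork()
        if pid == 0:
            try:
                private_total = sum(chunk)  # changes only the child's copy
                SLOT.pack_into(shared, i * SLOT.size, private_total)
                os._exit(0)
            finally:
                os._exit(1)  # reached only if the work above raised
        pids.append(pid)
    for pid in pids:
        os.waitpid(pid, 0)
    results = [SLOT.unpack_from(shared, i * SLOT.size)[0]
               for i in range(len(chunks))]
    shared.close()
    return results, private_total


if __name__ == "__main__":
    print(sum_in_parallel([[1, 2, 3], [4, 5], [6]]))
```

The parent's private_total is still 0 afterwards, because nothing the children did to their own globals leaks back. Only the slots in the shared map cross the process boundary.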

I don't know Erlang as well as I would like, but I suspect lazy functional languages are more or less an exception to the idea of threads being bad, because they're not something the programmer deals with directly.

By the way, Squid is a fascinating example of continuation-passing in C, because it wants to do select-based async IO without using threads. It's clever, though I think it demonstrates C is not well suited to the problem.
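For anyone who hasn't seen the style, here is a small sketch of continuation-passing over select-style IO. It is not Squid's code and it is in Python rather than C; read_line, run_loop and demo are hypothetical names. Instead of blocking until a line arrives, read_line registers a callback and returns, and "what to do next" is passed in as the `then` argument.

```python
import selectors
import socket


def read_line(sel, sock, buf, then):
    """Arrange for then(line) to be called once a full line is available."""
    def deliver():
        line, _, rest = bytes(buf).partition(b"\n")
        buf[:] = rest
        then(line)

    if b"\n" in buf:
        # A complete line is already buffered: continue straight away.
        deliver()
        return

    def on_readable():
        data = sock.recv(4096)
        buf.extend(data)
        if b"\n" in buf or not data:
            sel.unregister(sock)
            deliver()

    sel.register(sock, selectors.EVENT_READ, on_readable)


def run_loop(sel):
    """Single-threaded event loop: run callbacks until nothing is registered."""
    while sel.get_map():
        for key, _ in sel.select():
            key.data()


def demo():
    a, b = socket.socketpair()
    a.setblocking(False)
    sel = selectors.DefaultSelector()
    buf = bytearray()
    got = []

    def after_first(line):
        got.append(line)
        read_line(sel, a, buf, after_second)  # chain the next step

    def after_second(line):
        got.append(line)

    read_line(sel, a, buf, after_first)
    b.sendall(b"GET /x\nHost: y\n")
    run_loop(sel)
    a.close()
    b.close()
    sel.close()
    return got


if __name__ == "__main__":
    print(demo())
```

Even with closures to carry the state, the control flow is scattered across callbacks. In C, where that state has to be packed into structs by hand, it gets considerably more awkward.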

Thanks for the pointer to Communicating Sequential Processes. I'll look out for it.

Perhaps you'd like to post a precis of how threads are used by Erlang?
