Older blog entries for pphaneuf (starting at number 314)

10 Mar 2007 (updated 18 May 2008 at 07:14 UTC) »

Message passing? Yes!

lkcl posted a reply to my previous post...

Oh, lkcl, you're a bit of a pessimist, there. unlink is also atomic, not just rename. ;-)

But what I'm looking for right now isn't just a general message-passing operation (I already do that at a higher level, between various subsystems, using shared memory for local recipients or sockets for remote ones); more specifically, it's the best way to take advantage of multiple cores as much as possible to process file descriptor events.

You have to cut the libc people some slack, though, they don't have locks around every single function. I'd be amazingly appalled if strncmp() took a lock, for example! And as I described, I already use processes and shared memory for higher-level concurrency; this threading is only for the very lowest level, when I have no choice and I'm pushed to the edge. Note that the code is already written as state machines, using the threads to run more than one state machine at once (there is one state machine per connection, more or less). I'm planning on having a lock on each state machine instance, so that a single state machine cannot be executed on more than one thread at once, and the code inside of it can make that assumption (that still allows me to handle multiple independent connections at once).
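To make the per-instance lock idea concrete, here is a minimal sketch in Python. The `Connection` class, its two states and the `run_concurrently` helper are all made up for illustration; the only point is that each machine carries its own lock, so one machine never runs on two threads at once while different machines still run in parallel.

```python
import threading


class Connection:
    """Hypothetical per-connection state machine guarded by its own lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.state = "reading"
        self.steps = 0

    def step(self):
        # Only one thread at a time may advance this particular machine;
        # other machines have their own locks and are unaffected.
        with self.lock:
            self.steps += 1
            self.state = "writing" if self.state == "reading" else "reading"


def run_concurrently(machines, threads=4, rounds=1000):
    """Step every machine `rounds` times from each of `threads` threads."""
    def worker():
        for _ in range(rounds):
            for m in machines:
                m.step()

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
```

The code inside `step()` can then be written as if it were single-threaded, which is the assumption I want the state machine code to be able to make.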

On Linux, epoll already does that for me, atomically, in its edge-triggered mode. You can just have multiple threads call epoll_wait(): a given event will be sent to only one thread, until it is re-armed. But the specific requirement here is "compatibility layer for portability to other POSIX platforms", hence the use of crummy old select(). Of course, if I need high-load scalability on some big iron, I'll be sure not to use that layer, and go for epoll, kqueue() or something like that!

That said, I'd like a link to that message-passing work, it sounds interesting.

Syndicated 2007-03-10 18:49:00 (Updated 2008-05-16 18:47:53) from Pierre Phaneuf

Insight of the day

I must be very silly, but I just realized that a Unix pipe is a semaphore. It's better in some aspects (it is select()able) and worse in others (its equivalent of SEM_VALUE_MAX is lower, being bounded by the size of the pipe's buffer). Cool.
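A minimal sketch of the idea, in Python for brevity. The `PipeSemaphore` name and its methods are invented here; each byte sitting in the pipe stands for one unit of the semaphore's count.

```python
import os
import select


class PipeSemaphore:
    """Counting semaphore built on a pipe: one byte in the pipe = one unit."""

    def __init__(self, initial=0):
        self.r, self.w = os.pipe()
        if initial:
            os.write(self.w, b"\0" * initial)

    def post(self):
        os.write(self.w, b"\0")      # like sem_post(): add one unit

    def wait(self):
        os.read(self.r, 1)           # like sem_wait(): blocks while the pipe is empty

    def fileno(self):
        return self.r                # lets select() watch the semaphore directly

    def close(self):
        os.close(self.r)
        os.close(self.w)
```

Because it has a `fileno()`, the object can be passed straight to `select.select()` along with sockets and other descriptors. The catch is the one mentioned above: once the pipe's buffer is full, `post()` blocks instead of counting any higher.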

In that context, the "signal handler that writes to a pipe" trick makes a new kind of sense (the semaphore equivalent, sem_post(), is the only synchronization function that is safe to use in a signal handler).
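Here is a rough sketch of that trick, again in Python. The names `on_signal` and `wait_for_signal` are made up, and SIGUSR1 is just a convenient signal for the example. One caveat: Python runs its signal handlers from the main interpreter loop rather than in true asynchronous signal context, so this only illustrates the shape of the trick as it would be written in C.

```python
import os
import select
import signal

sig_r, sig_w = os.pipe()
os.set_blocking(sig_w, False)   # never block inside the handler if the pipe is full


def on_signal(signum, frame):
    # Do the bare minimum here: "post" one byte and return.
    try:
        os.write(sig_w, bytes([signum]))
    except BlockingIOError:
        pass                    # pipe full: a wake-up is already pending


def wait_for_signal(timeout):
    """Sleep in select() until a signal was posted; return its number or None."""
    ready, _, _ = select.select([sig_r], [], [], timeout)
    if not ready:
        return None
    return os.read(sig_r, 1)[0]


signal.signal(signal.SIGUSR1, on_signal)
```

The main loop never deals with the signal directly: it just sees one more readable file descriptor, which is the "waiting on the semaphore" half of the analogy.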

Syndicated 2007-03-08 12:34:38 (Updated 2007-03-08 12:38:16) from Pierre Phaneuf

7 Mar 2007 (updated 8 Mar 2007 at 09:05 UTC) »

select() and a thread pool

Here's the challenge: a select()-based event dispatcher that can scale from single-threaded to multi-threaded efficiently, preferably as simple as possible and able to be effectively edge-triggered (even if select() itself is level-triggered).

Note that the point here is to run the event handlers concurrently. I've used threads in the past to work around deficiencies in select(), putting "quiet" file descriptors on one thread that would sleep most of the time, and "busy" file descriptors on another (since select() scales based on the number of file descriptors it watches, this made the active set more efficient), but it would still only use one thread for event handlers. It was just about sleeping more efficiently.

My first idea was that all threads would be symmetrical and would thus all sleep in select(). But that doesn't work, obviously, because as soon as one file descriptor is ready, all the threads wake up, which is rather wasteful. Splitting the set of file descriptors between the threads isn't so good, because a scheduler would then be needed to balance the load between threads, and I prefer to leave scheduling to the kernel, as much as possible. On the other hand, this could allow better cache locality, handling the same file descriptor on the same thread (possibly bound to the processor) as often as possible, but on the third hand, let's not get ahead of ourselves.

It seems that the only way is to keep all the file descriptors together, with just one thread calling select(). When it returns, its result should be examined and all the events put on a list. Then the thread goes into a loop of taking an event from the head of the list, posting the semaphore of the next thread (which could be itself in the single-threaded case), calling the event's handler, then waiting on the semaphore. If there is no event left, it goes back to select() instead.

There are a few good bits and a few bad bits about this design. A good bit is that the semaphores that keep the other threads sleeping also protect the list at the same time (that's why the semaphore is posted after taking the next event). A bad bit is that at the end, we would be going into the select() before all the handlers are done, and they might want to add some file descriptors. This could be fixed by having a counter of threads busy running handlers, and the last thread to run out of events would be the one going back into select(), but this would also make the load more "bursty" in a busy server, where there's really work all the time, but at every round of select(), all the threads but one would go to sleep, only to be reawakened.

I think, in the end, that the usual Unix solution comes to the rescue: add a file descriptor! Put the reading end of a pipe in the select() invocation, and have any handler that adds a file descriptor write to the writing end, waking up the select() as needed. A bit of a shame, since it would be unnecessary in the single-threaded case, but oh well, that's how it is sometimes...
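Putting the pieces together, here is a rough Python sketch of the design, under a few assumptions of mine: a single shared semaphore stands in for the per-thread semaphores (whoever holds it owns the event list and is the only one allowed into select()), file descriptors are disarmed when they fire so that delivery is effectively edge-triggered, and only readability is watched. The `Dispatcher` class and all of its method names are invented for this example; closed descriptors and other errors are not handled.

```python
import os
import select
import threading


class Dispatcher:
    """Hypothetical select()-based dispatcher with one-shot (edge-like) delivery."""

    def __init__(self):
        self.token = threading.Semaphore(1)  # holder owns `pending` and may call select()
        self.lock = threading.Lock()         # protects `armed`, which handlers may touch
        self.armed = {}                      # fd -> handler, currently watched
        self.pending = []                    # (fd, handler) events waiting to be run
        self.wake_r, self.wake_w = os.pipe()
        os.set_blocking(self.wake_w, False)
        self.stopped = False

    def _kick(self):
        try:
            os.write(self.wake_w, b"\0")     # wake a thread that may be asleep in select()
        except BlockingIOError:
            pass                             # pipe already full of wake-ups

    def arm(self, fd, handler):
        """Watch fd for one readiness event; handlers call this again to re-arm."""
        with self.lock:
            self.armed[fd] = handler
        self._kick()

    def stop(self):
        self.stopped = True
        self._kick()

    def _poll(self):
        """Called with the token held: block in select() and fill `pending`."""
        with self.lock:
            fds = list(self.armed)
        ready, _, _ = select.select(fds + [self.wake_r], [], [])
        for fd in ready:
            if fd == self.wake_r:
                os.read(self.wake_r, 4096)   # drain; the watched set is re-read next round
                continue
            with self.lock:
                handler = self.armed.pop(fd, None)   # disarm: at most one thread per fd
            if handler is not None:
                self.pending.append((fd, handler))

    def run(self):
        """Body of every worker thread; a single thread works too."""
        while True:
            self.token.acquire()
            while not self.pending and not self.stopped:
                self._poll()
            if self.stopped:
                self.token.release()         # let the next thread see the stop flag
                return
            fd, handler = self.pending.pop(0)
            self.token.release()             # hand over before running the handler
            handler(fd)
```

To use it, call `arm(fd, handler)` for each descriptor and start as many threads as desired with `run` as their target. A handler that wants more events from its descriptor calls `arm()` again when it is done, and the write to the wake-up pipe is what gets a thread already sleeping in select() to notice the new descriptor.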

Does anyone have suggestions for improvements? They are most welcome!

Syndicated 2007-03-07 18:34:43 (Updated 2007-03-08 08:44:17) from Pierre Phaneuf

5 Mar 2007 (updated 18 May 2008 at 07:14 UTC) »

Edge- Versus Level-Triggered Events

A good while ago, I declared my preference for edge-triggered events (for I/O), because in my mind, it most closely paralleled the actual workings of the hardware (IRQs being triggered when new data arrives on network interfaces or when a read/write request to disk is completed). I also figured that since it was possible to emulate level-triggered efficiently on top of edge- (that's what happens when you use close-to-the-metal "abstractions"), it probably was the more flexible choice.

I then changed my mind, when I realized that what is very often needed is level-triggered: someone has to remember which file descriptors still have work to do. I figured that it might as well be the kernel, since it is in the best position to do that safely, simply and efficiently. Otherwise, you end up either having to remember it yourself (which isn't very safe if your framework is only providing the event delivery mechanism, as a buggy user could easily "get lost"), or having to re-arm the file descriptor (more system calls, less efficient). On the other hand, there were also some other relatively common usages where edge-triggered was preferable (specifically, when transferring data from one file descriptor to another, where you do not want to re-arm the source until you have managed to write the data to the sink first).

But recently, I changed my mind again, and about a certain number of things. Many know me to be one who dislikes threads, but that's not exactly true: what I dislike is how they're haphazardly used as magic pixie dust by hordes of people who are apparently utterly confused by event-driven state machines. Now that we're seeing more and more multi-core systems, I feel something has to be done about it. And it turns out that level-triggered events are a bit of a pain to handle with multiple threads: a thread going into epoll_wait could get a notification that a file descriptor is readable while another thread is in the process of dealing with it. Adding a monitor to prevent re-entrance would just make it busy-wait instead, as it wouldn't sleep. Edge-triggered events deal with this neatly, and I think this combines with the existing cases where it was preferable to make them actually the better choice.
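The difference is easy to see with a small Linux-only experiment, sketched here with Python's `select.epoll`. The `count_wakeups` helper is made up for the occasion: it writes one byte to a pipe, then polls several times without ever reading it, roughly the way a second thread would keep polling while the first is still busy with the descriptor.

```python
import os
import select


def count_wakeups(flags, polls=3):
    """Write once to a pipe, then poll `polls` times without reading.

    Returns how many of those polls reported the pipe as readable.
    """
    r, w = os.pipe()
    ep = select.epoll()
    try:
        ep.register(r, flags)
        os.write(w, b"x")
        return sum(1 for _ in range(polls) if ep.poll(0))
    finally:
        ep.close()
        os.close(r)
        os.close(w)


level = count_wakeups(select.EPOLLIN)                  # level-triggered
edge = count_wakeups(select.EPOLLIN | select.EPOLLET)  # edge-triggered
```

In level-triggered mode, every poll reports the descriptor again for as long as the byte sits unread, so `level` ends up equal to the number of polls. In edge-triggered mode, only the first poll after the write reports it, and `edge` is 1.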

Now, I want to have a select()-based reference implementation, for portability, and it turns out it's kind of tricky to have multiple threads service a common set of file descriptors... I have some ideas, but that'll be for next time.

Syndicated 2007-03-05 19:21:21 (Updated 2008-05-16 18:44:27) from Pierre Phaneuf

4 Feb 2007 (updated 6 Feb 2007 at 11:11 UTC) »

Cheap booze and C++

We drank cheap sparkling wine the other day. When I say cheap, I mean 0.87 euro for a bottle. That's 1.34 CAD at the current rate.

I am also hacking on a modern C++ implementation of property lists. I am also getting rather hooked on some aspects of Boost. The binder is astonishingly clever and asio looks very promising. I'm also told their boost::function does not use a virtual method (unlike my attempt, WvCallback), which I'll have to look into.

No, there is no relation between the drinking and the hacking. :-P

Syndicated 2007-02-04 12:57:54 (Updated 2007-02-06 10:40:12) from Pierre Phaneuf

Musical technology

Ok, so in my last post, I was saying that software patents aren't too evil, that DRM still is quite evil (here's a related link to Tim Bray, who's talking about the famous Linn company putting out DRM-free, higher-than-CD-quality music, props to them!), and so that I am switching to MP3.

I had been doing some research, so I figured I'd share some of my resources, while I'm there.

I found a ton of great information on the HydrogenAudio knowledge base, about which encoders are best, the pros and cons of various lossless formats (I also ripped the harder-to-find CDs to FLAC, for archival), and such things. For example, whether there were any issues with FLAC (apparently, the biggest thing is that you can't put RIFF chunks in them, but I don't think that should be an issue), or whether Ogg FLAC was it now (not really, only for streaming and other special cases, and you can convert from one to another very quickly). Seems like LAME is pretty much the state of the art for MP3 encoding, and that even the latest version of ID3 tags still kind of sucks compared to Ogg Vorbis comments (thankfully, my ripping software, Max, keeps a full superset of all the meta-data in its own files).

Speaking of which, I have to recommend Max, it's a great piece of free software, very flexible and with a number of useful abilities, such as encoding to multiple formats in parallel from a single rip (I did some comparison testing between FLAC, MP3 and Ogg Vorbis on some difficult tracks, to test my encoder settings).

I also encountered one of the more practical annoyances with MP3 already, where (with XMMS, at least) seeking isn't accurate. If I hear a specific bit at the 2 minute mark, restart the song and seek back to the "same place", I often find that the time display is off: when I reach the same bit, the player says I'm 10 seconds or so past where I was the previous time. Oh well, I don't seek too often, thankfully, but it certainly made that side-by-side testing I was doing rather annoying.

Syndicated 2007-02-04 11:14:52 from Pierre Phaneuf

Obsessive-compulsive disorder and patents

Okay, so I'm not writing a music player (yet). One could say my "faith" has weakened, where I am taking the opportunity of a re-ripping of my music library to switch from Ogg Vorbis to MP3. Yeah, yeah, I know, I'm a terrible human being, my soul will burn and everything.

A quick reminder on why MP3 is evil and how Ogg Vorbis will save your eternal soul: Thomson holds some patents on key technologies involved in the MP3 format, those technologies being in the form of mathematical concepts. Some people find the patenting of such intangible things to be stifling innovation, that many of these concepts are inherent in nature, and that as such, anyone should be free to use them (some comparing this to patenting something like the Pythagorean theorem, as an algorithm to find the length of the hypotenuse of a right triangle).

After some soul searching, I feel that while this is true to some degree, it is not the most evil aspect of software patents. My issues with software patents are two-fold.

First, the process of obtaining those patents seems rather sketchy at best, quite regularly granting completely frivolous patents. Upon application, those often get overturned, but still, this costs money, and if I were to be sued over one, I would most likely be in deep financial trouble, no matter how frivolous the patent. In the case of MP3, I do not feel this is one of those, the thing being filled with psychoacoustics, modified cosine transforms, polyphase quadrature filters, alias reduction formulas, and other such things guaranteed to give me a headache. These guys are no fly-by-night lawyers trying to make a quick buck, from what I can see.

Second, the duration of patents, for a low cost/revenue ratio industry (like software, as opposed to cars, which are expensive to manufacture) anyway, is quite excessive, in these days of rapid technological advances. Maybe, yes, Pythagoras should have been granted a patent for his theorem, but the question is how soon should it have expired? Again, in the case of MP3, the oldest reference to those technologies I could find (didn't check very thoroughly!) was around 1986, which isn't shockingly old, but in terms of technology, is starting to get a little dated. I'd say that a 15-20 year expiration on that kind of patent wouldn't be too ridiculous either way, and I'm sure smart inventors would manage to make quite a bit of money in even less time.

So, in short, I'm not technically against software patents, but more against the way they are implemented right now. I suppose I also dislike the way some patent holders keep quiet about their portfolio, until everyone is using the technology, at which point they helpfully point out that every bloody living organism owes them money. Those make me angry.

In any case, for most users, patents are a bit immaterial, it's mostly for developers (especially of free software). It inflates the cost of their iPod by a few dollars, but they can't really tell the difference between that and the rest of the cost.

What's material to people right now, though, is DRM, the so-called "digital rights management". Ensuring your rights are properly limited and constrained, that the rights of the poor media corporations aren't being trampled on by nasty people that want to listen to or watch the content they lawfully paid for (pesky, those people!).

It's too bad this isn't being done with the arguably superior Ogg Vorbis, but MP3 still offers more freedom than the PlaysForSure and FairPlays of the world. Pirates aren't being stopped, honest people get screwed and I forgot my point.

Syndicated 2007-01-29 22:10:16 (Updated 2007-01-29 22:15:00) from Pierre Phaneuf

Quote of the (last work-)day

From one of my co-workers, on internal IRC, after I once more exhibited my sketchy grasp of the French grammar:

curl --mirror http://www.leconjugueur.com/ | ssh pphaneuf 'cat >/dev/brain'

Syndicated 2007-01-29 09:08:30 from Pierre Phaneuf

