Older blog entries for fejj (starting at number 83)

For some reason I find this funny:

--> dvhart (~dvhart@cs6625202-7.austin.rr.com) has joined
#evolution
<dvhart> can anyone tell me where evolution is pulling my
calendar info from?  I delete the entire evolution folder,
start it up again and all my appointments are there.  I want
them all to go away!

you can stab it with your steely knives but you just can't kill the beast

16 Jun 2002 (updated 17 Jun 2002 at 04:36 UTC) »

to hell with what you're thinking and to hell with your petty mind, you're so distracted from the real thing you should leave your life behind

Foolishly I've begun hacking on Spruce again.

This time around, though, I'm using the JavaMail API as a reference for designing the new backend architecture for Spruce. However, since Spruce uses the Gtk+ toolkit (please disregard the redundancy), the backend will have to block, because Gtk+ is not a very pthread-friendly toolkit and threads are how JavaMail achieves its 'async' behavior. This is obviously unacceptable to users.

Possible solutions?

(I've been told that GObject is "thread-safe" by sopwith but seem to recall being told the opposite by owen. I'm going to assume that they are both "right" for the purpose of this little problem solving exercise. Here's what I think we can assume: GObject can be thread-safe supposing that you do proper locking, but the signal system is not.)

  1. My first thought was to use pthreads but avoid signals in the backend and instead write a Listener class that emulates signals by communicating with the main thread via a socket. The main thread could register an idle-handler for each Listener object, read whatever is on the socket, and take the appropriate action. This sounds rather hackish so I think I'll probably avoid it.

  2. My second idea was to make the backend one big state machine. This makes design a bit more complex, but I think you can achieve a much more elegant design/API this way. This can be done in multiple ways, but here is my current idea for implementing this:

    All operations that can require file or network I/O should return a SpruceAsyncHandle object:

    abstract class SpruceAsyncHandle {
    private:
    	GObject object;
    	void *state;
    	
    	void begin (handle, object, state);
    
    public:
    	int step (handle, object, state);
    	int cancel (handle, object, state);
    
    signals:
    	void finished (handle, object, user_data);
    	void cancelled (handle, object, user_data);
    };
    
    /* client API */
    int spruce_async_handle_step (handle);
    int spruce_async_handle_cancel (handle);

    All methods returning a SpruceAsyncHandle object will immediately call the begin method to initialise the state and then call its parent implementation. The abstract class implementation will simply register the ::step() method in an idle-handler. When the async operation is either completed or cancelled, the abstract SpruceAsyncHandle class will also take responsibility for de-registering itself from any idle handlers.

    Unfortunately this means I'll have to write a ton more code, simply because I'll now have to write a SpruceAsyncHandle subclass for each and every method that can possibly block. Ugh.

I've got a picture in my head. In my head. It's you and me, we are in bed. We are in bed.

GMime: Finally made the 1.0.0 release tonight. I should probably make a pre-release of GMime 2.0 shortly.

Mozilla party tonight, but I didn't go... something about a Goth club didn't exactly excite me. I bet I find out the party was great and I missed out, yada yada. Oh well...

I've got a picture in my room. In my room. I will return there I presume. Should be soon.

12 Jun 2002 (updated 12 Jun 2002 at 05:33 UTC) »

bdodson: ouch, that sucks. I would have to say that a lot of Freshmeat projects are Linux-specific, so only allowing "Unix" applications sounds like BS to me.

GMime: Wow, someone actually found a bug in GMime 0.9.0 today. If you feed a non-seekable stream to the parser, the resultant GMimeMessage or GMimePart object would contain an invalid stream object as its content, so writing it out would cause a segfault.

The fix was to make the parser check whether the stream was seekable when parsing MIME part contents into the GMimePart objects, and cache the content in a memory buffer if the source stream is not seekable. This had actually been on my TODO list, but I had forgotten that I had any non-seekable streams. The TODO item had been to move the seek/tell methods into a new abstract stream class, or else find some other way of defining whether or not a stream was seekable. Currently the best you can do is see if g_mime_stream_tell() returns -1, but as you may have guessed, this could just mean there was an error. Of course, as far as the parser cares, if tell() fails but read()s work, then the stream should be considered non-seekable and parsing should just go on anyway - so this mostly works out. Maybe I don't need a GMimeSeekableStream abstraction after all...

GMime: Working like a madman on GMime-2.0 recently. So far, between yesterday and today, I've implemented a few new classes such as GMimeMultipartSigned and GMimeMultipartEncrypted which handle the multipart/signed and multipart/encrypted MIME types defined by rfc1847 as well as an abstract class GMimeCipherContext that has generic methods for encrypting, decrypting, signing, verifying, importing keys, and exporting keys. This class was mostly just ported from my GMIME_PGP_MIME branch (which was based on glib1) over to GObject. The only real difference is the addition of the import/export methods which will be useful for anyone (probably gonna be me) implementing the application/pgp-keys MIME type (which is only briefly mentioned in rfc2015 and rfc3156) for example. I'm sure S/MIME has a similar method, although I'm not as well versed with the S/MIME specifications so I'm not 100% sure how they go about this.

While I'm on the subject of multipart/signed, let me just say that I think the authors of this spec really screwed the pooch. Multipart/signed is completely Broken-By-Design (tm) - it must be treated completely differently from any other multipart type. To work, its contents MUST be treated as opaque - but nothing in any of the mail specifications guarantees that content will go from point A to point B without modifying, in any way, shape, or form, the headers or contents of a MIME part. This makes multipart/signed completely unreliable. You can't just go changing the rules, for Christ's sake! When you extend a protocol that is already in use, you MUST be compatible with transfer agents that have already been implemented. This was NOT done in the multipart/signed specification. At all!

8 Jun 2002 (updated 8 Jun 2002 at 06:06 UTC) »

Valgrind

tried playing with valgrind to debug evolution-mail at home here, on my celeron 400 w/ 256 mb ram, and it was just a mite bit slower than molasses flowing uphill in january while being debugged under gdb on a low-end solaris machine remotely over ssh with X forwarding (well, it would be if molasses were a unix X application).

just so you don't get your panties in a twist, I don't put the full blame for this on valgrind (which is why I mention I'm running a celeron - not to mention evolution-mail is a heavyweight champion on steroids).

valgrind has certainly come a long way since I last looked at it a few months ago - in fact, last time I looked at it, it wasn't able to handle any multithreaded application at all, and now it can even handle evolution-mail (albeit a bit slowly on my hardware).

overall, I'm still impressed to say the least. sure beats the pants off my libleaks LD_PRELOAD hack.

POSIX Threads

Let me just say... they screwed the pooch on this one.

I'm extremely disappointed that pthread_once() doesn't block until the callback has completed. Rather, the first thread to reach pthread_once() gets to call the callback while the other threads return immediately and go on their merry way. The whole point of this function, as far as I can tell, is to provide a means for applications to initialise some static data for later use by the rest of the library/function/whatever. Unfortunately, the fact that all other threads return immediately makes this totally useless for that purpose, because initialising data in the pthread_once() callback function is not an atomic operation. So here, let me give you the way this should have been implemented, in my not-so-humble-but-correct opinion:

typedef struct {
	pthread_mutex_t mutex;
	unsigned int complete;
} pthread_once_t;

#define PTHREAD_ONCE_INIT { PTHREAD_MUTEX_INITIALIZER, 0 }

int pthread_once (pthread_once_t *once, void (*init_func) (void))
{
	if (!once->complete) {
		pthread_mutex_lock (&once->mutex);
		if (!once->complete) {
			init_func ();
			once->complete = 1;
		}
		pthread_mutex_unlock (&once->mutex);
	}
	
	return 0;
}


While I'm on the subject, I think I'm gonna write a library called ppthreads (Portable Portable Operating System Interface Threads) or maybe tppthreads (Truly Portable Portable Operating System Interface Threads), since pthreads are notoriously not portable at all. Well, at least not beyond the very basics. I've spent a lot of time stumbling over inconsistencies between Linux and Solaris pthreads (and apparently Mac OS X pthreads aren't so complete either) in the Mono project. Every time I get time to hack on the SPARC port I find myself porting the code over to Solaris.

Isn't it sad when portable doesn't mean portable?

You know, I have one simple request...and that is to have messages with freakin' laser beams attached to their headers. Now evidently my MIME specification informs me that that can't be done. Uh, can you remind me what I pay you people for? Honestly, throw me a bone here. What do we have?

18 Apr 2002 (updated 18 Apr 2002 at 04:50 UTC) »

Update: Hmmm, now that I am home and not running the tests over ssh, I'm getting much better performance readings. For example, the memory parser seems to be averaging about 0.945s, and my on-disk parser was consistently yielding 0.713s before an optimization I just did that got it down to 0.612s. For those who care, it was a very simple change:

Old code:

inptr = priv->inptr;
inend = priv->inend;

while (inptr < inend) {
	start = inptr;
	while (inptr < inend && *inptr != '\n')
		inptr++;
	
	...


Note that priv->inptr points to the start of data inside an internal 4k read() buffer and priv->inend points to the end of the buffered data within that internal buffer. By adding an extra byte onto the end that we vow never to use except for the following optimization, we can be sure that we will not overflow the internal buffer with the following changes:

New code:

inptr = priv->inptr;
inend = priv->inend;
*inend = '\n';

while (inptr < inend) {
	start = inptr;
	while (*inptr != '\n')
		inptr++;
	
	...


By setting *inend to '\n', we can eliminate the inptr < inend check in the inner while conditional. This decreases the number of instructions in the inner loop from ~7 down to ~4 - nearly cutting the time spent there in half!

GMime: Just created a 38657516 byte message to get a feel for how my parser would deal with a message that large.

Here are the results, not too bad if you ask me.

ZenTimer: gmime::parser_construct_message took 1.069 seconds
ZenTimer: gmime::message_to_string took 1.731 seconds

For comparison, here are the results of my memory parser from gmime1:

ZenTimer: gmime::parser_construct_message took 1.075 seconds
ZenTimer: gmime::message_to_string took 0.716 seconds

That got me curious as to how long just reading the stream into ram would take, so I wrote a quick test:
ZenTimer: gmime::stream_write_to_stream took 0.542 seconds

Notes: While it looks like my new parser is a hair faster than the memory parser, remember that you need to consider disk buffering that may have been done by the OS. Subsequent calls to the memory parser seemed to yield an average time of 1.050s, while subsequent calls to my newer on-disk parser averaged closer to 1.078s or so. Just so I don't leave you hanging on the average time for writing the message to ram: it consistently took about 0.540s (never varying more than about 0.002s in either direction).

The structure of the MIME message, in case anyone is interested, was:

Content-Type: multipart/mixed
   Content-Type: multipart/related
      Content-Type: multipart/alternative
         Content-Type: text/plain
         Content-Type: text/html
      Content-Type: image/gif
      Content-Type: image/png
      Content-Type: image/png
   Content-Type: application/x-gzip
   Content-Type: image/png
   Content-Type: image/png
   Content-Type: text/x-c
   Content-Type: application/pdf
   Content-Type: application/pdf
   Content-Type: application/pdf
   Content-Type: application/pdf
   Content-Type: application/pdf
   Content-Type: application/pdf
   Content-Type: application/pdf
   Content-Type: application/pdf
   Content-Type: text/x-c
   Content-Type: image/png
   Content-Type: image/png
   Content-Type: application/pdf
   Content-Type: image/jpeg
   Content-Type: image/png
   Content-Type: image/jpeg
   Content-Type: image/x-portable-anymap
15 Apr 2002 (updated 18 Apr 2002 at 00:30 UTC) »

GMime: Never underestimate the power of Mirwais - Disco Science.ogg. Started implementing that parser I talked about in my last diary entry at around 11pm last night; around 5am this morning, the gnome community sees this:

<fejj_sleep> whooo!! my new gmime parser is fucking
fast!!!
<andersca> cool
<fejj_sleep> unf
<andersca> you are very cool fejj
<fejj_sleep> haha, thanks
<fejj_sleep> it's almost as fast as a previous parser
of mine which loaded the entire message into ram before
processing
<fejj_sleep> we're talking a hare difference in speed
<fejj_sleep> and my new parser parses off disk
<andersca> cool
* andersca certifies fejj as master
<fejj_sleep> :)
<jamesh> if this is what fejj can do while sleeping,
think of what he is like when awake

How fast is fast, you ask? Let me show you a comparison of my latest 3 parsers taking as input a ~1.1MB MIME message (a multipart containing a multipart of X screen dumps):

gmime1:
memory parser: 0.051s
on-disk parser: 0.173s (buffered)
on-disk parser: 5.060s (unbuffered)

gmime2:
on-disk parser: 0.057s (varies between 0.056s and 0.057s)

That, my friend, is what we in hacker land call fast :-)

Update: Just got pointed to this article by a friend: http://www.nodewarrior.org/minorfish/mf-users-archive/archive-20021/0006.html

Gotta love hearing that stuff about your software, makes ya all warm n fuzzy ;-)
