I've been playing with cairo and xpdf for the last few days, trying to make a cairo-based backend for xpdf. That should allow nice antialiased graphics and eventually full PDF 1.4 transparency support.
However, when I tried to render a page from a PDF file I had lying around, it rendered very slowly. I know the cairo authors have said that they're focusing on design and correctness for now, and that there are lots of optimizations that are not yet done. Still, I was getting rendering speeds of a few characters per second, and the full page took 4 minutes and 30 seconds to render. Clearly something was wrong.
So, I fired up the profiler to get a better understanding of what was going wrong. It appears that I was using a non-rectangular clip region, which in cairo is implemented by an alpha buffer (called the clip surface) that is used as a mask when drawing to the target. When drawing a glyph, cairo was allocating a temporary image to draw the glyph into. Unfortunately it was allocating a temporary buffer the same size as the clip surface, even though the glyph drawn into it was much smaller. In my case, the clip surface was 1240x1754 pixels!
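To get a feel for how much memory that wastes per glyph, here is a minimal sketch of the "smallest buffer needed" idea: intersect the glyph's bounding box with the clip surface extents and allocate only that. This is not cairo's actual code; `Rect`, `intersect`, `temp_buffer_extents` and the 12x16 glyph box are hypothetical names and values chosen for illustration. Only the 1240x1754 clip size comes from the post.

```python
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    x: int
    y: int
    width: int
    height: int


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Return the overlap of two pixel rectangles, or None if they are disjoint."""
    x0 = max(a.x, b.x)
    y0 = max(a.y, b.y)
    x1 = min(a.x + a.width, b.x + b.width)
    y1 = min(a.y + a.height, b.y + b.height)
    if x1 <= x0 or y1 <= y0:
        return None
    return Rect(x0, y0, x1 - x0, y1 - y0)


def temp_buffer_extents(clip: Rect, glyph: Rect) -> Optional[Rect]:
    """Smallest temporary buffer: only the part of the glyph inside the clip."""
    return intersect(clip, glyph)


clip_surface = Rect(0, 0, 1240, 1754)  # clip surface size from the post
glyph_box = Rect(300, 500, 12, 16)     # hypothetical glyph bounding box

# Pixels touched when the temporary buffer matches the whole clip surface...
naive_pixels = clip_surface.width * clip_surface.height
# ...versus when it only covers the glyph.
needed = temp_buffer_extents(clip_surface, glyph_box)
needed_pixels = needed.width * needed.height
```

With these assumed numbers the full-size buffer covers over two million pixels per glyph, while the tight one covers a couple of hundred, which is roughly the kind of gap that would explain a few characters per second.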
I fixed this in several places so that it only allocated the smallest temporary buffer needed. Rendering was a lot faster now, but profiling showed another large slowdown. When clearing the clip surface and temporary buffers it was somehow using the super-generic pixel compositing functions! I looked into this and discovered that the libpixman solid rectangle fill functions were disabled since they were not implemented yet.
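The difference between the two paths can be sketched roughly like this, assuming an 8-bit alpha buffer stored row by row. This is only an illustration of the technique, not libpixman's implementation; `fill_rect_generic`, `fill_rect_solid` and `composite_source` are made-up names.

```python
def fill_rect_generic(buf, stride, x, y, w, h, value):
    """Slow path: call a general compositing operator once per pixel."""
    def composite_source(dst, src):
        # Stand-in for a generic operator; for a plain fill it just
        # replaces the destination with the source.
        return src

    for row in range(y, y + h):
        for col in range(x, x + w):
            i = row * stride + col
            buf[i] = composite_source(buf[i], value)


def fill_rect_solid(buf, stride, x, y, w, h, value):
    """Fast path: write each row of the rectangle in one bulk assignment."""
    row_bytes = bytes([value]) * w
    for row in range(y, y + h):
        start = row * stride + x
        buf[start:start + w] = row_bytes


# Both paths must produce the same pixels; only the cost differs.
stride, height = 64, 32
a = bytearray(stride * height)
b = bytearray(stride * height)
fill_rect_generic(a, stride, 8, 4, 20, 10, 0xFF)
fill_rect_solid(b, stride, 8, 4, 20, 10, 0xFF)
```

The fast path does one bulk write per row instead of a function call per pixel, which matters a lot when the thing being cleared is a page-sized surface.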
After implementing solid rectangle fills I was down to 2.42 seconds for rendering the PDF page. That's a speedup of over 10000%, in just a day's work! I've sent the patches to the cairo list, and I hope they'll get applied soon.
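For the curious, the arithmetic behind that figure works out as follows, using only the two timings quoted above (the variable names are just for illustration):

```python
before_seconds = 4 * 60 + 30   # 4 minutes 30 seconds
after_seconds = 2.42

# How many times faster the page renders now.
speedup_factor = before_seconds / after_seconds
# The same ratio expressed as a percentage of the original speed.
speedup_percent = speedup_factor * 100
```

That is a factor of a bit more than 111, so "over 10000%" holds.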
Sometimes optimization is extremely hard. I've personally spent years trying to make Nautilus faster. It's gotten better, but it's very hard work. But for software in its early stages of life, optimization is a fun and very rewarding exercise.
In the end I also looked into why it was using the clip surface so much, and it turns out that the page's crop box, when transformed to device coordinates, didn't end up on integer coordinates. This meant cairo had to use a clip surface for the whole page, just to clip the last half a pixel of the rightmost column. By tweaking the crop box to have integer coordinates I got another 400% speedup. And it even looks better than with the original clipping.
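A small sketch of the crop box tweak, under some loud assumptions: the post does not say what page size or resolution was used, so the A4 page (595x842 points) at 150 dpi below is only an example, and rounding each edge to the nearest pixel is one possible way to "tweak" the box, not necessarily the one used in the patch. The helper names are hypothetical.

```python
def to_device(rect, scale):
    """Scale an (x0, y0, x1, y1) rectangle from points to device pixels."""
    x0, y0, x1, y1 = rect
    return (x0 * scale, y0 * scale, x1 * scale, y1 * scale)


def is_pixel_aligned(rect, eps=1e-9):
    """True if every edge of the rectangle lies on an integer coordinate."""
    return all(abs(v - round(v)) < eps for v in rect)


def snap_to_pixels(rect):
    """Move each edge to the nearest integer device coordinate."""
    x0, y0, x1, y1 = rect
    return (round(x0), round(y0), round(x1), round(y1))


crop_box_points = (0, 0, 595, 842)   # assumed: A4 page in PDF points
scale = 150 / 72                     # assumed: 150 dpi, 72 points per inch

device_box = to_device(crop_box_points, scale)
snapped_box = snap_to_pixels(device_box)
```

With these assumed values the right edge lands at about 1239.58 pixels, so the unsnapped box is not pixel aligned and the clip cannot be treated as a plain rectangle of whole pixels. After snapping, the box is 1240x1754, and a clip like that can be handled without a page-sized mask.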