Spent the day working on Render acceleration for Radeons in the Xati driver. Quite a bit of that time went into figuring out exactly what Render's Composite operations do (not to be confused with the Composite extension). I think I've got it figured out, and one block of code should be able to cover the most important cases: the operations used by xcompmgr, the ones used for subpixel antialiasing of fonts, and many others. The question is which hooks to expose to the driver. Do we make a collection of hooks for specific operations to accelerate, or do we just hand the PicturePtrs from Composite's arguments to the driver whenever we manage to push the pixmaps into offscreen memory, and fall back to software if the driver doesn't handle the request? At least for radeon, the second option means much less code and many more accelerated operations, at the expense of higher overhead on fallbacks. But I guess fallbacks are slow enough anyway that the extra overhead won't matter much.
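To make the second option concrete, here is a rough sketch of what a single generic hook with a software fallback might look like. None of this is the actual Xati code: `Picture`, `CompositeOp`, `Driver`, `try_composite` and `software_composite` are hypothetical stand-ins (the real entry point would receive PicturePtrs), and the counters exist only to show which path was taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the server's types; not the real Render API. */
typedef enum { OP_SRC, OP_OVER, OP_ADD } CompositeOp;

typedef struct {
    int width, height;
    bool offscreen;              /* true once the pixmap lives in card memory */
} Picture;

/* One generic hook: the driver returns false for anything it cannot do. */
typedef bool (*CompositeHook)(CompositeOp op, const Picture *src,
                              const Picture *mask, Picture *dst);

typedef struct {
    CompositeHook try_composite; /* may be NULL if nothing is accelerated */
    int accelerated;             /* bookkeeping for this sketch only */
    int fallbacks;
} Driver;

/* Placeholder for the existing software path. */
static void software_composite(CompositeOp op, const Picture *src,
                               const Picture *mask, Picture *dst)
{
    (void)op; (void)src; (void)mask; (void)dst;
}

/* Try the driver first; fall back if the pixmaps are not all in offscreen
 * memory or the driver declines the operation. */
void composite(Driver *drv, CompositeOp op, const Picture *src,
               const Picture *mask, Picture *dst)
{
    assert(drv != NULL && src != NULL && dst != NULL);

    bool in_card = src->offscreen && dst->offscreen &&
                   (mask == NULL || mask->offscreen);

    if (in_card && drv->try_composite != NULL &&
        drv->try_composite(op, src, mask, dst)) {
        drv->accelerated++;
        return;
    }

    drv->fallbacks++;
    software_composite(op, src, mask, dst);
}

/* Example driver hook that only accepts OVER and rejects everything else. */
bool over_only_hook(CompositeOp op, const Picture *src,
                    const Picture *mask, Picture *dst)
{
    (void)src; (void)mask; (void)dst;
    return op == OP_OVER;
}
```

The fallback cost mentioned above shows up here as the work done before the driver says no: in this sketch that is just a function call, but in a real server it would include having already migrated the pixmaps into offscreen memory.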
Worked on my hook for the "(ARGB8888 IN A8) OVER screen" composite, one of the common ones in xcompmgr. As will be the case for most hardware, it will be implemented using the 3D engine, treating the pixmaps as textures. What appears to be required for most operations (xcompmgr's included) is non-power-of-two (NPOT) textures, wrapping at least for power-of-two (POT) textures (needed for 1x1 textures), two texture units, and standard GL_BLEND-type alpha blending. I suspect more ops could be accelerated by using more complicated texture blending instead of GL_BLEND, and I would bet the NPOT texture requirement could be avoided by using scissoring. Anyway, most of the 3D setup is done, and I just need to set one more register and then write the code to actually emit vertices. I hope. :-)
Note that this doesn't cover trapezoid acceleration, which many consumers of Render (cairo, for example) will use. This work is just meant to get some of the very common uses (AA text and xcompmgr) accelerated for now.