12 Jan 2003 Fefe   » (Master)

SIMD hacking

Wow, what a great weekend so far! ;-)

I started looking at MMX, SSE and 3DNow!, and I started hacking at ffmpeg. I actually found a few routines that were called reasonably often, were small enough and were good candidates for SIMD, and wrote SIMD versions of them. I learned a lot in the process, and now the patches have been integrated into ffmpeg. I will try to submit enough patches to be mentioned in the ChangeLog or in some comment, so I have something to show.

In the process, I found a great profiling package called hrprof. It basically uses a new gcc instrument-this-code option and the Pentium cycle counter to write profiling data that gprof can read but that is much more accurate. Great tool!
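To illustrate the idea of compiler-inserted instrumentation, here is a minimal sketch. It is not hrprof's code. It assumes the gcc option in question is -finstrument-functions, which makes gcc call __cyg_profile_func_enter and __cyg_profile_func_exit around every function. The table layout and the names prof_entry, prof_table and prof_lookup are made up for this example, and the portable clock() stands in for the Pentium cycle counter.

```c
#include <stddef.h>
#include <time.h>

/* The hooks themselves must not be instrumented, or they would recurse. */
#define NO_INSTR __attribute__((no_instrument_function))
#define MAX_FUNCS 256
#define MAX_DEPTH 128

/* Hypothetical per-function record: address, call count, inclusive time. */
struct prof_entry {
    void *fn;
    unsigned long calls;
    clock_t ticks;
};

static struct prof_entry prof_table[MAX_FUNCS];
static size_t prof_used;
static clock_t start_stack[MAX_DEPTH];  /* entry timestamps of active calls */
static size_t prof_depth;

/* Linear search is enough for a sketch; returns NULL when the table is full. */
NO_INSTR static struct prof_entry *prof_lookup(void *fn)
{
    for (size_t i = 0; i < prof_used; i++)
        if (prof_table[i].fn == fn)
            return &prof_table[i];
    if (prof_used == MAX_FUNCS)
        return NULL;
    prof_table[prof_used].fn = fn;
    return &prof_table[prof_used++];
}

/* Called by instrumented code on function entry: remember the timestamp. */
NO_INSTR void __cyg_profile_func_enter(void *fn, void *call_site)
{
    (void)fn;
    (void)call_site;
    if (prof_depth < MAX_DEPTH)
        start_stack[prof_depth] = clock();  /* stand-in for the cycle counter */
    prof_depth++;
}

/* Called on function exit: charge the elapsed time (callees included). */
NO_INSTR void __cyg_profile_func_exit(void *fn, void *call_site)
{
    (void)call_site;
    if (prof_depth == 0)
        return;
    prof_depth--;
    if (prof_depth >= MAX_DEPTH)
        return;  /* this call was nested too deeply to be timed */
    struct prof_entry *e = prof_lookup(fn);
    if (e) {
        e->calls++;
        e->ticks += clock() - start_stack[prof_depth];
    }
}
```

Under these assumptions, the program to be profiled would be built with -finstrument-functions and linked against this file. A real tool would also have to write the table out in a format gprof understands, which this sketch leaves out.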

Using the profiler, I found that ffmpeg spends a lot of time in quant_psnr8x8_c, dct_sad8x8_c and dct_sad16x16_c (and in a whole lot of functions that are already MMX/MMX2-optimized). Those will be my next targets; let's see how far I get. It's quite an experience using those SIMD instructions, so very different from normal SISD hacking. I find it quite rewarding so far.
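To give an idea of the kind of loop that makes a good SIMD candidate, here is a plain scalar sum of absolute differences over an 8x8 block. This is a sketch and not ffmpeg's code: the name sad8x8_ref and its signature are made up for this example, and the dct_sad functions named above do more than a plain SAD.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reference routine: sum of absolute differences between
 * two 8x8 blocks of 8-bit pixels. stride is the distance in bytes
 * between the starts of consecutive rows in both blocks. */
static int sad8x8_ref(const uint8_t *a, const uint8_t *b, int stride)
{
    int sum = 0;
    for (int y = 0; y < 8; y++) {
        /* One byte per iteration: classic SISD processing. */
        for (int x = 0; x < 8; x++)
            sum += abs(a[x] - b[x]);
        a += stride;
        b += stride;
    }
    return sum;
}
```

The inner loop handles one pixel at a time. A SIMD version can process a whole row of eight bytes per instruction; the PSADBW instruction, for example, computes the sum of absolute differences of eight byte pairs at once.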


The shipment with the Infineon DIMM arrived yesterday, and I put together my new EPIA-M box. I set up a PXE net-boot configuration and it now boots diskless. It's in mobile mode (several free-hanging parts connected with strange cables) and is virtually noiseless, although the CPU does have a fan.

Most of the hardware is supported: the network card, the USB, USB2 and Firewire controllers, the sound card... but not the graphics card. I can use mplayer and XFree86 in VESA mode, but that sucks very much, in particular since I want to use it as a web clicking PC for my wife, and scrolling a large Mozilla window without hardware blitting is not what I had in mind. The graphics chipset is not even in the lspci database, and I found no data sheet or description of it. It is apparently called CastleRock and is some sort of S3 chip, but what do I know? This is very disappointing, in particular since this hardware was advertised as having "full Linux support". Well, it's not so full after all. If anyone here can help me, please contact me! I already searched Google and tried the current XFree86 snapshot.

Other than that, this is some great hardware! It is fast enough to play full-screen hq-divx movies with an AC3 soundtrack at (according to vmstat) 25-40% CPU load. I got the faster one with the 933 MHz CPU, but I guess the slower one would have been sufficient.
