**Sysprof 1.1.8**

A new version 1.1.8 of Sysprof is out.

This is a release candidate for 1.2.0 and contains mainly bug fixes.

Syndicated 2011-07-15 17:48:39 (Updated 2011-09-04 01:42:00) from Søren Sandmann Pedersen

**Sysprof 1.1.8**

A new version 1.1.8 of Sysprof is out.

This is a release candidate for 1.2.0 and contains mainly bug fixes.

Syndicated 2011-07-15 17:48:39 (Updated 2011-09-04 01:42:00) from Søren Sandmann Pedersen

**Fast Multiplication of Normalized 16 bit Numbers with SSE2**

If you are compositing pixels with 16 bits per component, you often need this computation:

uint16_t a, b, r; r = (a * b + 0x7fff) / 65535;

There is a well-known way to do this quickly without a division:

uint32_t t; t = a * b + 0x8000; r = (t + (t >> 16)) >> 16;

Since we are compositing pixels we want to do this with SSE2 instructions, but because the code above uses 32 bit arithmetic, we can only do four operations at a time, even though SSE registers have room for eight 16 bit values. Here is a direct translation into SSE2:

a = punpcklwd (a, 0); b = punpcklwd (b, 0); a = pmulld (a, b); a = paddd (a, 0x8000); b = psrld (a, 16); a = paddd (a, b); a = psrld (a, 16); a = packusdw (a, 0);

But there is another way that better matches SSE2:

uint16_t lo, hi, t, r; hi = (a * b) >> 16; lo = (a * b) & 0xffff; t = lo >> 15; hi += t; t = hi ^ 0x7fff; if ((int16_t)lo > (int16_t)t) lo = 0xffff; else lo = 0x0000; r = hi - lo;

This version is better because it avoids the unpacking to 32 bits. Here is the translation into SSE2:

t = pmulhuw (a, b); a = pmullw (a, b); b = psrlw (a, 15); t = paddw (t, b); b = pxor (t, 0x7fff); a = pcmpgtw (a, b); a = psubw (t, a);

This is not only shorter, it also makes use of the full width of the SSE registers, computing eight results at a time.

Unfortunately SSE2 doesn’t have 8 bit variants of pmulhuw, pmullw, and psrlw, so we can’t use this trick for the more common case where pixels have 8 bits per component.

Exercise: Why does the second version work?

Syndicated 2011-07-03 12:11:20 (Updated 2011-09-04 01:42:00) from Søren Sandmann Pedersen

**New HTML Parser**: The long-awaited libxml2 based HTML parser
code is live. It needs further work but already handles most
markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!