18 Mar 2010 ncm   » (Master)

x86_64 AES assembly stinks, C blazes
In response to a problem at work where the new version of a program was running much slower than the released one, I looked at the performance of openssl doing AES on recent Intel x86_64. We had carefully ensured that SSL built using the hand-optimized x86_64 assembly version of the code. Imagine our surprise at finding that the C version runs twice as fast. I had tears from laughing.

How could this be? I interpreted it to mean that somebody at Intel made sure the compiler for x86 -> internal RISC (implemented in silicon) recognizes the instruction sequence generated by Gcc for AES, and pastes in its place a hand-coded sequence of internal operations. This means, in turn, that any new cipher that doesn't much resemble AES will be unfairly handicapped by running much more slowly on current Intel chips. It might mean that if Gcc were to improve its optimization to the point that the chip doesn't recognize the sequence it generates as AES, performance might drop by half or more. I wonder which chip version first got this optimization.

[Update: Or, it means that L1 cache alignment came out unlucky for the assembly code, in that particular build. Or something.]

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!