Is this really faster?
This:
    unsigned int clipdigit(unsigned int * const v)
    {
        unsigned int digit = (*v) % 10;
        (*v) /= 10;
        return digit;
    }

is turned into this:
    .globl  clipdigit
    .type   clipdigit, @function
clipdigit:
.LFB11:
    .cfi_startproc
    movl    (%rdi), %ecx
    movl    $-858993459, %edx
    movl    %ecx, %eax
    mull    %edx
    shrl    $3, %edx
    leal    0(,%rdx,8), %eax
    movl    %edx, (%rdi)
    leal    (%rax,%rdx,2), %edx
    movl    %ecx, %eax
    subl    %edx, %eax
    ret
    .cfi_endproc
.LFE11:
    .size   clipdigit, .-clipdigit
As a small hint/bit of explanation, 2^32 - 858993459 = 3435973837 = (2^35 + 2) / 10. In other words, that strange-looking constant is the unsigned value 3435973837 (0xCCCCCCCD), a fixed-point approximation of 1/10 scaled by 2^35.
Is mull really that much faster than divl on x86_64 machines?
I was expecting to get code more like this rather straightforward bit:
    .globl  clipdigit
    .type   clipdigit, @function
clipdigit:
.LFB11:
    .cfi_startproc
    movl    (%rdi), %eax
    movl    $10, %ecx
    xorl    %edx, %edx
    divl    %ecx
    movl    %eax, (%rdi)
    movl    %edx, %eax
    ret
    .cfi_endproc
.LFE11:
    .size   clipdigit, .-clipdigit
It turns out in testing that the second snippet is much, much slower than the first. The strange mull method is about 5 times faster than the straightforward divl method. Wow, divl seems really broken if it's that slow.
Syndicated 2009-08-30 20:38:53 (Updated 2009-08-30 22:39:02) from Lover of Ideas