Is this really faster?
This:
unsigned int clipdigit(unsigned int * const v)
{
    unsigned int digit = (*v) % 10;
    (*v) /= 10;
    return digit;
}
is turned into this:
.globl clipdigit
.type clipdigit, @function
clipdigit:
.LFB11:
.cfi_startproc
movl (%rdi), %ecx
movl $-858993459, %edx
movl %ecx, %eax
mull %edx
shrl $3, %edx
leal 0(,%rdx,8), %eax
movl %edx, (%rdi)
leal (%rax,%rdx,2), %edx
movl %ecx, %eax
subl %edx, %eax
ret
.cfi_endproc
.LFE11:
.size clipdigit, .-clipdigit
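As a small hint/bit of explanation, 2^32 - 858993459 = 3435973837 = 2^35 / 10 + 0.2, that is, 2^35 / 10 rounded up. Taking the high 32 bits of the 64-bit product (mull) and shifting them right by 3 more bits (shrl $3) divides the product by 2^35, which leaves the quotient by 10.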
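To make the trick easier to follow, here is a rough C sketch of what the compiler's version appears to compute. The name clipdigit_mulshift is made up for illustration, and the sketch assumes unsigned int is 32 bits wide, as it is on x86_64 with GCC.

```c
#include <stdint.h>

/* Hypothetical hand-written equivalent of the compiler's output.
 * Assumes unsigned int is 32 bits wide. */
unsigned int clipdigit_mulshift(unsigned int * const v)
{
    const uint32_t n = *v;

    /* 0xCCCCCCCD == 3435973837 == 2^35 / 10 rounded up.
     * The 64-bit product shifted right by 35 is n / 10:
     * mull yields the high 32 bits (a shift by 32), shrl $3 does the rest. */
    const uint32_t q = (uint32_t)(((uint64_t)n * 0xCCCCCCCDu) >> 35);

    /* The two leal instructions build q * 10 as q * 8 + q * 2;
     * subtracting that from n leaves the remainder. */
    *v = q;
    return n - q * 10u;
}
```

Calling it on a variable holding 1234 should return 4 and leave 123 behind, exactly as the original clipdigit would.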
Is mull really that much faster than divl on x86_64 machines?
I was expecting to get code more like this rather straightforward bit:
.globl clipdigit
.type clipdigit, @function
clipdigit:
.LFB11:
.cfi_startproc
movl (%rdi), %eax
movl $10, %ecx
xorl %edx, %edx
divl %ecx
movl %eax, (%rdi)
movl %edx, %eax
ret
.cfi_endproc
.LFE11:
.size clipdigit, .-clipdigit
It turns out in testing that the second clip of code is much, much slower than the first clip. The strange mull method is about 5 times faster than the straightforward divl method. Wow, divl seems really broken if it's that slow.
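For anyone who wants to try a comparison like this themselves, here is a minimal sketch of one way to set it up in C. It is not the test I ran. All the names are invented for illustration, and the volatile divisor is just one assumed way of stopping the compiler from turning the division into a multiply. It does add a load per iteration, so the numbers are only a rough guide.

```c
#include <stdint.h>
#include <time.h>

/* Constant divisor: the compiler is free to use multiply-and-shift. */
uint32_t digit_sum_const(uint32_t n)
{
    uint32_t sum = 0;
    while (n != 0) {
        sum += n % 10u;
        n /= 10u;
    }
    return sum;
}

/* Divisor read through a volatile, so its value is unknown at compile
 * time and a real divide instruction has to be emitted. */
uint32_t digit_sum_runtime(uint32_t n)
{
    volatile uint32_t ten = 10u;
    uint32_t sum = 0;
    while (n != 0) {
        const uint32_t d = ten;
        sum += n % d;
        n /= d;
    }
    return sum;
}

/* Runs f over 0..count-1 and returns the CPU seconds used.
 * The checksum is handed back so the loop cannot be optimized away. */
double time_digit_sum(uint32_t (*f)(uint32_t), uint32_t count,
                      uint32_t *checksum)
{
    uint32_t total = 0;
    const clock_t start = clock();
    for (uint32_t i = 0; i < count; ++i) {
        total += f(i);
    }
    const clock_t stop = clock();
    *checksum = total;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Calling time_digit_sum once with each function and a large count, then comparing the two times (and checking that the checksums match), gives a rough idea of the gap on a particular machine.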
Syndicated 2009-08-30 20:38:53 (Updated 2009-08-30 22:39:02) from Lover of Ideas
