30 Aug 2009 Omnifarious   » (Journeyer)

Is this really faster?

This:

unsigned int clipdigit(unsigned int * const v)
{
   unsigned int digit = (*v) % 10;
   (*v) /= 10;
   return digit;
}

is turned into this:

.globl clipdigit
	.type	clipdigit, @function
clipdigit:
.LFB11:
	.cfi_startproc
	movl	(%rdi), %ecx
	movl	$-858993459, %edx
	movl	%ecx, %eax
	mull	%edx
	shrl	$3, %edx
	leal	0(,%rdx,8), %eax
	movl	%edx, (%rdi)
	leal	(%rax,%rdx,2), %edx
	movl	%ecx, %eax
	subl	%edx, %eax
	ret
	.cfi_endproc
.LFE11:
	.size	clipdigit, .-clipdigit

As a small hint/bit of explanation, 2^32 - 858993459 = 3435973837, which is 2^35 / 10 (3435973836.8) rounded up to the next integer.
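To make the trick in the first listing a little more concrete, here is a rough C rendering of what the mull/shrl/leal sequence appears to compute. The function name clipdigit_mulshift is made up for illustration, and the sketch assumes unsigned int is 32 bits wide, as it is on x86_64. The constant 3435973837 (0xCCCCCCCD) is the unsigned reading of the -858993459 in the listing. Taking the high half of the 64-bit product and then shifting right by 3 is a total shift of 35 bits, and the two leal instructions build quotient * 8 + quotient * 2 = quotient * 10 for the remainder.

```c
#include <stdint.h>

/* Hypothetical name. Same contract as clipdigit(), but with the division
   by 10 spelled out as a multiply and a shift. Assumes 32-bit unsigned int. */
unsigned int clipdigit_mulshift(unsigned int * const v)
{
   /* 3435973837 == 0xCCCCCCCD, i.e. 2^35 / 10 rounded up. */
   const uint64_t magic = 3435973837u;
   uint32_t n = *v;

   /* mull leaves the high 32 bits of the product in %edx (a shift by 32);
      shrl $3 then shifts by 3 more, so the total shift is 35. */
   uint32_t quotient = (uint32_t)(((uint64_t)n * magic) >> 35);

   /* The two leal instructions compute quotient * 10; subtracting that
      from the original value leaves the last decimal digit. */
   uint32_t digit = n - quotient * 10u;

   *v = quotient;
   return digit;
}
```

Calling it on a variable holding 1234 should return 4 and leave 123 behind, the same as the plain % and / version.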

Is mull really that much faster than divl on x86_64 machines?

I was expecting to get code more like this rather straightforward bit:

.globl clipdigit
	.type	clipdigit, @function
clipdigit:
.LFB11:
	.cfi_startproc
	movl	(%rdi), %eax
	movl	$10, %ecx
	xorl	%edx, %edx
	divl	%ecx
	movl	%eax, (%rdi)
	movl	%edx, %eax
	ret
	.cfi_endproc
.LFE11:
	.size	clipdigit, .-clipdigit


It turns out in testing that the second snippet is much slower than the first: the strange mull method is about 5 times faster than the straightforward divl method. Wow, divl seems really broken if it's that slow.
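For anyone who wants to try a comparison like this, here is one possible harness in C. It is only a sketch and not the test that produced the number above. All the names in it (clip_const, clip_runtime, runtime_ten, digit_sum) are invented for illustration. The assumption is that dividing by a divisor read from a volatile variable stops the compiler from using the multiply trick, so it has to emit a real divide instruction, while dividing by the literal 10 typically gets the mull sequence. The ratio you measure will depend on the CPU, the compiler, and the optimization flags.

```c
#include <time.h>

/* Division by the literal 10: compilers typically emit the mull sequence. */
static unsigned int clip_const(unsigned int * const v)
{
   unsigned int digit = (*v) % 10;
   (*v) /= 10;
   return digit;
}

/* The divisor is read through a volatile, so the compiler cannot assume
   it is 10 and has to emit an actual divide instruction. */
static volatile unsigned int runtime_ten = 10;

static unsigned int clip_runtime(unsigned int * const v)
{
   unsigned int divisor = runtime_ten;
   unsigned int digit = (*v) % divisor;
   (*v) /= divisor;
   return digit;
}

/* Adds up every decimal digit of 0 .. limit-1 using the given clip
   function and stores the elapsed CPU time, in seconds, in *seconds.
   Both variants go through the same function pointer, so the call
   overhead is the same for each. */
unsigned long digit_sum(unsigned int (*clip)(unsigned int *),
                        unsigned int limit, double *seconds)
{
   unsigned long total = 0;
   clock_t start = clock();

   for (unsigned int i = 0; i < limit; ++i) {
      unsigned int v = i;
      do {
         total += clip(&v);
      } while (v != 0);
   }

   *seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
   return total;
}
```

Calling digit_sum once with clip_const and once with clip_runtime over a large limit, then comparing the two times, gives a rough ratio. The two returned sums should match, which doubles as a sanity check that both versions compute the same thing.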

Syndicated 2009-08-30 20:38:53 (Updated 2009-08-30 22:39:02) from Lover of Ideas
