30 Jan 2010 shlomif   » (Master)

Project Euler Problem #10 in Haskell, Perl and C

zerothorder told how he found a solution for Project Euler's Problem 10 in Haskell. The problem is "Find the sum of all primes less than 2,000,000". It is given here below:

primes :: [Integer]
primes = 2 : filter isPrime [3, 5 ..]
    where
        -- only check divisibility of the numbers less than the square root of n
        isPrime n = all (not . divides n) $ takeWhile (\p -> p*p <= n) primes
        divides n p = n `mod` p == 0
 
result = sum $ takeWhile (< 2000000) primes
 
main = do putStrLn( show result )

He says "If this doesn't give you a nerdgasm, I don't know what will.". The problem is that this nerdgasm will last a long time. Benchmarking this program gives that 20 iterations of it run at 310 seconds - less than - about 15 seconds each (on my Pentium 4 2.4GHz machine running Mandriva Linux Cooker). So next I tried a better Haskell implmenetation that I recalled from a thread I started in the Haskell Café mailing list about implementing a sieve of Eratosthenes in Haskell:

import Data.Int

primes :: Int64 -> [Int64]

primes how_much = sieve [2..how_much] where
         sieve (p:x) = 
             p : (if p <= mybound
                 then sieve (remove (p*p) x)
                 else x) where
             remove what (a:as) | what > how_much = (a:as)
                                | a < what = a:(remove what as)
                                | a == what = (remove (what+step) as)
                                | a > what = a:(remove (what+step) as)
             remove what [] = []
             step = (if (p == 2) then p else (2*p)) 
         sieve [] = []
         mybound = ceiling(sqrt(fromIntegral how_much))

--main = print (length (primes 1000000))
main = print (sum (primes 2000000))

This does not involve costly operations such as modulo or division and 20 iterations of it run at 135 wallclocks seconds - over two times faster than zeroth's Haskell version.

Now how about Perl? Since Perl has assignment, we have the advantage that we can create a vector of bits that we will mark with the primes by iterating over all numbers up to the root of the limit. Here is the code:

#!/usr/bin/perl

use strict;
use warnings;

use Math::BigInt lib => 'GMP';

my $limit = 2_000_000;

my $primes_bitmask = "";

my $loop_to = int(sqrt($limit));
my $sum = 0;
my $total_sum = Math::BigInt->new('0');

for my $p (2 .. $loop_to)
{
    if (vec($primes_bitmask, $p, 1) == 0)
    {
        $sum += $p;

        my $i = $p * $p;

        while ($i < $limit)
        {
            vec($primes_bitmask, $i, 1) = 1;
        }
        continue
        {
            $i += $p;
        }

    }
}

for my $p ($loop_to .. $limit)
{
    if (vec($primes_bitmask, $p, 1) == 0)
    {
        if (($sum += $p) > (1 << 30))
        {
            $total_sum += $sum;
            $sum = 0;
        }
    }
}

$total_sum += $sum;
print "$total_sum\n";

20 runs of it run at 95 walclock seconds - even faster than the Haskell version. But it gets better. Since all the primes we encounter greater than 2 are not even, we can create a map of their pseduo-halves and conserve on memory and iterations. This is the Perl version:

#!/usr/bin/perl

use strict;
use warnings;

use Math::BigInt lib => 'GMP';

my $limit = 2_000_000;

my $primes_bitmask = "";

my $loop_to = (int(sqrt($limit)))>>1;
my $half_limit = ($limit-1)>>1;

my $sum = 0+2;
my $total_sum = Math::BigInt->new('0');

for my $half (1 .. $loop_to)
{
    if (vec($primes_bitmask, $half, 1) == 0)
    {
        my $p = (($half<<1)+1);
        $sum += $p;

        my $i = ($p * $p)>>1;

        while ($i < $limit)
        {
            vec($primes_bitmask, $i, 1) = 1;
        }
        continue
        {
            $i += $p;
        }

    }
}


for my $half ($loop_to .. $half_limit)
{
    if (vec($primes_bitmask, $half, 1) == 0)
    {
        if (($sum += (($half<<1)+1)) > (1 << 30))
        {
            $total_sum += $sum;
            $sum = 0;
        }
    }
}

$total_sum += $sum;
print "$total_sum\n";

Running this 20 times takes 73 wallclock seconds, close to half that of my Haskell version.

Then I wondered how long C will take. Here is a C implementation without the halving:

#include <string.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>

#define limit 2000000
int8_t bitmask[(limit+1)/8];

int main(int argc, char * argv[])
{
    int p, i;
    int mark_limit;
    long long sum = 0;

    memset(bitmask, '\0', sizeof(bitmask));
    mark_limit = (int)sqrt(limit);
    
    for (p=2 ; p <= mark_limit ; p++)
    {
        if (! ( bitmask[p>>3]&(1 << (p&(8-1))) ) )
        {
            /* It is a prime. */
            sum += p;
            for (i=p*p;i<=limit;i+=p)
            {
                bitmask[i>>3] |= (1 << (i&(8-1)));
            }
        }
    }
    for (; p <= limit; p++)
    {
        if (! ( bitmask[p>>3]&(1 << (p&(8-1))) ) )
        {
            sum += p;
        }
    }

    printf("%lli\n", sum);

    return 0;
}

This was too fast to measure with 20 runs alone, so 500 runs of it took 15 seconds, two or three orders of magnitude faster than the fastest Haskell or Perl versions. But naturally, we can apply the halving paradigm there too:

#include <string.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>

#define limit 2000000
int8_t bitmask[(limit+1)/8/2];

int main(int argc, char * argv[])
{
    int half, p, i;
    int half_limit;
    int loop_to;
    long long sum = 0 + 2;

    memset(bitmask, '\0', sizeof(bitmask));

    loop_to=(((int)(sqrt(limit)))>>1);
    half_limit = (limit-1)>>1;
    
    for (half=1 ; half <= loop_to ; half++)
    {
        if (! ( bitmask[half>>3]&(1 << (half&(8-1))) ) )
        {
            /* It is a prime. */
            p = (half << 1)+1;
            sum += p;
            for (i = ((p*p)>>1) ; i < half_limit ; i+=p )
            {
                bitmask[i>>3] |= (1 << (i&(8-1)));
            }
        }
    }

    for( ; half < half_limit ; half++)
    {
        if (! ( bitmask[half>>3]&(1 << (half&(8-1))) ) )
        {
            sum += (half<<1)+1;
        }
    }

    printf("%lli\n", sum);

    return 0;
}

500 runs of it take 10 wallclock seconds - 54.35 times per second, and 50% better than the previous C version. And I still haven't applied platform-specific gcc optimisations.

I should also note that the executables generated by ghc are extremely large in comparison to their C ones:

$ ls -l c_* haskell_*
-rwxr-xr-x 1 shlomi shlomi   6082 2009-12-04 07:46 c_mine
-rwxr-xr-x 1 shlomi shlomi   6103 2009-12-04 07:46 c_mine_half
-rwxr-xr-x 1 shlomi shlomi   6092 2009-12-04 07:46 c_mine_micro_opt
-rwxr-xr-x 1 shlomi shlomi 796825 2009-12-04 07:56 haskell_mine
-rwxr-xr-x 1 shlomi shlomi 571717 2009-12-04 07:46 haskell_zeroth

c_mine_half is less than 1% the size of haskell_mine (and runs faster). When talking about this to other people, they said that Haskell has a very optimised primes sequence generator, which I can try using (which should be over 3 times as fast), and that it has two kinds of integers, which the other type is faster, and that it has a better way to emulate assignment. But the bottom line is that the naïve and intuitive way to write such programs in Haskell is under-performant, even in comparison to Perl, and 100 or 1,000 times as much in comparison to C.

I've written this as a separate post and not as a comment to the original blog post because I'm very limited with the markup in the commenting there (I'm going to post a comment there with a link to this blog post, though). I should note that you can find all the code I mentioned inside a dedicated Github repository, and you can experiment with it further.

Syndicated 2010-01-30 12:50:07 from shlomif

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!