Name: Shlomi Fish
Member since: 2001-08-15 16:24:36
Last Login: 2010-02-02 18:16:52
Homepage: http://www.shlomifish.org/
Notes:
I am an open-source user and developer, but don't really have anything against commercial software, if written and maintained well. I am an active member of the Israeli Group of Linux Users, the Haifa Linux Club, the Tel-Aviv Linux Club and numerous open-source related mailing lists.
My favourite programming language is Perl, but I also like Haskell, Matlab and Bash where appropriate. And I have a love/hate relationship with C/C++.
So far, my most ambitious open-source project has been Freecell Solver, but I also wrote or contributed to some other projects and hacks.
I received a B.Sc. from Electrical Engineering from the Technion, and am now looking for a good job in IT.
Complete list of articles by shlomif
NYTProf-3 is Out!
Tim Bunce writes on his blog about the new features in Devel-NYTProf version 3. Devel-NYTProf is a profiler for the Perl programming language, which has put all the previous attempts in profiling in the dust, and now it's even better than before. Enjoy! (Thanks to Fred Moyer's post on the San-Fransisco Perl Mongers mailing list).
New Page about Text Editors and IDEs for Programmers
After answering the questions "Can anyone recommend a good text editor?" and "What is a good IDE?" a few times in the past, I set up a page listing some prominent text editors and Integrated Development Environments (IDEs) for development on my homepage. Currently it includes only cross-platform open-source text editors and cross-platform open-source IDEs, but I'm planning to expand it as time and interest permits. Additions would be welcome.
Vim is there and so are XEmacs and Eclipse, and also Padre, the Perl IDE and more specialised IDEs such as the "Eric" Python IDE.
פגישת קוד פתוח בתל-אביב: 14 בפברואר - vtiger CRM
מועדון הקוד הפתוח התל-אביבי (תלוקס) ייפגש שוב כדי לשמוע את הרצאתו של רמי הדדי אודות "מערכת ניהול קשרי הלקוחות vtigerCRM. ההרצאה תתקיים ביום ראשון, 14 בפברואר 2010, בשעה 18:00, באולם הולצבלט, מס' 007 במסדרון הבניינים למדעים מדויקים (שימו לב לשינוי בשעה ובמיקום משנה שעברה) באוניברסיטת תל אביב. פרטים נוספים, מפות להגעה וכיוצא בזה, ניתן למצוא באתר ובוויקי. הנוכחות בהרצאה היא חינמית ולא נדרשת הרשמה מראש.
vtiger CRM היא מערכת ניהול קשרי לקוחות אשר פועלת בממשק Web דרך הדפדפן. המערכת מכילה את כל הכלים הנדרשים לניהול מלא ומקיף של עסק: החל מעסק קטן של אדם אחד וכלה בעשרות עובדים. במערכת ניתן למצוא ביטוי ללקוחות, יומנים, הזדמנויות עסקיות, מכירות, שירות ועוד. המערכת מבוססת על קוד-פתוח וניתן גם לבצע בה התאמות בקלות רבה. המערכת תורגמה לשפות רבות ובכללן עברית.
רמי הדדי הינו מנכ"ל אקטיבטק אשר מתמחה ביעוץ, פיתוח ואירוח של vtiger CRM.
אנו תמיד מחפשים מרצים שיתנדבו לתת הרצאות בנושאים שונים הקשורים לקוד-פתוח ולמחשבים. במידה שאתם מעוניינים לתת הרצאה, או שיש לכם הצעה להרצאה שמעניינת אתכם, נשמח לשמוע ממכם.
Escape from GNU Autohell!
I've set up a page on my web-site titled "Escape from GNU Autohell!" about the evils of the GNU Autotools (GNU Autoconf/Automake/Libtool and other GNU auto-brain-damages) and why you should convert to a good alternative such as CMake. The page converts a list of the many disadvantages of the GNU Autotools, some refutation of common arguments against CMake, and some links.
And lo and behold, some time after writing the page I spent hours dealing with yet another Autohell problem: "Mandriva's libtool causes "make install" of xine-lib-1.1-hg to fail". I actually had to build the Mandriva libtool several times to find the offending downstream patch, and libtool's "./bootstrap" script was so slow that I could finish running "make" and "make install" in xine-lib while waiting for it to finish. And it says a lot because xine uses GNU Autohell which slows down the build process considerably. Eventually, I found the offending patch after several hours, and I wouldn't have finished on time if I didn't cancel the "make check" test in the .rpm.
Autohell must die!
Project Euler Problem #10 in Haskell, Perl and C
zerothorder told how he found a solution for Project Euler's Problem 10 in Haskell. The problem is "Find the sum of all primes less than 2,000,000". It is given here below:
primes :: [Integer]
primes = 2 : filter isPrime [3, 5 ..]
where
-- only check divisibility of the numbers less than the square root of n
isPrime n = all (not . divides n) $ takeWhile (\p -> p*p <= n) primes
divides n p = n `mod` p == 0
result = sum $ takeWhile (< 2000000) primes
main = do putStrLn( show result )
He says "If this doesn't give you a nerdgasm, I don't know what will.". The problem is that this nerdgasm will last a long time. Benchmarking this program gives that 20 iterations of it run at 310 seconds - less than - about 15 seconds each (on my Pentium 4 2.4GHz machine running Mandriva Linux Cooker). So next I tried a better Haskell implmenetation that I recalled from a thread I started in the Haskell Café mailing list about implementing a sieve of Eratosthenes in Haskell:
import Data.Int
primes :: Int64 -> [Int64]
primes how_much = sieve [2..how_much] where
sieve (p:x) =
p : (if p <= mybound
then sieve (remove (p*p) x)
else x) where
remove what (a:as) | what > how_much = (a:as)
| a < what = a:(remove what as)
| a == what = (remove (what+step) as)
| a > what = a:(remove (what+step) as)
remove what [] = []
step = (if (p == 2) then p else (2*p))
sieve [] = []
mybound = ceiling(sqrt(fromIntegral how_much))
--main = print (length (primes 1000000))
main = print (sum (primes 2000000))
This does not involve costly operations such as modulo or division and 20 iterations of it run at 135 wallclocks seconds - over two times faster than zeroth's Haskell version.
Now how about Perl? Since Perl has assignment, we have the advantage that we can create a vector of bits that we will mark with the primes by iterating over all numbers up to the root of the limit. Here is the code:
#!/usr/bin/perl
use strict;
use warnings;
use Math::BigInt lib => 'GMP';
my $limit = 2_000_000;
my $primes_bitmask = "";
my $loop_to = int(sqrt($limit));
my $sum = 0;
my $total_sum = Math::BigInt->new('0');
for my $p (2 .. $loop_to)
{
if (vec($primes_bitmask, $p, 1) == 0)
{
$sum += $p;
my $i = $p * $p;
while ($i < $limit)
{
vec($primes_bitmask, $i, 1) = 1;
}
continue
{
$i += $p;
}
}
}
for my $p ($loop_to .. $limit)
{
if (vec($primes_bitmask, $p, 1) == 0)
{
if (($sum += $p) > (1 << 30))
{
$total_sum += $sum;
$sum = 0;
}
}
}
$total_sum += $sum;
print "$total_sum\n";
20 runs of it run at 95 walclock seconds - even faster than the Haskell version. But it gets better. Since all the primes we encounter greater than 2 are not even, we can create a map of their pseduo-halves and conserve on memory and iterations. This is the Perl version:
#!/usr/bin/perl
use strict;
use warnings;
use Math::BigInt lib => 'GMP';
my $limit = 2_000_000;
my $primes_bitmask = "";
my $loop_to = (int(sqrt($limit)))>>1;
my $half_limit = ($limit-1)>>1;
my $sum = 0+2;
my $total_sum = Math::BigInt->new('0');
for my $half (1 .. $loop_to)
{
if (vec($primes_bitmask, $half, 1) == 0)
{
my $p = (($half<<1)+1);
$sum += $p;
my $i = ($p * $p)>>1;
while ($i < $limit)
{
vec($primes_bitmask, $i, 1) = 1;
}
continue
{
$i += $p;
}
}
}
for my $half ($loop_to .. $half_limit)
{
if (vec($primes_bitmask, $half, 1) == 0)
{
if (($sum += (($half<<1)+1)) > (1 << 30))
{
$total_sum += $sum;
$sum = 0;
}
}
}
$total_sum += $sum;
print "$total_sum\n";
Running this 20 times takes 73 wallclock seconds, close to half that of my Haskell version.
Then I wondered how long C will take. Here is a C implementation without the halving:
#include <string.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#define limit 2000000
int8_t bitmask[(limit+1)/8];
int main(int argc, char * argv[])
{
int p, i;
int mark_limit;
long long sum = 0;
memset(bitmask, '\0', sizeof(bitmask));
mark_limit = (int)sqrt(limit);
for (p=2 ; p <= mark_limit ; p++)
{
if (! ( bitmask[p>>3]&(1 << (p&(8-1))) ) )
{
/* It is a prime. */
sum += p;
for (i=p*p;i<=limit;i+=p)
{
bitmask[i>>3] |= (1 << (i&(8-1)));
}
}
}
for (; p <= limit; p++)
{
if (! ( bitmask[p>>3]&(1 << (p&(8-1))) ) )
{
sum += p;
}
}
printf("%lli\n", sum);
return 0;
}
This was too fast to measure with 20 runs alone, so 500 runs of it took 15 seconds, two or three orders of magnitude faster than the fastest Haskell or Perl versions. But naturally, we can apply the halving paradigm there too:
#include <string.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#define limit 2000000
int8_t bitmask[(limit+1)/8/2];
int main(int argc, char * argv[])
{
int half, p, i;
int half_limit;
int loop_to;
long long sum = 0 + 2;
memset(bitmask, '\0', sizeof(bitmask));
loop_to=(((int)(sqrt(limit)))>>1);
half_limit = (limit-1)>>1;
for (half=1 ; half <= loop_to ; half++)
{
if (! ( bitmask[half>>3]&(1 << (half&(8-1))) ) )
{
/* It is a prime. */
p = (half << 1)+1;
sum += p;
for (i = ((p*p)>>1) ; i < half_limit ; i+=p )
{
bitmask[i>>3] |= (1 << (i&(8-1)));
}
}
}
for( ; half < half_limit ; half++)
{
if (! ( bitmask[half>>3]&(1 << (half&(8-1))) ) )
{
sum += (half<<1)+1;
}
}
printf("%lli\n", sum);
return 0;
}
500 runs of it take 10 wallclock seconds - 54.35 times per second, and 50% better than the previous C version. And I still haven't applied platform-specific gcc optimisations.
I should also note that the executables generated by ghc are extremely large in comparison to their C ones:
$ ls -l c_* haskell_* -rwxr-xr-x 1 shlomi shlomi 6082 2009-12-04 07:46 c_mine -rwxr-xr-x 1 shlomi shlomi 6103 2009-12-04 07:46 c_mine_half -rwxr-xr-x 1 shlomi shlomi 6092 2009-12-04 07:46 c_mine_micro_opt -rwxr-xr-x 1 shlomi shlomi 796825 2009-12-04 07:56 haskell_mine -rwxr-xr-x 1 shlomi shlomi 571717 2009-12-04 07:46 haskell_zeroth
c_mine_half is less than 1% the size of haskell_mine (and runs faster). When talking about this to other people, they said that Haskell has a very optimised primes sequence generator, which I can try using (which should be over 3 times as fast), and that it has two kinds of integers, which the other type is faster, and that it has a better way to emulate assignment. But the bottom line is that the naïve and intuitive way to write such programs in Haskell is under-performant, even in comparison to Perl, and 100 or 1,000 times as much in comparison to C.
I've written this as a separate post and not as a comment to the original blog post because I'm very limited with the markup in the commenting there (I'm going to post a comment there with a link to this blog post, though). I should note that you can find all the code I mentioned inside a dedicated Github repository, and you can experiment with it further.
shlomif certified others as follows:
Others have certified shlomif as follows:
[ Certification disabled because you're not logged in. ]
FOAF updates: Trust rankings are now exported, making the data available to other users and websites. An external FOAF URI has been added, allowing users to link to an additional FOAF file.
Keep up with the latest Advogato features by reading the Advogato status blog.
If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!