29 Sep 2010 Ringding   » (Master)

Motivated testers are a godsend

Ken Gilmer has found some examples of surprisingly bad CACAO
performance on ARM (see slides).
In my understanding, CACAO should be at least as fast as
JamVM for moderately long-running benchmarks such as
probe and decode. Before looking into the
issue, the only cause I could imagine was a large number of
JIT/native transitions, but I was not entirely convinced that
these alone would create such a performance problem.

Digging into it using oprofile, I quickly found
some very inefficient code for handling JNI local
references. Interestingly, it’s mostly a memset
of 64 bytes that is run on every transition from JIT code to
native code. It seems that either memory bandwidth on ARM is
unbelievably low or the machine’s caching behavior is
extremely poor, as the same call, when run on decent x86
hardware, doesn’t even show up in a profile, at least not
nearly as prominently.
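
To illustrate the kind of per-transition cost described above, here is a minimal C sketch. It is not CACAO's actual code: the localref_frame struct, the 16-slot capacity, and all function names are hypothetical, chosen only so that the slot array comes to 64 bytes with 4-byte pointers. It contrasts clearing every slot on each entry with merely resetting a counter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capacity: 16 slots of 4 bytes each is 64 bytes on a 32-bit target. */
#define LOCALREF_CAPACITY 16

/* Hypothetical local reference frame, set up on each JIT-to-native transition. */
typedef struct localref_frame {
    size_t used;                     /* number of slots holding a live reference */
    void  *refs[LOCALREF_CAPACITY];  /* local reference slots */
} localref_frame;

/* Eager variant: clears every slot on each transition, used or not. */
static void frame_enter_eager(localref_frame *f) {
    assert(f != NULL);
    memset(f->refs, 0, sizeof f->refs);
    f->used = 0;
}

/* Lazy variant: only resets the counter; slots at index >= used count as dead. */
static void frame_enter_lazy(localref_frame *f) {
    assert(f != NULL);
    f->used = 0;
}

/* Stores obj in the next free slot; returns its index, or -1 if the frame is full. */
static int frame_add(localref_frame *f, void *obj) {
    assert(f != NULL);
    if (f->used >= LOCALREF_CAPACITY)
        return -1;
    f->refs[f->used] = obj;
    return (int)f->used++;
}

/* Returns the reference at idx, or NULL if idx does not name a live slot. */
static void *frame_get(const localref_frame *f, int idx) {
    assert(f != NULL);
    if (idx < 0 || (size_t)idx >= f->used)
        return NULL;
    return f->refs[idx];
}
```

In this sketch both variants behave identically for callers that go through frame_add and frame_get, because the bounds check hides stale slot contents. The lazy one simply skips the memset. Whether the real overhaul works this way is an assumption on my part.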

Anyway, after an overhaul of this half-decade-old code,
performance for these two
benchmarks has improved by more than 50%. JamVM is still
faster for decode, but only slightly. I attribute
this to the garbage collector.

Without Ken’s talk, I would never have found out about this.
