28 Apr 2008 rlougher   » (Master)

Third time lucky?

In JamVM 1.5.0 I released the "inlining interpreter" which copies code blocks together in a similar way to a simple JIT (but the code is compiled by gcc, rather than being generated natively as in a JIT). This achieved an impressive speed improvement and I've been keen to optimise it further.

The major thing which has been in my sights is the remaining dispatches between adjacent basic blocks. For instance the fallthrough edge of an "if" and across the edge caused by a jump target.

Currently, the unit of inlining is a basic block because this has only one entry point and one exit point. Blocks which contain instructions which need to be rewritten (quickening, e.g. after symbolic resolution) must be executed first using threaded dispatch. If we inlined across blocks we will end up with blocks with multiple entry and multiple exit points. In this case, depending on control flow, we may complete the block before all the block has been executed, or we may never reach the end of the block because a side exit is always taken. In the first case, inlining can't be done, and in the second inlining will never occur.

My first approach to solve this was simplistic, and was more of an experiment to "test the water". So I wasn't too surprised when it didn't yield any speed improvement.

My second attempt was much more complex (after inlining check the edges and create a longer block if the blocks across the edge are both inlined, and fix up internal targets). But this showed no significant general speed improvement (order of 2%) although specific microbenchmarks showed over 100%.

So for the last week I've been trying to explain the results. After some experiments I've come to the conclusion that it is due to instruction cache locality. Basically, the merged block (which may be merged many times) ends up in a different location to the non-mergeable blocks which remain in the initial position. Previously, inlining exhibited good cache locality due to blocks being allocated in execution order. This was destroyed by block merging. The effects of this counteracted the speed improvements leading to no change overall.

This was the position I was in on Friday (which added to my despondency). However, on the weekend I rethought the problem and came up with a third approach. I've partially implemented it and hopefully should be able to test it in a few days. Fingers crossed!

Syndicated 2008-04-28 15:07:00 (Updated 2008-04-28 17:01:27) from Robert Lougher

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!