19 Feb 2008 lupus   » (Master)

Novell hack week

One of the things that have been sitting in my TODO list for a few years is to improve the performance of the Regex engine in Mono, both by speeding up the interpreter and by compiling the regular expressions to IL code so that the JIT could optimize it. This seemed like a good project for hack week and Zoltan joined me doing the implementation.

I worked on a new compiler/interpreter combination that uses simplified opcodes, with the aim of making their execution faster and also making the translation to IL code easier. As an example, the old interpreter used a 'Char' opcode to match a single char, but this opcode has several runtime options to ignore the case, negate the assertion and go backward in the string. Each option required decoding and conditional branching at runtime, so removing this overhead should improve performance significantly.

Zoltan made a clever observation: he could reuse the new interpreter for bytecodes that the IL engine he was working on couldn't yet handle and so he used dynamic methods with the same signature as the method in the interpreter that evaluates the regex bytecode. This design has also the nice property that compiled regular expressions will be garbage collected (as opposed to the MS runtime that, at least in the 1.1 version, will leak in this case).

IL-compiling regular expressions has several benefits: it completely removes dispatch and decoding overhead and the JIT will use very fast instructions like compare with immediate to implement the 'Char' bytecode.

I used a few microbenchmarks to test the speed of the new engines, but I'll report here just the results of running the regexdna test from the language shootout: in other cases the speedup is even bigger.

  • Old interpreter: 10.1 seconds
  • New interpreter: 5.4 seconds
  • IL engine: 1.3 seconds.

Most of the new code is in svn, though it's not enabled since it's still incomplete. We'd need another week or so to make it usable instead of the old engine and we haven't allocated the time yet to complete this work, but it sure looks promising.

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!