16 Oct 2007 salimma   » (Apprentice)

Wide Finder: C++ update

Talked with a colleague about the slow single-threaded performance of my Wide Finder implementation, and we narrowed it down to two possibilities:

  • The Boost regular expression is not being compiled?
  • C++ strings have higher overhead than null-terminated C strings

The first point can be ruled out: Boost compiles regular expressions when you assign them. As for the second point, well, reading in the file using std::getline turns out to consume the bulk of the time.
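For illustration, here is a minimal sketch of the kind of read-and-match loop in question. It is not my actual Wide Finder code: it assumes std::regex from the standard library as a stand-in for Boost.Regex, and the function name count_matches and the requirement that the pattern has one capture group are made up for the example. The pattern is built once, outside the loop, so each line only pays for std::getline and the search.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <regex>
#include <string>

// Hypothetical helper: counts how often each captured substring occurs.
// Assumes `pattern` has at least one capture group and was constructed
// (and therefore compiled) once by the caller, before the loop starts.
std::map<std::string, std::size_t>
count_matches(std::istream& in, const std::regex& pattern)
{
    std::map<std::string, std::size_t> counts;
    std::string line;
    std::smatch match;
    // One std::getline call per log line; this is the step that
    // dominated the running time in my measurements.
    while (std::getline(in, line)) {
        if (std::regex_search(line, match, pattern)) {
            ++counts[match[1].str()];
        }
    }
    return counts;
}
```

Passing a std::ifstream opened on the log file, plus a regex built once up front, gives the per-key hit counts that the ranking step then sorts.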

I’ve reorganized the code a bit, using a multimap rather than a vector to rank the URLs by count, with no effect on speed. With two and four threads on a dual-core Intel notebook, the performance is at least on par with Ruby.
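A rough sketch of the multimap ranking idea, under assumptions: the name top_urls, the limit parameter, and the use of a std::map of counts as input are all invented for this example and may not match the real code. The counts are re-keyed by count into a std::multimap ordered with std::greater, so iterating from the beginning yields the most-requested URLs first.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns up to `limit` (count, URL) pairs,
// highest count first.
std::vector<std::pair<std::size_t, std::string>>
top_urls(const std::map<std::string, std::size_t>& counts, std::size_t limit)
{
    // Re-key by count. A multimap is needed because several URLs can
    // share the same count; std::greater puts the largest counts first.
    std::multimap<std::size_t, std::string, std::greater<std::size_t>> ranked;
    for (const auto& entry : counts) {
        ranked.emplace(entry.second, entry.first);
    }

    std::vector<std::pair<std::size_t, std::string>> top;
    for (const auto& entry : ranked) {
        if (top.size() == limit) break;
        top.emplace_back(entry.first, entry.second);
    }
    return top;
}
```

Building the multimap costs about the same as sorting a vector of pairs, which fits with the change having no visible effect on speed.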

Alastair Rankine has a C++ implementation that is slightly faster, but it uses Boost memory-mapped I/O, which I avoided for the same reason he gives as a caveat: it will not scale to files that are too large, which Tim’s log file might well be. Even so, his version is not significantly faster than the Ruby code.

Moral of the story: Perl and Ruby can be faster than C++! The C implementations out there are blindingly fast, but the way they handle regular expressions is really painful.

Will turn my (limited) spare time to doing a clean JoCaml implementation — it might not be faster but it definitely will look cleaner!

Syndicated 2007-10-16 21:15:21 from Intuitionistically Uncertain » Technology
