30 Dec 2001 apm   » (Journeyer)

It sure feels good when you can make a specific change to some software and get a definite improvement!

Suneido uses a cost based optimizer for its database queries. For each operation (e.g. join, union), it estimates the cost of different strategies. The problem is that if you have a complex query with a lot of operations, then the number of possible combinations of strategies becomes very large - a "combinatorial explosion". So query optimization slows down a lot for large queries.

However, I had a feeling there was a lot of redundant work going on recalculating the same costs over and over in different combinations. If this was the case, then caching the calculated costs for each operation could save a lot of work. So I added a simple cache to each operation class, using a linked list. After calculating a cost for a certain strategy I added it to the cache. When asked for the cost of a strategy I checked in the cache first to see if it had already been calculated. This involved 20 or 30 lines of code in the base class for query operations. All the automated tests still ran - a good sign that I hadn't broken anything.

With some simple counters in the code I found that on the automated tests (mostly fairly simple queries) the cache was eliminating about 2/3 of the cost calculations. Not bad, but the real test would be complex queries. One of the more complex queries in our accounting package was taking about 2 seconds to optimize. With the cache it now took about .1 seconds - 20 times faster!

Not bad for about an hour's work! But the best part was that the structure of the code allowed me to make this change easily and locally without disrupting anything else. Yes, I could've included the caching when I originally wrote the code. (If I'd known at that point it was worthwhile.) But I'm a firm believer in doing simple versions first. If you try to include "everything" in the initial version, you'll never get it done, it'll never work, and you'll still have to change it later.

If you're interested, you can see the actual changes to query.h and query.cpp in CVS on SourceForge

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!