All day today was spent working on UncommonSQL support for Oracle. We have been developing a very sophisticated application on top of USQL, using PostgreSQL as the backend. The production environment is supposed to be Oracle, so we have started deploying test servers and having our developers run with an Oracle backend. Imagine our surprise when we found the application slowed to an almost unusable pace.
It seems that the overhead per request with our Oracle backend was preposterously high. With the Oracle Call Interface (OCI), any dynamic query requires that you ask the statement handle what columns you are getting back in your rows, and provide it with buffers to fetch the results into. A naive implementation of this interface using CMUCL's ALIEN package, a marvelous Lisp->DamnNearAnything FFI, was mallocing these buffers at runtime for each and every request. The result added a second to the round trip for some of our queries, regardless of how much data was actually returned.
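To make the problem concrete, here is a sketch of the shape of that naive per-request path. In the real code the buffers were foreign memory created through CMUCL's ALIEN package; here ordinary octet arrays stand in for the malloc'ed column buffers, and the function name and its argument are hypothetical:

```lisp
;; Naive per-request allocation: every fetch builds a fresh buffer
;; for each column described by the statement handle, then throws
;; them all away. (In the real code these were foreign buffers made
;; with ALIEN's make-alien and released with free-alien.)
(defun fetch-buffers-naively (column-widths)
  "Allocate one fresh result buffer per column, for a single request."
  (mapcar (lambda (width)
            (make-array width :element-type '(unsigned-byte 8)))
          column-widths))
```

Do this on every query and the allocator cost dominates, no matter how small the result set is.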
Luckily, Common Lisp profiling is easy as pie and I was able to zero in on the problematic function in a few minutes. Figuring out a way of optimizing that time away took a bit longer, as I was loath to spend my Sunday hacking low-level OCI interface code. So I decided the easiest thing to do would be to provide an alternative system for allocating memory in the ALIEN package, one that would reuse malloced memory as opposed to re-mallocing and freeing it on every single request. Thirty minutes later I had trimmed a fifth of the time off the function.
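The idea was buffer pooling: keep released buffers around and hand them back out instead of allocating fresh ones. The actual fix reused foreign memory through ALIEN; this minimal sketch uses ordinary Lisp arrays as stand-ins, and all the names are hypothetical:

```lisp
;; A minimal buffer-pool sketch. Buffers are returned to *buffer-pool*
;; after use, and acquire-buffer only allocates on a pool miss.
(defvar *buffer-pool* '())

(defun acquire-buffer (size)
  "Return a pooled buffer of at least SIZE octets, allocating only on a miss."
  (let ((hit (find-if (lambda (buf) (>= (length buf) size)) *buffer-pool*)))
    (cond (hit
           (setf *buffer-pool* (remove hit *buffer-pool* :count 1))
           hit)
          (t
           (make-array size :element-type '(unsigned-byte 8))))))

(defun release-buffer (buf)
  "Return BUF to the pool for reuse by the next request."
  (push buf *buffer-pool*)
  buf)
```

A request then acquires its column buffers up front and releases them when done; the second request onward pays no allocation cost at all.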
Next I rewrote the innermost loop, which iterated over the columns in the statement handle. I took out some use of multiple-values (in Lisp a function can return more than one value, and that doesn't mean a list or array of values, but literally more than one value), factored three different logic blocks together, and got rid of some spurious variables. Doing this and then telling the compiler what types some other variables were shaved more time off, and the final touch of simply telling the compiler to really optimize this function let me finally reduce the time spent in that function by a little over half.
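The declarations involved look roughly like this. This is not the actual UncommonSQL loop, just a hypothetical illustration of the two tricks: declaring variable types so the compiler can open-code the arithmetic, and raising the `speed` optimization quality so it actually does:

```lisp
;; Hypothetical inner loop over column metadata. The type declarations
;; let the compiler use raw fixnum arithmetic, and (optimize (speed 3))
;; tells it to favor speed over safety checks and compilation time.
(defun sum-column-widths (widths)
  "Sum a vector of column widths, with type and optimize declarations."
  (declare (type simple-vector widths)
           (optimize (speed 3) (safety 1)))
  (let ((total 0))
    (declare (type fixnum total))
    (dotimes (i (length widths) total)
      (incf total (the fixnum (svref widths i))))))
```

With CMUCL, compiling a function like this will also print efficiency notes pointing out any spots where the compiler still couldn't prove a type, which is half of how you learn what to declare next.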
Most importantly, average query times dropped enough to make the app usable. The painful thing is that this has nothing to do with Lisp, as the PostgreSQL interface is lightning quick; it's just that Oracle has a much more complicated way of representing data when it comes back to the client through the OCI. It wants the app to control all these aspects of its operation, including the allocation of storage for the results of a select. PostgreSQL is much simpler, providing a row-level interface to the returned data.
So PostgreSQL's simple client interface looks sweet, and for our data size it's fine, but the OCI snafu today made me realize how much better Oracle's handling of results would scale to gigantic datasets. I can control memory use at a much more precise level in Oracle, and can also control when and how much data comes across the wire when reading results. The price you pay is the overhead of setting up the statement handle and its result buffers. I think it's a small price to pay in the long run. Ideally, though, the OCI should come with an intermediate interface that would handle allocation for you.
I also learned a lot about Lisp profiling today. I have been doing Lisp coding for a while, but I have seldom really concentrated on a small chunk of code in an attempt to optimize it. Part of this is due to the fact that I've seldom had problems with CMUCL's performance. The one thing that sticks out in my mind is that I have only begun to understand the power of CMUCL's compiler, called Python (no relation to the other language). I have so much to learn.
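CMUCL ships a PROFILE package that wraps named functions with call counters and timers, which is what made zeroing in on the hot function so quick. A crude, portable version of the same idea, timing a single thunk with the standard `get-internal-run-time`, looks like this (the function name is my own invention):

```lisp
;; A one-shot version of what a profiler does per wrapped call:
;; snapshot the run-time counter, run the code, and report the delta
;; alongside the code's own return value.
(defun time-thunk (thunk)
  "Call THUNK; return its value and the elapsed internal run time."
  (let ((start (get-internal-run-time)))
    (let ((value (funcall thunk)))
      (values value (- (get-internal-run-time) start)))))
```

A real profiler just does this for every call to every function you ask it to watch, and totals the results per function so the worst offender floats to the top of the report.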