Since it's been strongly hinted at by my elsewhere-readers, I shall give a quick description of the latest for-work CL code I've written (no code yet; getting things released has been punted to the managing directors). First, this is code written for work, at work, using work's equipment, so all I can do is talk about it, for now. Second, the reason this code exists at all is that it was Needed.
So, we extract network data using netflow. This, for a decent-sized network, means lots of data (we're talking GB per hour lots, about 21 MB/s as things stand, but there will be more). We had, mid-last-week, three APIs to extract this: one in C, one in Perl, one in Python. Doing any sort of "massive data munging" in plain C is, at best, somewhat painful (lack of decent built-in data structures being the main pain), so I had been hoping that the existing Python API was going to do the trick (indeed, I chose the specific netflow daemon we use on that basis).
It manifestly did not. On the box where the data resides, it took on the order of 8 minutes just to do the most basic processing of 2 minutes' worth of logs (essentially "extract one record" in a loop). The Perl API was somewhat better: on the same amount of data, it takes "only" (sorry, danb) 6 minutes, 49 seconds. The C API does that slurping in 14-15 sec, so is "fast enough".
My initial CL "slurp logs, present each entry as an object" code took a whole whopping 33s, but a slight refinement first thing yesterday got this down to 17s. Today, I have been generating some information from the amassed data and, while I find 14 minutes OK for processing 1h's worth of data, I will see what I can do to speed things up, possibly by adding a "re-use existing object" frob to the API. Most (but not all) of the time, I only need a given log entry for the fraction of a second it takes to extract what I want from it (because of some other API design decisions, I need one per netblock to hang around).
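For the curious, the "re-use existing object" idea looks roughly like the sketch below. It is in Python purely for illustration and is not the actual CL code (which I can't post yet). The record layout and the names `FlowRecord`, `read_record` and `bytes_per_source` are all made up for the example and have nothing to do with the real daemon's log format or API.

```python
import io
import struct

# Hypothetical record layout, NOT the real daemon's log format:
# source address, destination address, byte count (unsigned 32-bit, network order).
RECORD = struct.Struct("!III")


class FlowRecord:
    """One log entry. __slots__ keeps instances small."""
    __slots__ = ("src", "dst", "octets")

    def __init__(self):
        self.src = self.dst = self.octets = 0

    def copy(self):
        """Make an independent copy, for the entries that need to hang around."""
        c = FlowRecord()
        c.src, c.dst, c.octets = self.src, self.dst, self.octets
        return c


def read_record(stream, into=None):
    """Read one record from stream.

    If `into` is given, overwrite and return that object instead of
    allocating a new one. Returns None at end of input.
    """
    raw = stream.read(RECORD.size)
    if len(raw) < RECORD.size:
        return None
    rec = into if into is not None else FlowRecord()
    rec.src, rec.dst, rec.octets = RECORD.unpack(raw)
    return rec


def bytes_per_source(stream):
    """Sum byte counts per source address using a single scratch object."""
    scratch = FlowRecord()
    totals = {}
    while read_record(stream, into=scratch) is not None:
        totals[scratch.src] = totals.get(scratch.src, 0) + scratch.octets
    return totals


# Three fake records, packed into an in-memory "log file".
sample = io.BytesIO(
    b"".join(RECORD.pack(*r) for r in [(1, 9, 100), (2, 9, 50), (1, 8, 25)])
)
totals = bytes_per_source(sample)
```

The loop only ever allocates one record object, however long the log is. The catch is that anything that must outlive the current iteration (the one-per-netblock entries, in my case) has to be copied explicitly, since the scratch object gets overwritten by the next read.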
I am hoping to be able to either contribute this back to the original writer of the daemon or at least put a tarball with the two most important files (the package definition and the actual code) somewhere.