I've been sleeping a lot this week. Part of it is that I was tired after a 40-hour door-to-door trip home from Japan, part is depression after losing one of our two pet cats, and part is being a bit overwhelmed by all the stuff we have to do in the next few weeks, including probably moving house and sub-letting this one. I'm tempted to cheer myself up by buying a computer, but I'm not sure that's very responsible! I did get to look at lots of digital cameras in Japan, although they're way too expensive there.
Spent some time getting a SourceForge page set up for some Web log summary scripts I wrote years ago; some of the documentation is up, but not the code yet, because in writing the install notes I decided it was all too horrible and I want to improve it first!
*
robocoder, I'm aware of two main dangers of relying on captchas (e.g. images of hard-to-OCR numbers, used to keep spambots out while letting people in). The first is that blind people can't use them, and in many cases this is discriminatory and illegal, so you have to provide an alternative method that's not so difficult as to be discriminatory in itself. The second is that these systems can easily be broken if there is a financial incentive: there have been reports of spammers using a system that relays the captcha questions onto a free porn site registration form, for instance, so that when someone registers there, the corresponding hotmail (or whatever) registration is completed by the software. One way round that is to use text questions that incorporate the name of your Web site in the answer, I suppose.
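That last idea can be as simple as a string comparison; here's a minimal Python sketch, where the question and the accepted answers are made-up examples, not anyone's real site:

```python
# Sketch of a site-specific text-question check.
# QUESTION and ACCEPTED_ANSWERS are hypothetical; a real form would
# rotate among several questions and rate-limit failed attempts.

QUESTION = "What is the name of this Web site?"
ACCEPTED_ANSWERS = {"example.org", "example"}  # hypothetical site name

def check_answer(answer):
    """Return True if the visitor's answer matches, ignoring case and spaces."""
    return answer.strip().lower() in ACCEPTED_ANSWERS
```

Because the answer is specific to your site, a relay attack would have to be customised per site, which removes the economy of scale that makes relaying generic captcha images worthwhile.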
Ingvar, if it takes a C program 14 seconds to read 43MBytes of data on a reasonably recent computer, either the data format is very, very intricate or it expands into an awful lot of memory when unpacked.
If you're not doing it already, use profiling tools such as gprof(1), and maybe consider using mmap(). If there's no obvious function using more than 10% of the time, consider inlining some frequently-called functions or turning them into macros (depending on which compiler you use). Compiler options can help too.
My hololog program reads a 50MByte or so httpd logfile in Perl in less time than that, including matching multiple regular expressions against each line, on a 250MHz Pentium "Pro" system with 128MBytes of RAM and slow 7200RPM disks. But I suspect it could be a lot faster if I work on it some more some time.
Sometimes a good compromise is to write a C program to read the data and extract some of it into a text format (e.g. XML-based), and then weed it further in Python or Perl, or even XSLT or XML Query.
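The weeding half of that pipeline might look like this minimal Python sketch, assuming the C extractor has emitted a simple XML file of per-line records (the element and attribute names here are made up for illustration):

```python
import xml.etree.ElementTree as ET

# Hypothetical extract the C stage might emit: one <hit> element per
# log line, keeping only the fields the later stages still care about.
EXTRACT = """<log>
  <hit status="200" path="/index.html"/>
  <hit status="404" path="/missing"/>
  <hit status="200" path="/about.html"/>
</log>"""

def weed(xml_text, status="200"):
    """Return the paths of hits with the given status code."""
    root = ET.fromstring(xml_text)
    return [hit.get("path") for hit in root.iter("hit")
            if hit.get("status") == status]
```

The point of the split is that the C stage does the dumb, fast byte-shovelling once, and the scripting stage can then be rerun cheaply with different questions against the much smaller extract.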