So, dear diary, I found something really cool the other day. Apparently gzip lets you append multiple compressed streams to a single file. E.g., the following actually works:
$ gzip < log1 > biglog.gz
$ gzip < log2 >> biglog.gz
$ gunzip < biglog.gz
The result is just like a "cat log1 log2". In fact, you can even tail a gzipped log:
$ foo | gzip >> biglog.gz
$ tail -f biglog.gz | gunzip &
$ bar | gzip >> biglog.gz
I haven't had time to look at the gzip code, but this has great implications for the sorts of things I have to work with. For example, I may be able to avoid uncompressing things to work with them and then recompressing them, i.e. I'll need a lot less disk. Considering that we have the weblogs for ny.com back to 1994, that's some serious data to cope with. And that's nothing compared to the terabytes a week at work (Inktomi/Yahoo).
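The same behavior can be checked from a script. Here is a minimal sketch in Python, assuming the standard `gzip` module (whose documentation says `decompress` accepts multi-member data). The two byte strings stand in for log1 and log2 and are made up for illustration.

```python
import gzip

# Stand-ins for the contents of log1 and log2 (made-up data).
log1 = b"first log line\n"
log2 = b"second log line\n"

# Compress each piece on its own, then glue the results together,
# just as "gzip < log2 >> biglog.gz" appends to the file.
biglog = gzip.compress(log1) + gzip.compress(log2)

# Decompressing the concatenation gives back the concatenated plaintext.
combined = gzip.decompress(biglog)
```

So appending to a compressed log never requires reading or rewriting what is already there.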
So I'm not sure how I'll use this, but it sure is cool, and it's surprising that I've never noticed it before. Furthermore, a quick survey of my co-workers and friends revealed that no one has seen this before...
I'll keep my bits to myself now, I promise.
Anyway, it occurred to me that "lack of memory" is closely related to "good programming practice." Imagine if, every day you sat down to work on a program, you had forgotten the prior day's coding!
Well, I realize it's a tenuous sort of argument. I probably couldn't program with limited memory. Some of the best programmers I know, or have heard of, have tremendously strong memories. James Tanis tells me that Seth Robertson was able to do some of his best kernel work because he could keep so much of it in his head at once. Nevertheless, it's a fascinating metric for judging good programming practice. If there's a question of how to program something, ask yourself what someone without long-term memory would do.
Where was I?
30 Apr 2002 (updated 30 Apr 2002 at 07:02 UTC)
E.g. /usr/local/bin/spamassassin -P | /usr/lib/nmh/slocal -user $1
My maildelivery looks for the special header added by spamassassin, X-Spam-Flag. The default rules catch about 80% of my spam, which I dump in my spam folder just in case I want to double-check the results. There are some very complex configuration rules one can build that result in each message being scored as potential spam. Users can extend the rules, and apply new scores for existing rules.
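To make the scoring idea concrete, here is a rough sketch of rule-based scoring. It is not spamassassin's actual rule set or code: the rule names, patterns, scores, and the threshold are all made up for illustration.

```python
import re

# Hypothetical rules: (name, pattern, score). Not spamassassin's real rules.
RULES = [
    ("FREE_OFFER", re.compile(r"\bfree\b", re.I), 1.5),
    ("ALL_CAPS_SUBJECT", re.compile(r"^Subject: [^a-z]+$", re.M), 2.0),
    ("CLICK_HERE", re.compile(r"click here", re.I), 2.0),
]

# Assumed cutoff: a message scoring at or above this is flagged.
THRESHOLD = 5.0


def score_message(text, rules=RULES):
    """Add up the scores of every rule whose pattern matches the message."""
    return sum(score for _name, pattern, score in rules if pattern.search(text))


def is_spam(text, rules=RULES, threshold=THRESHOLD):
    """Flag the message when its total score reaches the threshold."""
    return score_message(text, rules) >= threshold
```

Extending the rules is then a matter of appending to the list, and re-scoring an existing rule means replacing its number.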
It seems to simply use word analysis without much regard for the semantic or syntactic structure of email. Email comes into slocal and is sorted based on ifile's simple database mapping folders to word occurrences. It's smart enough to learn when you refile a piece of mail.
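Here is a toy sketch of that kind of folder-to-word-occurrence database. It is not ifile's actual algorithm or file format; the class name, the scoring formula, and the sample messages are all assumptions made for illustration.

```python
import re
from collections import Counter


def words(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class FolderWordIndex:
    """Hypothetical database mapping each folder to its word counts."""

    def __init__(self):
        self.counts = {}  # folder name -> Counter of word occurrences

    def learn(self, folder, text):
        self.counts.setdefault(folder, Counter()).update(words(text))

    def unlearn(self, folder, text):
        self.counts.setdefault(folder, Counter()).subtract(words(text))

    def refile(self, old_folder, new_folder, text):
        """Move a message's words from one folder's counts to another's."""
        self.unlearn(old_folder, text)
        self.learn(new_folder, text)

    def guess(self, text):
        """Pick the folder whose word counts best cover the message."""
        if not self.counts:
            return None
        message_words = words(text)

        def score(folder):
            counter = self.counts[folder]
            total = sum(counter.values()) or 1
            return sum(counter[w] for w in message_words) / total

        return max(self.counts, key=score)
```

Refiling is what makes it "learn": the message's words are subtracted from the wrong folder and added to the right one.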
I was hoping it would identify spam for me, but having tried it for a couple of weeks, I'm turning it off. Unfortunately, it's not appropriate for the way I file my email. I can imagine situations where it's useful, but it's got several problems:
One day, I'd like to get around to making some minor adjustments and fixes/updates. It seems like parts of it should be a standard part of the unix toolkit, such as word frequency analysis. Here are some guidelines if you're considering using it:
I've been using MoinMoinWiki for several months now at work. The Wiki, which is used actively by about 6 people, has finally grown to the point where organization is a problem. But then I discovered two really great things: 1) the CategoryCategory page, and 2) the Template feature. Boy, this Wiki is a nice balance of powerful and simple. Really worth its salt. At first I didn't think much of it, but I'm really starting to appreciate it.
I highly recommend everyone try the MoinMoinWiki, which can be found at this SourceForge page. For those unfamiliar with WikiWikiWebs, here are some of the salient points, based on my experience with MoinMoinWiki:
Let me describe the two features of MoinMoinWiki that I mentioned earlier:
For more advanced features there's a Perl wiki called TWikiWeb, which I haven't looked into yet. It seems to support some sophisticated structured data (i.e. fill-out forms).
Disturbing happenings at OSDN (sourceforge and freshmeat) are making me antsy.
My fondness for Jabber is continuing to grow. It's really smart in subtle ways. I think it makes a good general purpose monitor for events happening everywhere. If any shell script could report a message to a Jabber channel for me across the whole network, then managing a lot of machines proactively would be much easier. Sort of a personalized syslog... The open problems are Unicode and internationalization, and embedding XML messages to support JAM (Jabber as a middle-layer).
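As a rough illustration of the "personalized syslog" idea, here is a sketch that only builds the XML for a Jabber message. It does not connect to a server or send anything. The function name, the addresses, and the subject wording are hypothetical. Letting an XML library do the escaping sidesteps at least some of the Unicode and embedding worries.

```python
import xml.etree.ElementTree as ET


def build_message(to_jid, body, from_host):
    """Build a Jabber <message/> stanza as text (hypothetical helper)."""
    msg = ET.Element("message", {"to": to_jid, "type": "chat"})
    ET.SubElement(msg, "subject").text = "event from " + from_host
    # The library escapes <, >, & and passes non-ASCII text through.
    ET.SubElement(msg, "body").text = body
    return ET.tostring(msg, encoding="unicode")


# Made-up recipient and event text, for illustration only.
stanza = build_message("admin@example.com", "disk 95% full on /var <web3> é", "web3")
```

A real reporter would still need a client connection and authentication, which are left out here.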
Things missing or things which I long to find a better version of:
I've discovered several important deficiencies in some Python networking classes. The problem is in the structure of both medusa/asyncore and the regular Python SocketServer plus BaseHTTPServer.
Namely, a request object can't declare either that (a) it doesn't want to handle the request and the server should continue, or that (b) the server should stop right there.
Both these actions are important for fork()ing. If the request handler forks, you have two processes that both want to reply to the same socket. These libraries are structured to support always forking or always threading, but not a mixture of either determined within the request handler itself.
Also, although the medusa base class is smart enough in asyncore and asynchat to handle a special ExitNow exception, the HTTP server classes built on top are rude and catch all exceptions. Someone using these can't raise an ExitNow because it's returned as a 500 or other HTTP error.
These problems have made it impossible for me to build a workable XMLRPC/SSL server which forks. Otherwise, I'm fairly happy with most of the base classes and modules, so I'm going to have to add some lines to these packages and send out patches.
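The behavior I'm after can be sketched without any sockets at all. This is a toy model, not medusa or SocketServer code: `DeclineRequest`, the local `ExitNow` class, `serve`, and `demo_handler` are all hypothetical names. The point is only that the two control-flow exceptions get caught before the catch-all that produces a 500.

```python
class DeclineRequest(Exception):
    """Hypothetical: the handler won't reply; the server should carry on."""


class ExitNow(Exception):
    """Hypothetical stand-in for asyncore's ExitNow: stop the server loop."""


def serve(requests, handler):
    """Run handler on each request and return (request, status) pairs."""
    results = []
    for request in requests:
        try:
            handler(request)
            results.append((request, 200))
        except DeclineRequest:
            # (a) No reply from this process, e.g. a forked child answers.
            results.append((request, None))
        except ExitNow:
            # (b) Stop right there; later requests are never touched.
            break
        except Exception:
            # Only genuine failures are turned into an HTTP-style 500.
            results.append((request, 500))
    return results


def demo_handler(request):
    """Made-up handler that exercises each of the four outcomes."""
    if request == "fork":
        raise DeclineRequest()
    if request == "quit":
        raise ExitNow()
    if request == "bad":
        raise ValueError("broken request")
```

With the `except Exception` clause listed first, or alone, both signals would be swallowed and reported as errors, which is the rudeness described above.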
On the technical side: using jabber server extensively for middleware messaging layer, lots of python going on, really like the MoinMoin Wiki, couldn't get samba to compile under cygwin though it should be doable (it's probably not worthwhile), Jarl/everybuddy are alright jabber clients for Linux, Windows registry entries can have duplicates or null key/values.
Had some new thoughts on category trees. Greater than, less than for a path or tree translates to childof, parentof. There's a really simple relationship between trees and strings which makes lots of algorithms constant order in surprising ways. [more on that later]
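One possible reading of the childof/parentof comparison, as a sketch: treat each category as a path and compare paths as tuples of names. The function names and the "/" separator are assumptions made for illustration.

```python
def parts(path):
    """Split a category path such as 'a/b/c' into a tuple of names."""
    return tuple(name for name in path.split("/") if name)


def is_child_of(child, parent):
    """True if child lies strictly below parent in the category tree."""
    c, p = parts(child), parts(parent)
    return len(c) > len(p) and c[:len(p)] == p


def tree_order(paths):
    """Sort so each category is directly followed by its whole subtree."""
    return sorted(paths, key=parts)
```

Comparing tuples rather than raw strings keeps a name like "a-b" from sorting in between "a" and "a/b". Under this ordering a child always compares greater than its parent.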