Name: Charles Thayer
Member since: 2000-08-29 17:57:27
Last Login: 2008-03-15 17:59:16
Homepage: http://www.b2si.com/~thayer/
Notes: Haunts: teleias.com as Senior Engineer; cityrealty.com as CTO; b2si.com as founder; mediabridge.com as founder and Chief Scientist; ny.com as founder; cs.columbia.edu as CRF.
So, dear diary, I found something really cool the other day. Apparently gzip supports appending multiple compressed streams to a single file. E.g., the following actually works:
$ gzip < log1 > biglog.gz
$ gzip < log2 >> biglog.gz
$ gunzip < biglog.gz
The result is just like a "cat log1 log2". In fact, you can even tail a gzipped log:
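$ foo | gzip >> biglog.gz
$ tail -c +1 -f biglog.gz | gunzip &
$ bar | gzip >> biglog.gz
(The -c +1 makes tail start at the first byte. A plain tail -f starts near the end of the file, usually past the gzip header that gunzip needs.)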
I haven't had time to look at the gzip code, but this has great implications for the sorts of things I have to work with. For example, I may be able to avoid uncompressing things to work with them and then recompressing them, i.e. I'll need a lot less disk. Considering that we have the weblogs for ny.com back to 1994, that's some serious data to cope with. And that's nothing compared to the terabytes a week at work (Inktomi/Yahoo).
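The appending trick can be checked end to end with a small script. This is only a sketch: the temporary directory and the sample log lines are made up for illustration. It relies on the gzip file format allowing a file to be a series of independent members, which gunzip decodes one after another.

```shell
set -eu

# Hypothetical sample data standing in for log1 and log2.
tmp=$(mktemp -d)
printf 'GET /index.html\n' > "$tmp/log1"
printf 'GET /about.html\n' > "$tmp/log2"

# Each gzip run writes one complete, self-contained member;
# ">>" simply places the second member after the first.
gzip < "$tmp/log1" >  "$tmp/biglog.gz"
gzip < "$tmp/log2" >> "$tmp/biglog.gz"

# gunzip decodes the members in order and concatenates their contents.
roundtrip=$(gunzip < "$tmp/biglog.gz")
expected=$(cat "$tmp/log1" "$tmp/log2")

if [ "$roundtrip" = "$expected" ]; then
    echo "same as cat log1 log2"
else
    echo "mismatch"
fi

rm -rf "$tmp"
```

One trade-off to keep in mind: each member is compressed on its own, so many small appends will usually compress somewhat worse than one gzip run over all the data.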
So I'm not sure how I'll use this, but it sure is cool, and it's surprising that I've never noticed it before. Furthermore, a quick survey of my co-workers and friends revealed that no one has seen this before...
I'll keep my bits to myself now, I promise.
Anyway, it occurred to me that "lack of memory" is closely related to "good programming practice." Imagine if every day you had to work on a program, you forgot the prior day's coding!
Well, I realize it's a tenuous sort of argument. I probably couldn't program with limited memory. Some of the best programmers I know, or have heard of, have tremendously strong memories. James Tanis tells me that Seth Robertson was able to do some of his best kernel work because he could keep so much of it in his head at once. Nevertheless, it's a fascinating metric for judging good programming practice. If there's a question of how to program something, ask yourself what someone without long-term memory would do.
Where was I?
30 Apr 2002 (updated 30 Apr 2002 at 07:02 UTC)
E.g. /usr/local/bin/spamassassin -P | /usr/lib/nmh/slocal -user $1
My maildelivery looks for the special header added by spamassassin, X-Spam-Flag. The default rules catch about 80% of my spam, which I dump in my spam folder just in case I want to double-check the results. There are some very complex configuration rules one can build that result in each message being scored as potential spam. Users can extend the rules, and apply new scores for existing rules.
ifile seems to simply use word analysis without much regard for the semantic or syntactic structure of email. Email comes into slocal and is sorted based on ifile's simple database mapping folders to word occurrences. It's smart enough to learn when you refile a piece of mail.
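To make "word occurrences" concrete, here is a rough sketch of the kind of counting such a database could be built from, using only standard unix tools. It is not ifile's code, and the sample message is invented; a real classifier would keep counts like these per folder and compare a new message against them.

```shell
set -eu

# Hypothetical message body standing in for a piece of mail.
message='Buy now! Buy cheap pills now. Meeting notes attached.'

# Split on non-letters, lowercase everything, then count each distinct word,
# most frequent first (ties broken alphabetically).
counts=$(printf '%s\n' "$message" |
    tr -cs 'A-Za-z' '\n' |
    tr 'A-Z' 'a-z' |
    sort | uniq -c | sort -k1,1nr -k2)

printf '%s\n' "$counts"
```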
I was hoping it would identify spam for me, but having tried it for a couple of weeks, I'm turning it off. Unfortunately, it's not appropriate for the way I file my email. I can imagine situations where it's useful, but it's got several problems:
One day, I'd like to get around to making some minor adjustments and fixes/updates. It seems like parts of it should be a standard part of the unix toolkit, such as word frequency analysis. Here are some guidelines if you're considering using it: