Older blog entries for yeupou (starting at number 140)

Minimalistic BitTorrent client over NFS/Samba: with Transmission 2.x

I previously released a script to use transmission (BitTorrent client) over NFS/Samba.

This script was written for transmission 1.x. I updated to use transmission 2.x. It’s a hack more than anything else, it’s just a wrapper for transmission-remote, the official RPC client.

It works as before. You put $file.torrent in a watchdir, the script runs (cronjob) and create $file.trs (containing infos about the download) and starts the download. Rename it $file.trs- to pause it, remove the $file.trs to stop it. When the download is finished, you get a mail (if cron is properly set up).

Due to progress made by transmission devs, the install process is even simpler.

1) Set up watch and download dirs as before.

2) Install/Upgrade to transmission 2.x (packages cli and daemon).

3) [this make senses only if you used the previous version] Debian now starts transmission with the debian-transmission user. Trying to keep using torrent cause the init script to fail and, in the long run it’s anyway best to use the user debian maintainers provides. To easily switch to this new user, I removed the new debian-transmission entries in /etc/passwd and /etc/group and then replaced torrent by debian-transmission (except /home path obviously) in both these files (also updated /etc/cron.d/torrent/). Finally I ran chown debian-transmission.debian-transmission /var/lib/transmission-daemon /etc/transmission-daemon/settings.json.

4) Update transmission-daemon config. Make sure the daemon is down before, otherwise your changes won’t stay. So edit /etc/transmission-daemon/settings.json. I changed:

"blocklist-enabled": true,
"download-dir": "/home/torrent/download",
"message-level": 0,
"peer-port-random-on-start": true,
"port-forwarding-enabled": true,
"rpc-authentication-required": false,

5) Install the script torrent-watch.pl, test it:

cd /usr/local/bin
wget http://yeupou.free.fr/torrent2/torrent-watch.pl
chmod a+x torrent-watch.pl
su debian-transmission
torrent-watch.pl
cat status

6) Set up cronjob and log rotation:

* * * * * debian-transmission cd ~/watch && /usr/local/bin/torrent-watch.pl

/etc/logrotate.d/torrent:

/home/torrent/watch/log {
weekly
missingok
rotate 2
nocompress
notifempty
}

Then you should be fine :-)


Syndicated 2011-05-05 16:52:36 from # cd /scratch

Switching from NFSv3 to NFSv4

Today, I switched over NFSv4. I guess there published it for some reason and people claim it could increase file transfert rate by 20%.

In my case, to get it working properly, I…

Modified /etc/default/nfs-kernel-server on server side to have

NEED_SVCGSSD=no

Modified /etc/default/nfs-common on both clients and server side to have

NEED_STATD=no
NEED_IDMAPD=yes
NEED_GSSD=no

Modified /etc/exports on server side to have something starting by


/server 192.168.1.1/24(ro,fsid=0,no_subtree_check,async)

/server/temp 192.168.1.1/24(rw,nohide,no_subtree_check,async,all_squash,anonuid=65534,anongid=65534)

[...]

It forces you to set a root for the NFS server, in my case /server (which I had already in my NFSv3 scenario, so…), aka fsid=0.
You also need to specify nohide for any exports.

Modified /etc/fstab on clients side to set mount type to nfs4 and to remove the /server part from the paths, no longer necessary as path are relatives to fsid=0 which is /server. It gives entries like:

[...]
gate:/temp /stockage/temp nfs4 nolock 0 0
[...]

I had an export which was a symlink to somewhere in /home. NFSv4 is stricter than NFSv3 and there is no way to export something outside from fsid=0. So I made a bind, adding to /etc/fstab on server side:

[...]
/home/torrent/watch /server/torrent/watch none bind 0 0
[...]

After restarting nfs-kernel-server on the server side and nfs-common on both sides, umount NFS partitions and doing a mount -a on the client side, everything seems fine.


Syndicated 2011-01-06 18:03:19 from # cd /scratch

Package: amarok 2.4beta1 for Debian testing

Amarok is a nice music player for KDE. Inspired by iTunes interface, it features a clever random mode, provides Wikipedia/lyrics/photos pages for the currently played song, handles mtp devices and works with lastfm so you can have online stats of what you listen to and find people that listen to the same crap too. That’s definitely a nice software. But, unfortunately, it’s quite buggy.

This morning, Amarok would not start, no matter what. Well, it’s started once I erased .kde/share/config/amarok and .kde/share/apps/amarok. Then, shortly afterwards, it failed to start once more. I’m not quite frankly prepared to remove my Amarok config twice per day. So I decided I would just give a shot to a more recent version and the first beta for 2.4 around seemed a good pick.

Here’s amarok 2.4beta1 (2.3.90) packages for Debian testing amd64.

It was built with amarok Debian experimental directory (a new entry added in debian/changelog, usr/share/doc/kde/HTML/* removed from debian/amarok-common.install, usr/lib/strigi* removed from debian/amarok.install, usr/lib/kde4/*.so and usr/lib/*.so* added to debian/amarok.install, target override_dh_shlibdeps: added to debian/rules and all patches removed from debian/patches/series) and amarok official latest source package (renamed amarok_2.3.90.orig.tar.bz2) using the command dpkg-buildpackage -rfakeroot inside the amarok-2.3.90 directory (source tarball decompressed) containing also the debian directory.


Syndicated 2010-12-30 15:51:55 from # cd /scratch

Keeping the dpkg installed software database clean

The system on my workstation was installed in 2008, December. Actually, I installed Debian AMD64 version over an i386 version on the same box, which was installed around 2003.

Debian ships tools that makes it easy to keep a clean system. For instance, debfoster allows to easily get rid of all no-longer necessary libraries and al: you just have to select the important pieces and it will remove any software that is not required by one of these. And apt-get, nowadays, just like deborphan used to, even warns you when some software is no longer required and provides you with the autoremove command line argument that do the job automatically.

(debfoster is, supposedly, deprecated, like apt-get is in favor of aptitude. Well, I like debfoster)

That being said, if I run dpkg –list | grep ^r | nl | tail -n 1 on this box, after only one year, I get 617 lines about removed software I do not care about. Mostly, they were kept in the dpkg database because I (or me using the system) modified their conffiles. The following will clean this: for package in `dpkg –list | grep ^r | cut -f 3 -d ” “`; do dpkg –purge $package; done && debfoster


Syndicated 2010-12-26 23:30:40 from # cd /scratch

Using partitions labels

Recent linux versions (yes, I’m talking kernel here – linux is not an operating system) introduce new IDE drivers. It implies a device naming convention change. Instead of hda, hdb, etc, you get sda, sdb, etc, just like SCSI drives.

I have three hard disks on my main workstation – plenty of partitions. So in my case, it makes sense to use a unique identifier for each partition so nothing breaks up whenever I add/remove a drive or boot on an older kernel with the previous IDE drivers.

There are already uniques ids for each partition available using the command blkid. It returns unbearables and meaningless, but very uniques, ids like af8485cf-de97-4daa-b3d9-d23aff685638.

So it is best, for me at least, to label partitions properly according to their content and physical disposition, which makes for uniques id too in the end.

For ext3 partitions, I just did:

e2label /dev/sda2 sg250debian64
e2label /dev/sda3 sg250home

For the swap, e2label cannot help, so we set the label with mkswap, recreating it:

swapoff /dev/sda1
mkswap -L sg250swap /dev/sda1
swapon -L sg250swap

For ntfs partitions, I did:

apt-get install ntfsprogs
ntfslabel /dev/sdb1 hi150suxor
ntfslabel /dev/sdb2 hi150suxor2

Then, /etc/fstab must be edited as:


LABEL=sg250swap none swap sw 0 0

LABEL=sg250debian64 / ext3 errors=remount-ro 0 1
LABEL=sg250home /home ext3 defaults 1 2

LABEL=hi150suxor /mnt/suxor ntfs-3g defaults,user,noauto 0 0
LABEL=hi150suxor2 /mnt/suxor2 ntfs-3g defaults,user,noauto 0 0

Finally, grub (or any other boot loader) config should be updated to reflect that. However, unless I’m mistaken, with grub2 as shipped by debian, everything is generated usings scripts that does not seem to handle labels :(


Syndicated 2010-11-11 23:38:24 from # cd /scratch

Minimalistic BitTorrent client over NFS/Samba

Not quite AJAX

While current trends in music/movie industry will surely encourage development of a new generation of peer-to-peer softwares, the same way they made CD-burners cheap in a less than a decade, I’m still quite happy with BitTorrent.

I used torrentflux for quite some time. Shipped with Debian, installed on my local home server, accessible to any box on the network over https, even if it’s interface is not exactly eye candy, it works. I just had to configure web browsers to access http://server/torrentflux/index.php?url_upload=$ each time they hit a .torrent file. But even if web interface may be powerful, user-friendly, I resent torrentflux for having me to click plenty of time (at least two times just to start a download), after having logged in.

I took a look at rTorrent. It works by looking into a directory for new .torrent then load them automatically. Wonderful. Sadly, you have to log in over SSH and then manually select over a text user interface which download you want to actually start.

I liked the idea of dragging’n'dropping .torrent in one directory. It can be done over NFS or Samba, with no additional login. I have those already set up on my server. Next step is to handle queue management with the same directory.

I came up with the idea of using a command line BitTorrent client through a script that would watch the damn NFS/Samba directory. It would :
– notice and register new .torrents dropped
– allow to forcefully pause/remove any designated torrent
– allow to forcefully pause all downloads
– warn by mail whenever a download is completed and unregister the relevant torrent

So I wrote such script so it would handle transmission daemon as shipped by debian testing. It looks for file in a given directory named after the following syntax:
– $file.torrent = torrent to be added
– $realfile.hash = torrent being processed (delete it to remove the torrent)
– $realfile.hash- = torrent paused
– $realfile.hash+ = torrent (supposedly) completed and already removed
– all- = pause all

Here’s the HOWTO:

apt-get install tranmissioncli screen
adduser torrent
echo "torrent: youruser" >> /etc/aliases

su torrent
cd ~/
mkdir watch download
exit

mkdir -p /server
ln -s /home/torrent /server

Obtain uid/gid of torrent necessary below:

cat /etc/passwd | grep torrent

Here I get 1003/1003.

Edit /etc/exports to set up NFS access (this assumes your NFS server is already set up), add:

# every box on the network get rw access to rtorrent
/home/torrent/download 192.168.1.1/24(rw,sync,all_squash,anonuid=1003,anongid=1003)
/home/torrent/watch 192.168.1.1/24(rw,sync,all_squash,anonuid=1003,anongid=1003)

On each NFS client, add in /etc/fstab (you must create mount points):

server:/home/torrent/download /mnt/torrent/download nfs rw,nolock 0 0
server:/home/torrent/watch /mnt/torrent/watch nfs rw,nolock 0 0

Edit /etc/samba/smb.conf to set up Samba access (this assumes your Samba server is already set up, add:

[Download]
path = /home/torrent/download
browseable = yes
public = yes
valid users = youruser
force user = torrent
force group = torrent
writable = yes

[Watch]
path = /home/torrent/watch
browseable = yes
public = yes
valid users = youruser
force user = torrent
force group = torrent
writable = yes

Restart NFS/Samba servers, mount networked file system on the clients.

Add a startup script for transmission-daemon, edit it if need be (daemon configuration is done here), fire it up:

cd /etc/init.d/
wget http://yeupou.free.fr/torrent/init.d/torrent
update-rc.d torrent defaults 80
/etc/init.d/torrent start

At any time, you can check the current daemon process with screen:

screen -r torrent

Add torrent-watch.pl in /usr/local/bin or /usr/bin (anywhere in $PATH):

cd /usr/local/bin
wget http://yeupou.free.fr/torrent/torrent-watch.pl
chmod a+x torrent-watch.pl

Check that it runs properly. Drag’n'drop any .torrent in /home/torrent/watch and run:

su torrent
torrent-watch.pl
cat status

If everything is ok, add in /etc/cron.d/torrent:

* * * * * torrent cd ~/watch && /usr/local/bin/torrent-watch.pl

You’re done.


Syndicated 2010-11-08 17:16:25 from # cd /scratch

Release: SeeYouLater 1.1

I’ve just released SeeYouLater 1.1 (fetch a list of IP or known spammers and to ban them by putting them in /etc/hosts.deny). This is a small cleanup release, now it avoids duplicates in both database and hosts.deny.

You can obtain it on the Gna! project page using SVN or debian packages.


Syndicated 2010-10-08 14:49:33 from # cd /scratch

Being warned of pending packages upgrades with apt-warn

I started using GNU/Linux with RedHat 5.2. It cames with plenty of packages (GNOME 0.20, Linux 2.0.36, etc) and I was quite happy to deal with RPM (RedHat Package Manager, hum) telling me which package is required to install another one, which package contains which files. You simply had to go to RPMFind.net to get missing packages. If no package was available, you could write a clean RPM spec to build one or use checkinstall to build RPMs on the fly when doing make install. It was more than ten years ago, still, nowadays Microsoft Windows XP (sorry, I never used Vista/7) have no clean packaging system that I know of; you have a clumsy list of installed software (InstallShield, whatever it means), no clear idea of dependancies, you can remove pieces of software required by other still installed software and there are plenty of installed pieces of software that you have no way of clearly listing.

At that time, I had a Pentium II 350 MHz and a Pentium 200 MMX as workstations and a Pentium 133 MHz as home server. I, soon enough, had the idea to write a script to produce a list of installed packages readable over intranet and so I published a BASH-based script to output an HTML view of a RPM database called pdbv, standing for Package DataBase View, the first version 1.0.0 being released in June 2002. On the Gna! project page, when listing pros and cons of pdbv, the first pro that came up was “it does not require lucid/gtk+/qt or other big libs”: nowadays, GTK+ and Qt probably no longer strike the mind of anyone as “big (bloated) libraries” and I assume Lucid is no longer even installed on most GNU/Linux systems. Later, I rewrote pdbv in Perl which made if was faster and lighter. Here are demos of pdbv: pdbv 1.x with French locale, pdbv 2.x.

As you can see browing pdbv’s demos, it obviously supports also dpkg (Debian Package, duh). I gradually switched over Debian GNU/Linux for two reasons: apt-get and the branching stable/testing/unstable. Apt-get was the end of wasting time on RPMFind. Debian stable offers astonishing stability for servers while testing/unstable provides brand new desktop software in a timely fashion.

Nowadays, I spend less time dealing with computers and I no longer rely much on pdbv. Due to lack of support (I guess I’m to blame; but KPackage or Synaptic are surely more useful to endusers anyway), it will be removed from Debian at its next stable release (it is still in Debian lenny but no longer in testing). I no longer care much about which software is installed, I use debfoster to keep clean my systems (I know, just like apt-get, debfoster is deprecated in favor of aptitude, but I cannot help using it instead).

However I’d like to know which upgrades are pending. For this reason (and I’m quite sure I’m reinventing something that already exists, but I failed to find it and I wanted it my way), I wrote a small script called apt-warn that will run apt-get update and then warn you of pending updates (only if it has not warned you already about them). It requires Apt::Pkg. It is supposed to be installed a cronjob in /etc/cron.daily. Running on my workstation this morning, it outputs:


Follows 4 newly updated package(s) that you could upgrade on bender:
hicolor-icon-theme (0.11-1 -> 0.12-1)
sudo (1.7.2p7-1 -> 1.7.4p4-2)
xserver-common (2:1.7.7-4 -> 2:1.7.7-6)
xserver-xorg-core (2:1.7.7-4 -> 2:1.7.7-6)

Follows 5 recently updated package(s) that you also could upgrade:
autopoint (0.18.1.1-1 -> 0.18.1.1-2)
gettext (0.18.1.1-1 -> 0.18.1.1-2)
gettext-base (0.18.1.1-1 -> 0.18.1.1-2)
login (1:4.1.4.2-1 -> 1:4.1.4.2+svn3283-1)
passwd (1:4.1.4.2-1 -> 1:4.1.4.2+svn3283-1)

Autopoint, gettext, login and passwd pending upgrades were already warned about yesterday. A second run will return no output since there is no other available upgrade not already warned about.


Syndicated 2010-09-19 10:19:39 from # cd /scratch

Getting MPlayer to cope cleanly with redshift

Redshift is a nice tool that adjusts the color temperature of your screen according to your surroundings. As result, your eyes hurt less if you are working in front of the screen at night.

It is easy to set up, it is for instance already packaged for Debian (package redshift). Once installed, you have to determine longitude and latitude of your position – googling around should do. And you can made some test to defines which range of temperature you want redshift to work with – I like it cold, so I go from 6500 to 9300. And you add it in autostart, the way you want.

In my case, I added redshift -l 48.799:2.505 -t 6500:9300 & just before startkde in my ~/.xsession

Easy, isn’t it? Sure. But when I watch TV with MPlayer or any video with SMPlayer, especially around 01 AM, I’d like color temperature back to normal. And, no, I’m not fond of the idea of doing a killall each time I start watching a video and then a call to redshift afterwards when I’m done.

Configure SMPlayer to use this MPlayer wrapper that kills and starts redshift at the right time

So I wrote a wrapper that send SIGTERM to any redshift process when starting, call MPlayer then, when over, restart redshift. It uses basic perl functions so it has no dependencies. You may however edit it to the set latitude/longitude and temperature range to whatever you like.

It should be just as if you were using MPlayer, so you can configure SMPlayer, or any MPlayer frontend, to use this wrapper. Obviously, this wrapper could be modified to work with vlc, xine or any else video rendering engine.


Syndicated 2010-09-09 14:55:00 from # cd /scratch

Slaying Spams with both Bogofilter and SpamAssassin embedded in exim

Ads are spam. Good thing with the internet’s ads is that you can set up countermeasures.

(Disclaimer: yes, there is nothing new here, just an example of setup)

I have plenty of email addresses from different providers, some are definitely history. I could go through the websites of all of these and set up forwarding for the one I no longer use but still want to be able to get mail from, just in case. Well, I would do that if I was using my mail client to fetch mails – because otherwise fetching mails would actually take ages.

But, as I have a local home underclocked :) server, I find way easier and potent to, instead, use ESR’s fetchmail to download them all to a single account that is accessed by my mail client through IMAPS. I have a /etc/fetchmailrc like:

poll pop.free.fr with proto POP3
user 'XXX' there with password 'XXX' is 'localuser' here
poll imap.gmail.com with proto IMAP
user 'XXX@gmail.com' there with password 'XXX' is 'localuser' here with ssl
user 'XXZ@gmail.com' there with password 'XXZ' is 'localuser' here with ssl

Fetchmail download mails than then relies on the installed SMTP, which is Exim, to deliver it to end user account mailbox accessible through IMAPS.

What’s so nifty nifty about? Well, mails will also be filtered for spam. As it happens on the local home server, it will be unnoticeable for the end user that is me. We’ll use several anti-spam tools, not caring about redundancy and time-consumption: DNSBLs, Bogofilter, SpamAssassin, razor2.

So, here we go. Note that Exim (exim4) in Debian use the user Debian-exim. localuser is the recipient end-user.
We will create a system group dedicated to spamchecking to easily share bayesian databases:

# addgroup --system spamslayer
# adduser Debian-exim spamslayer
# adduser localuser spamslayer

* Bogofilter is a bayesian spam filter . It is said to be faster and lesser time consuming than the SpamAssassin’s own bayesian filter so will run mails through it first. It is installed with the debian package.

Edit /etc/bogofilter.cf as follows:

bogofilter_dir=/var/lib/bogofilter
db_transaction=yes

The bayes directory must be created by hand:

# mkdir /var/lib/bogofilter
# chgrp spamslayer /var/lib/bogofilter
# chmod 2777 /var/lib/bogofilter

* SpamAssassin is a powerful, at the cost of time-consumption, spam-killer. It is installed with the debian package.

In the following site-wide config /etc/spamassassin/local.cf, I use bayesian filters, razor2, several DNSBLs and I adjust some tests according to my needs:

# Save spam messages as a message/rfc822 MIME attachment instead of
# modifying the original message (0: off, 2: use text/plain instead)
#
report_safe 1
# Set which networks or hosts are considered 'trusted' by your mail
# server (i.e. not spammers)
#
trusted_networks 192.168.1.
# Locales
#
# (I only receive mails in English or French)
ok_locales en fr
# Set the threshold at which a message is considered spam (default: 5.0)
#
required_score 3.3
# Use Bayesian classifier (default: 1)
#
# (I created the relevant directory)
use_bayes 1
bayes_file_mode 0777
bayes_path /var/lib/spamassassin-bayes/bayes
score BAYES_20 0.3
score BAYES_40 0.5
score BAYES_50 0.8
score BAYES_60 1
score BAYES_80 2
score BAYES_95 2.5
score BAYES_99 6
# Bayesian classifier auto-learning (default: 1)
#
# (I may change that, not sure about it)
bayes_auto_learn 1
# Set headers which may provide inappropriate cues to the Bayesian
# classifier
#
bayes_ignore_header X-Bogosity
bayes_ignore_header X-Spam-Flag
bayes_ignore_header X-Spam-Status
# use razor
# (/etc/razor is the standard debian path)
use_razor2 1
razor_config /etc/razor/razor-agent.conf
score RAZOR2_CF_RANGE_51_100 3.2
# some rbl checks are already made by exim, at RCPT time, not all.
skip_rbl_checks 0
rbl_timeout 30
score RCVD_IN_SBL 15
score RCVD_IN_XBL 15
score RCVD_IN_SORBS_HTTP 15
score RCVD_IN_SORBS_SOCKS 15
score RCVD_IN_SORBS_MISC 15
score RCVD_IN_SORBS_SMTP 15
score RCVD_IN_SORBS_ZOMBIE 15
# adjust some tests scores: lower DUL test
score FROM_ENDS_IN_NUMS 0.2
score FROM_HAS_MIXED_NUMS 0.2
score FROM_HAS_MIXED_NUMS3 0.2
score RCVD_IN_NJABL_DUL 0.1
score RCVD_IN_SORBS_DUL 0.1
# lower stupid test
score DNS_FROM_SECURITYSAGE 0.0
# adjust some tests scores
score FAKE_HELO_HOTMAIL 3
score FORGED_HOTMAIL_RCVD 3
score HTML_FONT_BIG 2.4
score NO_REAL_NAME 2
score RCVD_IN_BL_SPAMCOP_NET 3
score SUBJ_ILLEGAL_CHARS 4.8
score EXTRA_MPART_TYPE 2.8
score SUBJ_ALL_CAPS 2.6
# increase all scores related to drugs: what do I care, duh
score DRUGS_ANXIETY 5
score DRUGS_ANXIETY_EREC 5
score DRUGS_ANXIETY_OBFU 5
score DRUGS_DIET 5
score DRUGS_DIET_OBFU 5
score DRUGS_ERECTILE 5
score DRUGS_ERECTILE_OBFU 5
score DRUGS_MANYKINDS 10
score DRUGS_MUSCLE 5
score DRUGS_PAIN 5
score DRUGS_PAIN_OBFU 5
score DRUGS_SLEEP 5
score DRUGS_SLEEP_EREC 5
score DRUGS_SMEAR1 5
# same goes for porn
score AMATEUR_PORN 5
score BEST_PORN 5
score DISGUISE_PORN 5
score DISGUISE_PORN_MUNDANE 5
score FREE_PORN 5
score HARDCORE_PORN 5
score LIVE_PORN 5
score PORN_15 5
score PORN_16 5
score PORN_URL_MISC 5
score PORN_URL_SEX 5
score PORN_URL_SLUT 5

The bayes directory must be created:

# mkdir /var/lib/spamassassin-bayes
# chown Debian-exim /var/lib/spamassassin-bayes
# chmod 0777 /var/lib/spamassassin-bayes

Obviously, it implies that razor2 must be properly installed. We install the debian package then set it up. Remember it must run with user Debian-exim, so we do:

# chown -R Debian-exim:spamslayer /etc/razor
# su Debian-exim
$ razor-admin -home=/etc/razor -register
$ razor-admin -home=/etc/razor -create
$ razor-admin -home=/etc/razor -discover

To save ressources, we start SpamAssassin as a daemon (spamd), that will be called using its specific client (spamc). Before using the initd script, edit as follows /etc/defaut/spamassassin:

# Change to one to enable spamd
ENABLED=1
# SpamAssassin uses a preforking model, so be careful! You need to
# make sure --max-children is not set to anything higher than 5,
# unless you know what you're doing.
OPTIONS="--create-prefs --max-children 5 --helper-home-dir -u Debian-exim -g spamslayer"
# Cronjob
# Set to anything but 0 to enable the cron job to automatically update
# spamassassin's rules on a nightly basis
CRON=1

All that being do, you’ll want to (re)start the daemon with the relevant initd script (/etc/init.d/spamassassin restart here).

* Now we’ll tune Exim to call all by himself first Bogofilter and then SpamAssassin, if necessary only. We use splitted configuration in /etc/exim4/conf.d/. That is debian-specific I think but it does make any difference anyway.

First we define useful transports in /etc/exim4/conf.d/transport/35_spamblock (the name 35_spamblock is arbitrary and the number does not matter here):

spamslay_bogofilter:
driver = pipe
command = /usr/sbin/exim4 -oMr spamslayed-bogofilter -bS
use_bsmtp = true
transport_filter = /usr/bin/bogofilter -l -p -e
home_directory = "/tmp"
current_directory = "/tmp"
# must use a privileged user to set $received_protocol
# on the way back in!
user = Debian-exim
group = spamslayer
log_output = true
return_fail_output = true
return_path_add = false
message_prefix =
message_suffix =
#
spamslay_spamd:
driver = pipe
command = /usr/sbin/exim4 -oMr spamslayed-spamd -bS
use_bsmtp = true
transport_filter = /usr/bin/spamc
home_directory = "/tmp"
current_directory = "/tmp"
# must use a privileged user to set $received_protocol
# on the way back in!
user = Debian-exim
group = spamslayer
log_output = true
return_fail_output = true
return_path_add = false
message_prefix =
message_suffix =

Second we define routers, here in /etc/exim4/conf.d/router/350_spamblock – the order matters, here it is just after 300_exim4-config_real_local and before 400_exim4-config_system_aliases:

# first bogofilter
spamslay_router_bogofilter:
# When to scan a message :
# - it isn't already flagged as spam
# - it has not yet been spamslayed at all
condition = "${if and { {!eqi{$h_X-Spam-Flag:}{yes}} {!eq {$received_protocol}{spamslayed-bogofilter}} {!eq {$received_protocol}{spamslayed-spamd}} }}"
driver = accept
transport = spamslay_bogofilter
#
# second spamd
spamslay_router_spamd:
# When to scan a message :
# - it isn't already flagged as spam
# - it has not yet been spamslayed with SA
condition = "${if and { {!eqi{$h_X-Spam-Flag:}{yes}} {!match{$h_X-Bogosity:}{^Yes}} {!eq {$received_protocol}{spamslayed-spamd}} }}"
driver = accept
transport = spamslay_spamd
#
# This route will send any mail that got here to the devnull alias, that
# should be configured in /etc/aliases to be a real link to /dev/null.
# This route should get only mails that have spam score higher than 14.
# This will affect users mails!
spamslay_killit:
condition = "${if ge{$h_X-Spam-Level:}{\*\*\*\*\*\*\*\*\*\*\*\*\*\*} {1}{0} }"
driver = redirect
data = spam
file_transport = address_file
pipe_transport = address_pipe

* Next step, now that spams are flagged, it makes sense to put them apart. I do this with procmail. Here’s the relevant bit /home/localuser/.procmailrc:

IMAPDIR=$HOME/.Maildir/
ISDIR="/"
DOT="."
# tagged by hand, to be learned from by both SpamAssassin and Bogofilter
spam=$IMAPDIR$DOT"Poubelle.Spam"$ISDIR
# by spamd
spamBySA=$IMAPDIR$DOT"Poubelle.SpamSA"$ISDIR
# by bogofilter
spamByBg=$IMAPDIR$DOT"Poubelle.SpamBg"$ISDIR
expirable=$IMAPDIR$DOT"Poubelle.Expirable"$ISDIR
#
:0
* ^X-Spam-Status: Yes
$spamBySA
:0
* ^X-Spam-Flag: YES
$spamBySA
#
:0
* ^X-Bogosity: Yes
$spamByBg

* Training bayesian filters.

Now that spam ended up in a specific mailbox/maildir, both SpamAssassin and Bogofilter bayesians filters can be trained to be effective. We add the following in /etc/cron.d/bayes:

# trains bayesian filters
BASEDIR="/home/localuser/.Maildir"
SPAMDIR_MANUAL="$BASEDIR/.Poubelle.Spam/cur/ $BASEDIR/.Poubelle.Spam/new/ $BASEDIR/.Poubelle.Spam"
SPAMDIR_SA="$BASEDIR/.Poubelle.SpamSA/cur/ $BASEDIR/.Poubelle.SpamSA/new/ $BASEDIR/.Poubelle.SpamSA"
SPAMDIR_BG="$BASEDIR/.Poubelle.SpamBg/cur/ $BASEDIR/.Poubelle.SpamBg/new/ $BASEDIR/.Poubelle.SpamBg"
#
# spamd: can handle easily bogofiltered found spams
25 * * * * localuser /usr/bin/sa-learn --spam $SPAMDIR_MANUAL $SPAMDIR_BG >/dev/null
#
# bogofilter: not sure how it would cope with spamd headers so we'll avoid them
# for now
# (-u was not set as it is discouraged perf-wise in bogofilter's manual)
# Dirty hack to cope with rights issues: running as root - not great
28 * * * * root /usr/bin/bogofilter --register-spam -B $SPAMDIR_MANUAL $SPAMDIR_BG && chown Debian-exim -R /var/lib/bogofilter

Obviously, if you want it to learn from plenty of different users, you’ll have to think of something more elaborate :)
Anyway, regarding plenty of users, it would actually probably wise to think twice about the whole concept of sharing bayesian filters that may not at all be accurate for very differents users.

I’m not very happy with the handling of bogofilter files read/write access, it remains to be cleaned up. Obviously, one alternative would have been to avoid meddling with Exim and to run both bogofilter and spamd via procmail. Sure, it would not have been site-wide setup but for a few users, ~/.procmailrc can be replicated easily. But actually I enjoy messing with Exim, that’s kind of a hobby. I skipped here the part where we call DNSBLs in Exim (working out-of-the-box anyway). And on a production server, with the SMTP wide opened to the web, it is possible to follow this approach just to shut off spammers at SMTP-time -which induces a huge resources gain- and even ban them.


Syndicated 2010-08-13 14:56:03 from # cd /scratch

131 older entries...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!