Integrating SpamAssassin into Courier

Please excuse the current flurry of Linux articles. I’m moving servers and this is my way of writing notes to myself and possibly helping out others. Normal service will resume shortly ;-)

This article is a follow-up to my guide on Installing Courier on Gentoo. As long as you have a working Courier installation on your system, there should be no issues following this guide.

Running a mail server without some kind of spam filtering is just insane these days. SpamAssassin is a nice solution, especially if you run SpamAssassin during the SMTP transaction to reject spam while it is being uploaded to your server.

SMTP Spam Filtering

Traditionally, a mail server that receives incoming messages using the SMTP protocol uses a small application that does nothing but accept messages and hand them over to another application which puts them in a user’s mailbox. This is done so the SMTP server can run with minimal rights – if someone managed to hack it, he would neither be able to access any mailboxes nor would he gain control over the system.

The drawback is that this means any spam messages will be accepted into the system blindly.

#Server: 220 server.com ESMTP
 Client: HELO spammer.com
#Server: 250 spammer.com ok.
 Client: MAIL FROM:<evil@spammer.com>
#Server: 250 Ok
 Client: RCPT TO:<user@server.com>
#Server: 250 Ok
 Client: DATA
#Server: 354 Transmit message, terminate with a single line containing '.'
 Client: From: Not an Evil Spammer <good@person.com>
 Client: To: My Victim <user@server.com>
 Client: Subject: Consolidate your penis length and enlarge your debt NOW!
 Client:
 Client: Do you have too much money?
 Client: Or not enough debt yet?
 Client: Then send your money to me!
 Client:
 Client: I will sell you worthless^H^H^H^H^H^H^Hnderful pills (:
 Client: .
#Server: 250 Ok. Message accepted.
 Client: QUIT
#Server: 221 Bye.

When the spam filter runs, whoever connected to the mail server to submit the spam message is long gone. As a result, the mail server can only try to complain to the email address the spammer claimed to be his. This is discouraged because spammers often fake the ‘from’ address, so your mail server will merely distribute the spam to another victim.

Diagram of a mail server complaining to another server about spam

A much nicer approach is to scan the email while the SMTP transaction takes place, in other words, while the spammer is still uploading it:

#Server: 220 server.com ESMTP
 Client: HELO server.com
#Server: 250 server.com ok.
 Client: MAIL FROM:<evil@spammer.com>
#Server: 250 Ok
 Client: RCPT TO:<user@server.com>
#Server: 250 Ok
 Client: DATA
#Server: 354 Transmit message, terminate with a single line containing '.'
 Client: From: Not an Evil Spammer <good@person.com>
 Client: To: My Victim <user@server.com>
 Client: Subject: Consolidate your penis length and enlarge your debt NOW!
 Client:
 Client: Do you have too much money?
 Client: Or not enough debt yet?
 Client: Then send your money to me!
 Client:
 Client: I will sell you worthless^H^H^H^H^H^H^Hnderful pills (:
 Client: .
#Server: 450 Your message is considered spam.
 Client: WTF?

This way, the original sender will see that his message has been rejected:

Diagram of a mail server rejecting spam while it is being sent

This also works for genuine email messages because the sending server will notice the delivery failure and inform the user that delivery didn’t succeed:

Diagram of a mail server rejecting a genuine message

So rejecting spam during the SMTP transaction doesn’t cause a disaster when your spam filter produces a false positive: the original sender will see that his message wasn’t delivered.

Spam bots will probably not take any special action – there have been some suggestions that emails being rejected this way might cause email addresses to be removed from the spammer’s mailing list, but I doubt this because the rejecting server can only report a temporary failure – otherwise genuine senders might think the email address no longer exists when a false positive occurs.
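
To make the difference between temporary and permanent failures a bit more concrete, here is a minimal sketch in shell. The function name smtp_reply_class is a hypothetical helper made up for illustration; it relies only on the fact that the first digit of an SMTP reply code tells you the class of the reply (2 = success, 4 = temporary failure, 5 = permanent failure).

```shell
#!/bin/sh
# smtp_reply_class REPLY: print the class of an SMTP reply line, judged by
# the first digit of its status code. Hypothetical helper, not part of Courier.
smtp_reply_class() {
  case "$1" in
    2*) echo "success" ;;
    3*) echo "intermediate" ;;
    4*) echo "temporary failure" ;;
    5*) echo "permanent failure" ;;
    *)  echo "unknown" ;;
  esac
}

# A rejection during the SMTP transaction falls into the 4xx class:
smtp_reply_class "250 Ok. Message accepted."
smtp_reply_class "450 Your message is considered spam."
smtp_reply_class "550 No such user."
```

A sending server that receives a 4xx reply will usually keep the message in its queue and retry for a while before it gives up and notifies the sender.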

Courier has the ability to do this and it doesn’t even require anything special to set up. Courier Maildrop can be instructed to do virus scans, spam checks and any other kind of test you want to run on incoming email during the SMTP transaction.

Other servers can be equipped with this capability as well. For qmail, there’s SimScan, and transparent SMTP proxies like Mail Avenger or ASSP can be layered in front of any kind of SMTP server.

1. Enable Maildrop during SMTP

The important but seemingly well-hidden page on the Courier website that explains how to switch maildrop to run during the SMTP transaction is here: localmailfilter.

First, enable the filtering system so maildrop will be invoked during the SMTP transaction:

echo /usr/bin/maildrop > /etc/courier/maildropfilter
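
If you want to double-check the result, a small sanity test along these lines might help. This is only a sketch: check_filter_config is a hypothetical helper, and it assumes the file contains nothing but the path to the filter program on its first line, as written by the echo command above.

```shell
#!/bin/sh
# check_filter_config FILE: succeed if the first line of FILE names an
# executable program. Hypothetical helper, not a Courier tool.
check_filter_config() {
  [ -r "$1" ] || return 1
  prog=$(head -n 1 "$1")
  [ -n "$prog" ] && [ -x "$prog" ]
}

# Example (path taken from the command above):
#   check_filter_config /etc/courier/maildropfilter || echo "filter not set up"
```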

2. Create Filters

As you can see in the localmailfilter docs, maildrop will be invoked during the SMTP transaction with the appended parameters -D uid/gid -M {rcptfilter|smtpfilter}. The maildrop documentation then states:

-M filterfile

maildrop […] then reads $HOME/.mailfilters/filterfile. For security reasons the name of the file may not begin with a slash or include periods. maildrop is very paranoid: both $HOME/.mailfilters, and $HOME/.mailfilters/filterfile must be owned by the user, and may not have any group or world permissions.

So you need to create a directory .mailfilters in your user’s home directory and add two files to it. The first one is named rcptfilter and will be used to check whether a sender’s email address is allowed to send email to your system, the second one is named smtpfilter and is used to check the completed email before the SMTP transaction is acknowledged to the sender.

su yourusername

cd ~

mkdir .mailfilters
touch .mailfilters/rcptfilter
touch .mailfilters/smtpfilter

chmod -R go-rwx .mailfilters

exit
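
Because maildrop silently refuses filters with overly generous permissions, it can be worth verifying them. The following is a rough sketch; check_private is a hypothetical helper that only looks at the mode string printed by ls and does not verify ownership.

```shell
#!/bin/sh
# check_private PATH: succeed only if PATH exists and has no group or world
# permission bits set. Hypothetical helper for illustration.
check_private() {
  # Characters 5-10 of a mode string such as "drwx------" hold the
  # group and world permission bits.
  perms=$(ls -ld "$1" 2>/dev/null | cut -c5-10)
  [ "$perms" = "------" ]
}

# Check the directory and both filter files of the current user
for f in "$HOME/.mailfilters" \
         "$HOME/.mailfilters/rcptfilter" \
         "$HOME/.mailfilters/smtpfilter"
do
  check_private "$f" || echo "missing or too permissive: $f"
done
```

Run it as the mailbox user; if nothing is printed, the permissions should be strict enough for maildrop.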

The rcptfilter will be executed right after the sender stated the recipient of the email. There isn’t much you can do at this point, except to decide whether the email should be accepted or rejected blindly, or whether smtpfilter should be run on it:

# /home/yourusername/.mailfilters/rcptfilter

# Exit codes
#
#    0  Accepts the message for delivery without running smtpfilter
#   97  Rejects the message immediately
#   99  Lets the client proceed to transmit the message data,
#       then runs smtpfilter on it
EXITCODE=99
exit

Now comes the interesting part. smtpfilter will run right when the sender has completed the email by sending a single ‘.’ line, but before your email server acknowledges the message. At this point, you can either accept the email or reject it, which leads to a 450: mailbox unavailable message. You can customize the message (e.g. to inform the sender what has happened) by simply echoing it before exiting.

# /home/yourusername/.mailfilters/smtpfilter

# Exit codes
#
#   0  Accepts the message and begins delivery
#   1  Rejects the message with code 450
#
echo "This is a test!"
echo "You should see this text in the failure message of the sending server"
EXITCODE=1
exit

As you can see, the above filter will reject all emails. This is for testing only. Use another email provider to send a test email to your server. You should get a failure notice like this:

Hi. This is the qmail-send program at mailprovider.com.
I'm afraid I wasn't able to deliver your message to the following addresses.
This is a permanent error; I've given up. Sorry it didn't work out.

<yourusername@server.com>:
1.2.3.4 failed after I sent the message.
Remote host said: 558-450-yourusername:
558-450-This is a test!
558 450 You should see this text in the failure message of the sending server

--- Below this line is a copy of the message.

If this is working, you can advance to the next step, which is to:

3. Create a SpamAssassin Filter

First, you need to make sure that SpamAssassin is installed, of course. Gentoo users can do this by simply running:

emerge spamassassin

Invoking shell commands in a filter is done by either using the xfilter command or by wrapping the command in backticks. Maildrop will then execute the command in a child process.

If you studied the maildrop documentation, however, you might have come across several statements indicating that the xfilter command, backticks and lots of other things aren’t allowed in maildrop’s embedded mode. The solution can be found in the maildrop documentation once more:

/etc/courier/maildroprcs

If maildrop encounters an include statement where the filename starts with /etc/courier/maildroprcs/, the normal restrictions for the embedded mode are suspended while executing the filter file in the /etc/courier/maildroprcs directory. The restrictions are also suspended for any additional filter files that are included from /etc/courier/maildroprcs. The restrictions resume once maildrop finishes executing the file from /etc/courier/maildroprcs.

Create the directory /etc/courier/maildroprcs if it doesn’t exist and paste the following filter into a new file under /etc/courier/maildroprcs/spamassassin:

# /etc/courier/maildroprcs/spamassassin

# Only scan emails with a size of less than 256 KiB (this is to prevent
# denial-of-service attacks and large spikes in CPU load).
#
# Skip mails which have the magic word "mapsetahi" ("i hate spam" backwards)
# in them. This is an option for email senders to bypass the spam filter
# if it wrongly rejected their email.
#
if ( $SIZE < 262144 && !/^.*mapsetahi.*/:b )
{
        # Run SpamAssassin
        #
        #   --check     Just checks the email and prints out the spam score.
        #   --exitcode  Makes spamc return 1 if the email is identified
        #               as spam and 0 if it is genuine (or an error occurred)
        #
        `/usr/bin/spamc --check --exitcode`
        if ( $RETURNCODE == 1 )
        {
                echo "###########################################################"
                echo "# Your email appears to be spam and was rejected"
                echo "#"
                echo "# If this is a false positive, please send your email"
                echo "# once more, this time including the word 'mapsetahi'"
                echo "# somewhere in its text. Sorry for the inconvenience."
                echo "###########################################################"

                EXITCODE=97
                exit
        }
}

This filter will scan any emails smaller than 256 KiB which do not contain the magic word ‘mapsetahi’. Further down, if SpamAssassin identifies an email as spam, the mail server’s error message contains instructions for the sender to add the magic word to his email so it will get through on the next attempt.
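
If you want to play with the decision logic outside of maildrop, it can be approximated in plain shell. This is a sketch under assumptions: should_scan is a hypothetical helper, and unlike the :b flag in the maildrop pattern (which only looks at the message body), grep searches the whole file including the headers.

```shell
#!/bin/sh
# should_scan FILE: succeed if the message stored in FILE would be handed
# to SpamAssassin by the filter above. Hypothetical helper.
MAX_SIZE=262144        # 256 KiB, the same limit as in the filter
MAGIC_WORD=mapsetahi   # the bypass word from the filter

should_scan() {
  [ -f "$1" ] || return 1
  # Strip the padding some wc implementations put around the number
  size=$(wc -c < "$1" | tr -d ' ')
  [ "$size" -lt "$MAX_SIZE" ] || return 1
  # Messages containing the magic word bypass the scan
  if grep -qi "$MAGIC_WORD" "$1"; then
    return 1
  fi
  return 0
}

# Example: only ask spamc about messages that qualify
#   should_scan message.eml && spamc --check --exitcode < message.eml
```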

Next, you need to enable this email filter for your local mail user. This is done simply by including the spamassassin filter in your user’s /home/yourusername/.mailfilters/smtpfilter:

# /home/yourusername/.mailfilters/smtpfilter

# Do a SpamAssassin check on this message
include "/etc/courier/maildroprcs/spamassassin"

That’s it. You can test it by sending yourself some emails. There’s a special token that, when it appears in an email, makes SpamAssassin classify the email as spam no matter its contents. It’s called the GTUBE (Generic Test for Unsolicited Bulk Email). Simply send yourself an email containing the string:

XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X

The behavior should be this:

Normal emails should get through.
Emails with the GTUBE should be rejected.
Emails with the GTUBE and the magic word (‘mapsetahi’) should get through again.
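
The three test cases can be generated with a few lines of shell. This is only a sketch: make_test_message is a hypothetical helper and the sender address is a made-up example. It merely prints the messages; to exercise the SMTP filter you still have to send them from an outside host.

```shell
#!/bin/sh
# make_test_message KIND: print a small test email to stdout.
# KIND is "normal", "gtube" or "bypass". Hypothetical helper.
GTUBE='XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X'

make_test_message() {
  case "$1" in
    normal) body='Just a regular message.' ;;
    gtube)  body=$GTUBE ;;
    bypass) body="$GTUBE
mapsetahi" ;;
    *)      echo "unknown kind: $1" >&2; return 1 ;;
  esac
  # Headers, an empty separator line, then the body
  printf 'From: tester@example.org\nTo: yourusername@server.com\n'
  printf 'Subject: filter test (%s)\n\n%s\n' "$1" "$body"
}

# Example: see how SpamAssassin alone classifies the GTUBE message
#   make_test_message gtube | spamc --check --exitcode; echo "exit code: $?"
```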

4. Fine-Tune SpamAssassin

If you’ve sent the test emails in the previous step, SpamAssassin should have created a user_prefs file in /home/yourusername/.spamassassin. This is the place where you create your per-user whitelist (email addresses from which emails will never be blocked) and adjust SpamAssassin’s scoring system. This is mine:

# /home/yourusername/.spamassassin/user_prefs

# How many points before a mail is considered spam.
required_score          4.0

# Whitelist and blacklist addresses are now file-glob-style patterns, so
# "friend@somewhere.com", "*@isp.com", or "*.domain.net" will all work.
# whitelist_from        someone@somewhere.com
whitelist_from          *sourceforge.net

# Adjust the scores for certain tests
score BAD_CREDIT               9.9
score DRUGS_ERECTILE           9.9
score RCVD_IN_SORBS_DUL        9.9
score RCVD_IN_BL_SPAMCOP_NET   9.9
score RDNS_DYNAMIC             9.9
score BAYES_99                 9.9
score FB_CIALIS_LEO3           9.9
score RATWARE_MS_HASH          9.9
score DATE_IN_PAST_96_XX       9.9
score RCVD_IN_PBL              9.9
score RCVD_IN_XBL              9.9

The global settings for SpamAssassin are stored in /etc/spamassassin/local.cf. In the past, I have run SpamAssassin with autolearn enabled for its Bayesian classifier. After several months, it had trained itself to report a spam probability of 99% for genuine mails and 0% for spam :) – so I’m now training SpamAssassin’s Bayesian classifier myself. This is what I changed:

# /etc/spamassassin/local.cf

# Use Bayesian classifier
use_bayes 1

# Bayesian classifier auto-learning
bayes_auto_learn 0

# Set headers which may provide inappropriate cues to the Bayesian classifier
bayes_ignore_header X-Bogosity
bayes_ignore_header X-Spam-Flag
bayes_ignore_header X-Spam-Status

There’s one more thing to change in local.cf: where the database for SpamAssassin’s Bayesian classifier is stored. This is important because you’ll probably want a site-wide Bayesian database that is shared by all users (if not, just skip this step, but you’ll have to alter the training script I present further down so it runs SpamAssassin’s sa-learn command under the user name of each scanned user’s mailbox).

The site-wide Bayes setup guide in the SpamAssassin wiki explains one way of achieving this (I used MySQL in my own setup because I don’t want to use a dozen different database engines, but this should work too).

# /etc/spamassassin/local.cf

# Set up a global database for the Bayesian classifier
bayes_path /var/spamassassin/bayes/bayes
bayes_file_mode 0777
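
The directory for the database has to exist before SpamAssassin can write to it. Here is a minimal sketch of how it could be prepared; prepare_bayes_dir is a hypothetical helper, and making the directory world-writable is an assumption that mirrors the bayes_file_mode above, so tighten it if your setup allows.

```shell
#!/bin/sh
# prepare_bayes_dir BAYES_PATH: create the directory part of BAYES_PATH.
# SpamAssassin appends suffixes such as _toks and _seen to bayes_path,
# so only the parent directory is created here. Hypothetical helper.
prepare_bayes_dir() {
  dir=$(dirname "$1")
  mkdir -p "$dir" && chmod 0777 "$dir"
}

# Example (run as root, path taken from local.cf above):
#   prepare_bayes_dir /var/spamassassin/bayes/bayes
```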

To train SpamAssassin’s Bayesian classifier, I wrote a small shell script that I saved as /etc/cron.hourly/train-spamassassin.sh so it is run once per hour (given you’re using vixie-cron – otherwise you’ll have to modify your crontab accordingly).

#!/bin/sh
# /etc/cron.hourly/train-spamassassin.sh

# Train SpamAssassin's Bayesian classifier by collecting emails identified
# as "spam" or "ham" for each user's designated folders
for homedirectory in /home/*
do

  # Only proceed if this user has a Maildir set up
  if [ -e "$homedirectory/Maildir" ]; then
    #chmod +t "$homedirectory" # Prevent maildrop from adding new emails

    # Learn any spam sorted out by the user, then move it to Trash
    if [ -e "$homedirectory/Maildir/.Train.Spam" ]; then
      if ls "$homedirectory"/Maildir/.Train.Spam/cur/* >/dev/null 2>&1; then
        sa-learn --spam "$homedirectory"/Maildir/.Train.Spam/cur/*
        mv "$homedirectory"/Maildir/.Train.Spam/cur/* \
           "$homedirectory/Maildir/.Trash/cur/"
      fi
      if ls "$homedirectory"/Maildir/.Train.Spam/new/* >/dev/null 2>&1; then
        sa-learn --spam "$homedirectory"/Maildir/.Train.Spam/new/*
        mv "$homedirectory"/Maildir/.Train.Spam/new/* \
           "$homedirectory/Maildir/.Trash/new/"
      fi
    fi

    # Learn any ham provided by the user, then move it to Trash
    if [ -e "$homedirectory/Maildir/.Train.Ham" ]; then
      if ls "$homedirectory"/Maildir/.Train.Ham/cur/* >/dev/null 2>&1; then
        sa-learn --ham "$homedirectory"/Maildir/.Train.Ham/cur/*
        mv "$homedirectory"/Maildir/.Train.Ham/cur/* \
           "$homedirectory/Maildir/.Trash/cur/"
      fi
      if ls "$homedirectory"/Maildir/.Train.Ham/new/* >/dev/null 2>&1; then
        sa-learn --ham "$homedirectory"/Maildir/.Train.Ham/new/*
        mv "$homedirectory"/Maildir/.Train.Ham/new/* \
           "$homedirectory/Maildir/.Trash/new/"
      fi
    fi

    #chmod -t "$homedirectory" # Allow new emails again
  fi

done

This will check each user’s home directory for a Maildir, and if the Maildir contains subdirectories named .Train.Spam or .Train.Ham (which appear in an IMAP client as a Train folder containing Spam and Ham), all messages in there will be used to train SpamAssassin’s Bayesian classifier. The messages will then be moved to .Trash.

The above script blindly assumes that you have a .Trash directory and that SpamAssassin is using a global database. Also, it only makes sense if you’re using IMAP because otherwise, you won’t be able to move emails around in folders on the server.
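
If you’d rather not rely on the .Trash assumption, a small guard could be added to the loop. The following is a sketch only; maildir_ready is a hypothetical helper that checks for the two subdirectories the script moves messages into.

```shell
#!/bin/sh
# maildir_ready MAILDIR: succeed only if MAILDIR contains a .Trash folder
# with the cur/ and new/ subdirectories. Hypothetical helper.
maildir_ready() {
  [ -d "$1/.Trash/cur" ] && [ -d "$1/.Trash/new" ]
}

# Possible use inside the training loop:
#   maildir_ready "$homedirectory/Maildir" || continue
```

Users without a Trash folder are then simply skipped. The folder can be created from any IMAP client, or (as far as I know) with Courier’s maildirmake -f Trash run as the mailbox owner.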