View Source: MuttSpamAssassinBayes

[SpamAssassin|http://spamassassin.sf.net] is the best spam mail filtering tool available.  It uses a series of rules to determine if your mail is spam or not. And of course, [mutt|http://www.mutt.org] is the best mail reader around.  Here's how to combine the two to make spam go away.

As of the newer versions (2.55 or so), ~SpamAssassin incorporates Bayesian filtering.  This method builds a statistical model of your mail to decide which messages are spam and which are non-spam (ham).  Bayesian filtering is proving to be very powerful, because it learns what is spam and what is ham from the mail you actually get.

However, Bayesian filtering needs a little manual help to be effective.  ~SpamAssassin provides an automated training mechanism: it classifies messages as spam or ham via its other tests, and then feeds those results into the Bayes filter.  Note that there is a learning period of 200 messages before this takes effect (that is, 200 messages ranked <0 or >15 in the ~SpamAssassin points system).
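
To see how far along the learning period is, you can ask sa-learn for its message counters.  The following is a minimal sketch, not part of the original setup.  It assumes your version supports {{sa-learn --dump magic}} and prints the counters on lines ending in "non-token data: nspam" and "non-token data: nham", with the count in the third column.  The function name bayes_counts is made up for illustration.

```shell
#!/bin/sh
# Read `sa-learn --dump magic` output on stdin and print the number of
# spam and ham messages learned so far.
# Assumption: the count is the third whitespace-separated field.
bayes_counts() {
  awk '/non-token data: nspam$/ { spam = $3 }
       /non-token data: nham$/  { ham = $3 }
       END { printf "spam=%d ham=%d\n", spam, ham }'
}

# Usage (needs a working SpamAssassin install):
#   sa-learn --dump magic | bayes_counts
```

If a counter is still low, the Bayes filter is probably not contributing to the score yet.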

To make ~SpamAssassin a little smarter, you can track your messages and feed the classifier manually.  The easiest way is to do two things.  First, have ~SpamAssassin direct all your spam to a spam mailbox, which is probably what you do already.  Never delete messages from this mailbox yourself (the script below empties it for you).  Second, replace the delete function in your mailer with one that moves the message to a Trash mailbox instead.  Since you never put spam there (because you never delete it, remember), the net result is that Trash contains only valid messages, and Spam contains all the spam.  Obviously, it's a good idea to check the Spam mailbox every once in a while for mistakes.

Now to take this to completion, use a script that checks Trash and Spam periodically and feeds the messages to the ~SpamAssassin bayes filter as ham and spam respectively.  Here's the script I use:

<verbatim>
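#!/bin/sh
# Feed Trash to the Bayes filter as ham and Spam as spam, then empty both.

HAMBOX="$HOME/mail/Trash"
SPAMBOX="$HOME/mail/Spam"

# Bail out unless both mailboxes exist and are writable.
[ -w "$HAMBOX" ] || exit 1
[ -w "$SPAMBOX" ] || exit 1

# Truncate a mailbox only if sa-learn processed it successfully.
if sa-learn --mbox --ham "$HAMBOX"
then
  : > "$HAMBOX"
fi

if sa-learn --mbox --spam "$SPAMBOX"
then
  : > "$SPAMBOX"
fi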
</verbatim>
This script feeds Trash as ham and Spam as spam, and empties each mailbox once sa-learn has processed it successfully.  I run this once a month via a cron job.
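
Since the script empties both mailboxes, it can be reassuring to see how many messages a run would feed to the classifier first.  Here is a small sketch that counts the messages in an mbox file by counting its "From " separator lines.  The helper name mbox_count is hypothetical, and the paths in the usage comment are the ones from the script above.

```shell
#!/bin/sh
# Count the messages in an mbox file by counting "From " separator lines.
# Body lines that begin with "From " are normally stored as ">From ",
# so they are not counted.
mbox_count() {
  # grep -c prints 0 but exits non-zero when nothing matches; ignore that.
  grep -c '^From ' "$1" || true
}

# Usage:
#   echo "ham:  $(mbox_count "$HOME/mail/Trash")"
#   echo "spam: $(mbox_count "$HOME/mail/Spam")"
```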

Finally, rebind your 'delete' key so that it moves messages to Trash instead.  In mutt, do this with two macros:
<verbatim>
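# Use the function name rather than the 's' key, so the macros keep
# working even if 's' is rebound.
macro index d "<save-message>=Trash<enter>" "move message to trash"
macro pager d "<save-message>=Trash<enter>" "move message to trash"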
</verbatim>
Now every time you hit the 'd' key to delete a message, it actually gets moved to Trash.  This has the nice side effect of a safety net for your deleted messages.

If a spam message does slip into your mailbox, simply move it to the Spam mailbox manually, and the script will pick it up on the next run.

There you go, a totally simple way to sort the spam out of your mail.  The longer this runs, the more accurate it will become.