SpamAssassin Personalized Filtering Setup

To whom it may concern:

Here's the setup I use for SpamAssassin under Linux.

It works great. For details, see "A Study of Supervised Spam Detection".

I'll give you XXX's config files exactly as they are. YYY, or whoever, can convert them to another form, such as Procmail recipes, if they like.

If anybody wants to do a better job of "shrink wrapping" this configuration, please do so and let me know.

Gordon Cormack

--
[XXX@hostname XXX]$ pwd
/home/XXX
[XXX@hostname XXX]$ cat .forward
XXX, "| /etc/smrsh/inmail"
[XXX@hostname XXX]$ cat /etc/smrsh/inmail
#!/bin/tcsh
setenv LANG C
umask 077
#setenv HOME /u/XXX
#ssh hostname spamassassin -e >/tmp/dbc$$
#/.software/local/.admin/bins/bin/spamassassin -e >/tmp/dbc$$
#a-learn --rebuild < /dev/null >& /dev/null
spamassassin -e >/tmp/dbc$$
#cat > /tmp/dbc$$
setenv R $?
if ($R == 0) cat /tmp/dbc$$ >> /u/XXX/ham
if ($R == 0) sa-learn --ham --single < /tmp/dbc$$
if ($R != 0) cat /tmp/dbc$$ >> /u/XXX/spam
if ($R != 0) sa-learn --spam --single < /tmp/dbc$$
rm /tmp/dbc$$
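
For anyone who would rather not use tcsh, the inmail filter above might be written in POSIX sh roughly as follows. This is a sketch, not the script actually in use: the function name route_message and the variables HAM_BOX, SPAM_BOX, SA_CHECK and SA_LEARN are hypothetical additions whose defaults reproduce the original paths and commands, so the logic can be tried with stand-in commands. It also assumes mktemp(1) is available and uses it in place of the predictable /tmp/dbc$$ name. As in the original, any non-zero status from spamassassin -e, including an error, is treated as spam.

```shell
#!/bin/sh
# Sketch only: a POSIX sh rendering of the tcsh inmail filter above.
# HAM_BOX, SPAM_BOX, SA_CHECK and SA_LEARN are hypothetical overrides;
# their defaults reproduce the original script's paths and commands.

# The body runs in a subshell ( ... ), so LANG, umask and the local
# variables do not leak into the calling shell.
route_message() (
    LANG=C
    export LANG
    umask 077

    ham_box=${HAM_BOX:-/u/XXX/ham}
    spam_box=${SPAM_BOX:-/u/XXX/spam}
    check=${SA_CHECK:-spamassassin -e}
    learn=${SA_LEARN:-sa-learn}

    # Assumes mktemp(1): an unpredictable name instead of /tmp/dbc$$.
    # 75 is EX_TEMPFAIL (assumption of this sketch), asking the MTA to retry.
    tmp=$(mktemp "${TMPDIR:-/tmp}/dbc.XXXXXX") || exit 75

    # $check is left unquoted on purpose so "spamassassin -e" splits into
    # a command and its flag. Status 0 means ham; any other status (spam
    # or an error) is handled as spam, as in the tcsh version.
    if $check > "$tmp"; then
        cat "$tmp" >> "$ham_box"
        $learn --ham --single < "$tmp"
    else
        cat "$tmp" >> "$spam_box"
        $learn --spam --single < "$tmp"
    fi

    rm -f "$tmp"
    exit 0
)
```

To install it as /etc/smrsh/inmail, the file would end with a bare route_message line, so the function reads the message that sendmail supplies on standard input. Exiting with 75 when no temporary file can be created is a choice made for this sketch; sendmail normally treats that status as a temporary failure and retries delivery later.
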
[XXX@hostname XXX]$ cat .spamassassin/user_prefs
# SpamAssassin user preferences file.  See 'perldoc Mail::SpamAssassin::Conf'
# for details of what can be tweaked.
###########################################################################

# How many hits before a mail is considered spam.
# required_hits         5

# Whitelist and blacklist addresses are now file-glob-style patterns, so
# "friend@somewhere.com", "*@isp.com", or "*.domain.net" will all work.
# whitelist_from        someone@somewhere.com

# Add your own customised scores for some tests below.  The default scores are
# read from the installed spamassassin rules files, but you can override them
# here.  To see the list of tests and their default scores, go to
# http://spamassassin.org/tests.html .
#
# score SYMBOLIC_TEST_NAME n.nn

# Speakers of Asian languages, like Chinese, Japanese and Korean, will almost
# definitely want to uncomment the following lines.  They will switch off some
# rules that detect 8-bit characters, which commonly trigger on mails using CJK
# character sets, or that assume a western-style charset is in use. 
# 
# score HEADER_8BITS            0
# score HTML_COMMENT_8BITS      0
# score SUBJ_FULL_OF_8BITS      0
# score UPPERCASE_25_50         0
# score UPPERCASE_50_75         0
# score UPPERCASE_75_100        0

score MIME_HTML_ONLY 3


#
# Note: Internal auto_learn is disabled, because it doesn't work
#       properly - that is, it learns using a different judgement
#       from the one it reports!!!  So learning is applied externally
#       (but still automatically) by the script that sends the mail to 
#       Spamassassin.
#
#       It is important for the user to correct misclassifications 
#       using sa-learn.
#
bayes_auto_learn 0

score BAYES_60 2
score BAYES_70 2.6
score BAYES_80 4.2
score BAYES_90 4.3
score BAYES_99 5

bayes_min_ham_num 1
bayes_min_spam_num 1

score RCVD_IN_DSBL 4.3
score RCVD_IN_SBL 1
score RCVD_IN_BONDEDSENDER -5
score HABEAS_SWE 0

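A quick way to see what the score overrides above do is to add them up by hand. The sketch below is illustrative arithmetic, not SpamAssassin itself: score_of and verdict are hypothetical helpers, the rule hits passed to verdict are made up, and 5 is the default required_hits (the commented-out value near the top of the file). A real message normally hits other rules too, and their default scores also count toward the total.

```shell
#!/bin/sh
# Illustration only: tally the scores set in user_prefs above for a made-up
# list of rule hits and compare the total with the default threshold of 5.

# Score assigned above to each overridden rule; anything else counts as 0 here.
score_of() {
    case $1 in
        MIME_HTML_ONLY)       echo 3 ;;
        BAYES_60)             echo 2 ;;
        BAYES_70)             echo 2.6 ;;
        BAYES_80)             echo 4.2 ;;
        BAYES_90)             echo 4.3 ;;
        BAYES_99)             echo 5 ;;
        RCVD_IN_DSBL)         echo 4.3 ;;
        RCVD_IN_SBL)          echo 1 ;;
        RCVD_IN_BONDEDSENDER) echo -5 ;;
        *)                    echo 0 ;;  # includes HABEAS_SWE, scored 0 above
    esac
}

# verdict RULE...: print the total and "spam" if it reaches 5, else "ham".
verdict() {
    for rule in "$@"; do
        score_of "$rule"
    done | awk -v required=5 '
        { total += $1 }
        END {
            label = (total >= required) ? "spam" : "ham"
            printf "%.1f %s\n", total, label
        }'
}
```

With these overrides, `verdict BAYES_99` prints `5.0 spam`, so the strongest Bayes verdict alone reaches the threshold, while `verdict BAYES_90` prints `4.3 ham` and needs help from another rule: `verdict BAYES_90 MIME_HTML_ONLY` prints `7.3 spam`. `verdict BAYES_99 RCVD_IN_BONDEDSENDER` prints `0.0 ham`, because the -5 for RCVD_IN_BONDEDSENDER cancels the Bayes score.
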
[XXX@hostname XXX]$ grep sa-learn .muttrc
macro pager S "| sa-learn --spam --single\r"
macro index S "| sa-learn --spam --single\r"
macro index H "| sa-learn --ham --single\r"
macro pager H "| sa-learn --ham --single\r"