TREC Spam Filter Evaluation Tool Kit
Filter must implement exactly 3 commands
initialize
Perform all steps necessary to install the software on a clean system and to prepare to classify a user's email.
classify filename
Read filename, which contains exactly 1 email message.
Write one line of output:
class=classification score=score tfile=auxiliary_file
train judgement filename classification auxiliary_file
Take note of the gold-standard judgement for filename. The classification and auxiliary_file arguments are the values returned by the prior classify of the same file.
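The three commands above can be sketched as a small command-line program. This is only an illustrative toy, not the kit's example filter: the word list, the scoring rule, the "spam"/"ham" labels, and the ".aux" file name are all assumptions made for the sketch.

```python
import sys

# Hypothetical toy vocabulary; a real filter would learn its own model.
SPAM_WORDS = {"viagra", "lottery", "winner"}


def initialize():
    """Prepare any state the filter needs; this toy filter keeps none."""
    return 0


def classify(filename):
    """Read one message and return the single output line for the jig."""
    with open(filename, errors="replace") as f:
        words = f.read().lower().split()
    hits = sum(1 for w in words if w in SPAM_WORDS)
    score = hits / len(words) if words else 0.0
    label = "spam" if hits > 0 else "ham"  # label names are an assumption
    # The auxiliary file name is made up; this sketch never creates the file.
    return f"class={label} score={score:.4f} tfile={filename}.aux"


def train(judgement, filename, classification, auxiliary_file):
    """Take note of the gold-standard judgement; this toy filter ignores it."""
    return 0


def main(argv):
    """Dispatch on the command name given as the first argument."""
    if len(argv) < 2:
        print("usage: filter initialize | classify FILE | train ...", file=sys.stderr)
        return 2
    command, args = argv[1], argv[2:]
    if command == "initialize":
        return initialize()
    if command == "classify":
        print(classify(args[0]))
        return 0
    if command == "train":
        return train(*args)
    print(f"unknown command: {command}", file=sys.stderr)
    return 2


if __name__ == "__main__":
    main(sys.argv)
```

Saved under a hypothetical name such as toyfilter.py, it would be invoked as "python toyfilter.py classify message.txt" and would print one class=... score=... tfile=... line.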
Filter Test Jig
Input
User email stream, 1 message per file
Index file, 1 line per message, chronological order:
judgement filename user genre
Filter, as 3 commands: initialize, classify, train
Output
Raw Result File, 1 line per message:
file=filename judge=judgement class=classification score=score user=user genre=genre
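A line of the raw result file can be split back into its fields with a few lines of code. The sketch below assumes that no value contains whitespace; the function name and the sample values are invented for illustration.

```python
def parse_result_line(line):
    """Split whitespace-separated key=value tokens into a dict."""
    fields = {}
    for token in line.split():
        key, _, value = token.partition("=")
        fields[key] = value
    return fields


# Made-up sample line in the raw result format.
example = "file=msg001 judge=spam class=ham score=0.12 user=alice genre=email"
record = parse_result_line(example)

# A message counts as misclassified when the filter disagrees with the judgement.
misclassified = record["judge"] != record["class"]
```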
Test Jig Implementation
initialize
for each judgement, filename, user, genre in index
    classify filename > classification, score, auxiliary_file
    train judgement filename classification auxiliary_file
    output judgement, filename, classification, score, user, genre
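The loop above can be sketched in Python as follows. This is not the code in run.sh: the filter is passed in as three callables rather than run as external commands, and the function names, the whitespace-separated index format, and the use of the full path in the file= field are assumptions of the sketch.

```python
import os


def parse_classify_output(line):
    """Extract class, score and tfile from one line of classify output."""
    fields = dict(token.split("=", 1) for token in line.split())
    return fields["class"], fields["score"], fields["tfile"]


def run_jig(index_lines, corpus_path, initialize, classify, train):
    """Run the classify/train loop over the index; return raw result lines."""
    initialize()
    results = []
    for line in index_lines:
        if not line.strip():
            continue  # skip blank lines in the index
        judgement, filename, user, genre = line.split()
        # The corpus path is added in front of the filename from the index.
        path = os.path.join(corpus_path, filename)
        classification, score, aux = parse_classify_output(classify(path))
        # The gold-standard judgement is revealed only after classification.
        train(judgement, path, classification, aux)
        results.append(
            f"file={path} judge={judgement} class={classification} "
            f"score={score} user={user} genre={genre}"
        )
    return results
```

Because train is called right after each classify, the filter never sees a judgement before it has committed to a classification for that message.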
Running
run.sh <corpus_path> <output_file>
corpus_path - Path of the directory containing the index. The corpus path is prepended to each filename in the index. If none is given, the current directory is used.
output_file - Output is written to this file. If none is given, the default is "results".
Building the Kit and Running Example Filter
extract the archive
tar -xzf spamfilterjig-full-1.0.tar.gz
change directory to the extracted kit
cd spamfilterjig-full-1.0
make it for your setup (Linux, Unix, Cygwin/Windows)
change directory to spamfilterjig/example_filter
run the jig (it might take a while to compile and process 6000+ emails)
../run.sh ../spamassassin_corpus/ foo results.bogo
evaluate the output for full results (the R system is no longer needed)
cp results.bogo ../eval
cd ../eval
bin/script results.bogo
This creates the file "results/results.bogo.html".
*New* Delayed training example
The corpus now contains an example of delayed training. To run it, use a different corpus path on the run.sh command:
../run.sh ../spamassassin_corpus/delay/ foodelay results.bogo.delay