TREC 2006 Spam Evaluation Kit

This is an updated version of the Spam Evaluation Jig.
The old version (and some background documentation) is here.

Summary:

   - Linux/Unix/Cygwin commands to run and evaluate filters

Updates:

   - supports delayed/incomplete training
   - "active learning" shell
   - more evaluation output
   - ROC learning curves
   - doesn't require System R installation
   - "Makefile" so you can build the system easily

Downloads:

   - the kit itself, including the SpamAssassin corpus and Bogofilter

   - a delayed-training update for trec05p-1, the TREC 2005 Public Corpus

   - active.cpp  --  TREC 2006 ONLY active learning shell
                 -- compile using:
                       g++ -o active active.cpp
                 -- then use in place of "run.sh"
                       ./active corpusname runname resultfile
                 -- extract a particular segment of resultfile; e.g.
                       grep teach=200 resultfile > resultfile.200

                 -- implements simple random selection; modify
                     it to do something more interesting.

   - run.activeLearning.cpp -- TREC 2007 active learning shell
                 -- compile using:
                       g++ -o active run.activeLearning.cpp
                 -- then use in place of "run.sh"
                       ./active corpusname runname resultfile quota

                 -- usage details:
                       http://plg.uwaterloo.ca/~gvcormac/spam/onlineActiveIntro.html