Index of /~gvcormac/jig
TREC 2006 Spam Evaluation Kit
This is an updated version of the Spam Evaluation Jig.
The old version (and some background documentation) is here.
Summary:
- Linux/Unix/Cygwin commands to run and evaluate filters
Updates:
- supports delayed/incomplete training
- "active learning" shell
- more evaluation output
- ROC learning curves
- doesn't require System R installation
- "Makefile" so you can build the system easily
Downloads:
- the kit itself, including SpamAssassin Corpus and Bogofilter
- a delayed-training update for trec05p-1, the TREC 2005 Public Corpus
- active.cpp -- active learning shell (TREC 2006 only)
-- compile using:
g++ -o active active.cpp
-- then use in place of "run.sh"
./active corpusname runname resultfile
-- extract a particular segment of resultfile; e.g.
grep teach=200 resultfile > resultfile.200
-- implements simple random selection; modify it to do something more interesting
- run.activeLearning.cpp -- TREC 2007 active learning shell
-- compile using:
g++ -o active run.activeLearning.cpp
-- then use in place of "run.sh"
./active corpusname runname resultfile quota
-- usage details:
http://plg.uwaterloo.ca/~gvcormac/spam/onlineActiveIntro.html
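
The segment extraction shown for active.cpp (grep teach=200 resultfile > resultfile.200) can be repeated for several teach levels in one pass. The following is a minimal sketch, assuming each line of resultfile carries a token of the form teach=N as in that example; the exact layout of the result file is not described here. The function name split_by_teach and the teach levels in the usage line are hypothetical and not part of the kit. It uses grep -w, which is widely supported but not required by POSIX, so that teach=200 does not also match teach=2000.

```shell
#!/bin/sh
# Split a result file into one file per teach level.
# Assumption: result lines contain a token "teach=N", as in the
# grep example above. split_by_teach is a hypothetical helper name.
split_by_teach() {
    resultfile=$1
    shift
    for n in "$@"; do
        # -w matches whole words only, so teach=200 skips teach=2000.
        # A level with no matching lines yields an empty file, not an error.
        grep -w "teach=$n" "$resultfile" > "$resultfile.$n" || true
    done
}
```

For example, split_by_teach resultfile 100 200 400 would write resultfile.100, resultfile.200 and resultfile.400 next to the original file.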