Index of /~gvcormac/jig

Name                                Last modified     Size

spamfilterjig-full-1.2.tar.gz       2006-06-20 08:25  11M
spamfilterjig-nocorpus-1.2.tar.gz   2006-06-20 08:25  864K
trec05p-1-delay.tar.gz              2006-05-29 20:47  504K
littleindex.tgz                     2006-07-05 12:54  11K
run.activeLearning.cpp              2007-06-02 10:49  4.4K
active.cpp                          2006-06-20 08:27  2.0K
README.html                         2007-06-02 10:59  1.7K
old/                                2006-06-20 08:25  -
chinese/                            2006-06-08 09:41  -

TREC 2006 Spam Evaluation Kit

This is an updated version of the Spam Evaluation Jig.
The old version (and some background documentation) is here.

Summary:

   - Linux/Unix/Cygwin commands to run and evaluate filters

Updates:

   - supports delayed/incomplete training
   - "active learning" shell
   - more evaluation output
   - ROC learning curves
   - doesn't require System R installation
   - "Makefile" so you can build the system easily

Downloads:

   - the kit itself, including SpamAssassin Corpus and Bogofilter
     (spamfilterjig-full-1.2.tar.gz)

   - a delayed-training update for trec05p-1, the TREC 2005 Public Corpus
     (trec05p-1-delay.tar.gz)

   - active.cpp  --  TREC 2006 ONLY active learning shell
                 -- compile using:
                       g++ -o active active.cpp
                 -- then use in place of "run.sh"
                       ./active corpusname runname resultfile
                 -- extract a particular segment of resultfile; e.g.
                       grep teach=200 resultfile > resultfile.200

                 -- implements simple random selection; modify
                     it to do something more interesting.

   - run.activeLearning.cpp -- TREC 2007 active learning shell
                 -- compile using:
                       g++ -o active run.activeLearning.cpp
                 -- then use in place of "run.sh"
                       ./active corpusname runname resultfile quota

                 -- usage details:
                       http://plg.uwaterloo.ca/~gvcormac/spam/onlineActiveIntro.html
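
Extracting several segments of a TREC 2006 resultfile at once: the grep
step shown for active.cpp can be repeated for a list of teach values. The
sketch below is illustrative and not part of the kit. The function name
split_by_teach and the teach values in the usage comment are hypothetical,
and it assumes only what the grep example implies, namely that the lines
of interest contain a literal token such as "teach=200".

```shell
#!/bin/sh
# Illustrative helper (not part of the kit): write one segment file per
# teach value, named <resultfile>.<value>, as in the grep example.
# Assumption: lines of interest contain a literal token like "teach=200".
split_by_teach() {
    resultfile=$1
    shift
    for n in "$@"; do
        # -w (supported by GNU and BSD grep) matches whole words only,
        # so teach=200 does not also pick up teach=2000.
        # grep exits with 1 when nothing matches; treat that as an empty
        # segment rather than an error, but let real errors (2) through.
        grep -w "teach=$n" "$resultfile" > "$resultfile.$n" || [ $? -eq 1 ]
    done
}

# Usage (hypothetical teach values):
#   split_by_teach resultfile 100 200 400
```

Plain "grep teach=200" is a substring match, so it would also select lines
containing teach=2000 if such lines exist; the -w option avoids that.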