TREC Spam Filter Evaluation Tool Kit


A filter must implement exactly three commands:

  • initialize
  • classify filename
  • train judgement filename classification auxiliary_file
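The three commands above can be sketched as a small command-line program. This is only an illustration under stated assumptions, not the kit's example filter: the state file name, the "class=... score=..." output line, the reading of "judgement" as the true label and "classification" as the filter's earlier answer, and the ignored auxiliary_file are all hypothetical. The scoring is a simple naive Bayes over word presence, chosen for brevity.

```python
import json
import math
import re
import sys
from pathlib import Path

# Hypothetical location for the filter's persistent state (not from the kit).
STATE_FILE = Path("filter_state.json")


def tokenize(text):
    """Return the set of lowercase word-like tokens in a message."""
    return set(re.findall(r"[a-z0-9$']+", text.lower()))


def initialize(state_file=STATE_FILE):
    """'initialize': create an empty model, discarding any earlier state."""
    state = {"spam": {}, "ham": {}, "n_spam": 0, "n_ham": 0}
    Path(state_file).write_text(json.dumps(state))
    return state


def score(state, text):
    """Sum of per-token log-odds for spam, with add-one smoothing."""
    logit = 0.0
    for tok in tokenize(text):
        p_spam = (state["spam"].get(tok, 0) + 1) / (state["n_spam"] + 2)
        p_ham = (state["ham"].get(tok, 0) + 1) / (state["n_ham"] + 2)
        logit += math.log(p_spam / p_ham)
    return logit


def classify(filename, state_file=STATE_FILE):
    """'classify filename': return (label, score) for one message file."""
    state = json.loads(Path(state_file).read_text())
    text = Path(filename).read_text(errors="replace")
    s = score(state, text)
    return ("spam" if s > 0 else "ham"), s


def train(judgement, filename, classification, auxiliary_file=None,
          state_file=STATE_FILE):
    """'train ...': update token counts using the true label (assumed to be
    'judgement'). 'classification' and 'auxiliary_file' are accepted but
    unused in this sketch."""
    state = json.loads(Path(state_file).read_text())
    label = "spam" if judgement.lower() == "spam" else "ham"
    text = Path(filename).read_text(errors="replace")
    for tok in tokenize(text):
        state[label][tok] = state[label].get(tok, 0) + 1
    state["n_" + label] += 1
    Path(state_file).write_text(json.dumps(state))


def main(argv):
    """Dispatch one of the three commands from the command line."""
    command, args = argv[0], argv[1:]
    if command == "initialize":
        initialize()
    elif command == "classify":
        label, s = classify(args[0])
        print(f"class={label} score={s:.4f}")  # hypothetical output format
    elif command == "train":
        train(*args[:4])
    else:
        raise SystemExit(f"unknown command: {command}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

Keeping all state in one file means each command can run as a separate process, which is how a test jig would presumably invoke it; a real filter would likely use a more compact store.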

    Filter Test Jig

  • Input
  • Output
  • Test Jig Implementation

    Running

  • run.sh <corpus_path> <output_file>

    Building the Kit and Running Example Filter

  • extract the archive
  • change directory to spamfilterjig
  • build it with make for your setup (Linux, Unix, or Cygwin on Windows)
  • change directory to spamfilterjig/example_filter
  • run the jig (it might take a while to compile and process 6000+ emails)
  • evaluate the output for full results (R is no longer needed)

    *New* Delayed training example

    The corpus now contains an example of delayed training. To do the delayed training, run: