SPAM Track Guidelines - TREC 2005 - 2007

Gordon Cormack (gvcormac@uwaterloo.ca)
Tom Lynam (trlynam@uwaterloo.ca)
Last revised July 16, 2007

CEAS 2008 Challenge Lab Evaluation Corpus available to the public

TREC 2007 Public Corpus available to the public

TREC 2007 Spam (and Email) Track Guidelines

Overview

TREC 2007 Submission Deadlines

May 19, 2007 - Interim message describing TREC 2007 Tasks

Please subscribe to the mailing list.

TREC 2006 Spam (and Email) Track Guidelines

The 2006 track will reprise the 2005 experiments with new filters and data, and will also investigate delayed feedback and active learning.

There are two tasks: an online filtering task (reprising 2005, with immediate and delayed feedback) and an active learning task.

Sign up for the mailing list to participate in shaping TREC 2006.

Important notes from the mailing list

Due dates:

TREC 2005 Spam Track Overview & Results

May 14, 2005: Final guidelines

The deadlines and tasks are now finalized. We are in the process of preparing a revised document, but there will be no material changes from the description found here.

Summary for Participants

May 2, 2005: Please read memo on test environment and timeline

NOTE: Participants must submit their intention to participate in TREC

See the Call to TREC 2005. While the official deadline has passed, applications will still be considered.

January 21, 2005

A prototype TREC Spam Filter Evaluation Toolkit is available for download.
Presentation slides and a video presentation from the 2005 Spam Conference are also available.

Summary

An automatic spam filter classifies a chronological sequence of email messages as SPAM or HAM (non-spam). The filter under test is run on several email sequences, some public and some private, and its performance is measured against gold-standard judgements made by a human assessor.

Objectives

Mailing List

To join the list, send a mail message to listproc@nist.gov whose body consists of the single line
   subscribe trecspam
There's an archive of the list; you should receive the password once you subscribe. There's also a summary and taxonomy of the voluminous discussion that took place up to February 25, 2005. You'll find the password for that site in the list archive under the thread "Taxonomy."

The Task

A filter to be evaluated must be packaged to implement the following command-line interface, executable on Windows XP, Linux, or Solaris, as outlined below. Details are packaged with the Evaluation Toolkit.
   initialize
   classify emailfile resultfile
   train ham emailfile resultfile
   train spam emailfile resultfile
   finalize
"Initialize" will install the system and configure it to process a single email sequence.

"Classify" will be called by the evaluation system once for every email message in the sequence. "Classify" must return a result file with three components: judgement ("ham" or "spam"), score (a real number such that a higher number indicates higher likelihood that the message is spam), and system info (up to 1kb of data which will be passed back to the filter, but is otherwise unused by the evaluation system).

"Train ham" and "train spam" communicate the gold standard judgement from the evaluation system to the filter. Each "classify" command will be immediatedly followed by either "train ham" or "train spam" (communicating the gold standard judgement) and the same emailfile and resultfile from the preceding classify command.

"Finalize" will terminate and uninstall the system, removing any processes, files, or settings created by the other commands.

A preliminary implementation of a simple spam filter supporting this interface was provided by the coordinators in early 2005. The interface will be finalized several weeks before the submission deadline.
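
For concreteness, the following is a minimal sketch (in Python) of a filter packaged to this interface. The naive word-count model, the filter_state.json state file, and the one-line "judgement score" result format are illustrative assumptions only; the authoritative result-file syntax and packaging details ship with the Evaluation Toolkit.

   #!/usr/bin/env python
   # Minimal sketch of a filter packaged to the interface above.
   # The model, state-file name, and result format are assumptions;
   # the system-info component of the result is omitted for brevity.
   import json, math, os, re, sys

   STATE = "filter_state.json"

   def load():
       if os.path.exists(STATE):
           with open(STATE) as f:
               return json.load(f)
       return {"spam": {}, "ham": {}, "nspam": 0, "nham": 0}

   def save(model):
       with open(STATE, "w") as f:
           json.dump(model, f)

   def tokens(emailfile):
       with open(emailfile, "rb") as f:
           return re.findall(r"[a-z0-9]+", f.read().decode("latin-1").lower())

   def score(model, words):
       # Log-odds under a crude Naive Bayes model with add-one smoothing;
       # higher scores indicate a higher likelihood of spam.
       s = math.log((model["nspam"] + 1.0) / (model["nham"] + 1.0))
       for w in set(words):
           s += math.log((model["spam"].get(w, 0) + 1.0) /
                         (model["ham"].get(w, 0) + 1.0))
       return s

   def main():
       cmd = sys.argv[1]
       if cmd == "initialize":
           save({"spam": {}, "ham": {}, "nspam": 0, "nham": 0})
       elif cmd == "classify":
           emailfile, resultfile = sys.argv[2], sys.argv[3]
           s = score(load(), tokens(emailfile))
           with open(resultfile, "w") as f:
               f.write("%s %f\n" % ("spam" if s > 0 else "ham", s))
       elif cmd == "train":  # invoked as: train ham|spam emailfile resultfile
           label, emailfile = sys.argv[2], sys.argv[3]
           model = load()
           model["n" + label] += 1
           for w in tokens(emailfile):
               model[label][w] = model[label].get(w, 0) + 1
           save(model)
       elif cmd == "finalize":
           if os.path.exists(STATE):
               os.remove(STATE)

   if __name__ == "__main__":
       main()

Loading and saving the model on every invocation is slow, but it keeps the sketch self-contained; a real entry would use a serious learning algorithm and more careful state management.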

Testing Procedure

Prior to testing, an assessor will assemble an email sequence, and enter a gold-standard judgement for each message. An automated test jig will run the target filter against the email sequence, using the interface described above. The test jig will produce a raw result file for further analysis. For each email message, in sequence, the raw result file contains:
   unique-identifier filter-judgement gold-standard-judgement filter-score
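
As an illustration, a few lines of Python suffice to derive the basic misclassification rates from such a raw result file. The field order follows the description above and should be treated as an assumption; the toolkit's own analysis programs are authoritative.

   # Computes ham and spam misclassification percentages from a raw
   # result file, assuming one record per line in the order:
   #   unique-identifier filter-judgement gold-standard-judgement filter-score
   def misclassification_rates(path):
       fp = fn = nham = nspam = 0
       with open(path) as f:
           for line in f:
               _uid, judged, gold, _score = line.split()
               if gold == "ham":
                   nham += 1
                   if judged == "spam":
                       fp += 1  # ham misclassified as spam
               else:
                   nspam += 1
                   if judged == "ham":
                       fn += 1  # spam misclassified as ham
       return 100.0 * fp / nham, 100.0 * fn / nspam  # (hm%, sm%)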

For TREC 2005, network access will be prohibited. A time limit of approximately 2 seconds per message (average) will be enforced. The largest test runs may be assumed to contain no more than 100,000 messages.

A preliminary implementation of the automated test jig was provided by the coordinators in early 2005. Sample email sequences and gold-standard judgements suitable for use with the jig are included.

Evaluation

Evaluation measures will be based on those proposed in A Study of Supervised Spam Detection by Cormack and Lynam (http://plg.uwaterloo.ca/~gvcormac/spamcormack).

The primary measures are:
   the ham misclassification percentage (hm%)
   the spam misclassification percentage (sm%)
   the area under the ROC curve, reported as (1-ROCA)%

Other measures and methods of failure analysis will be investigated as the track takes shape. One combined ham/spam misclassification score under consideration is the logistic average misclassification percentage, lam% = logit^-1((logit(hm%) + logit(sm%))/2), where logit(x) = log(x/(1-x)).

For the purpose of stratified analysis, at least one test corpus will be classified into genres such as
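
A small sketch of the lam% computation, under the definition above (rates supplied as fractions, lam% returned as a percentage):

   import math

   def logit(p):
       return math.log(p / (1.0 - p))

   def lam_percent(hm, sm):
       # Logistic average misclassification: the inverse logit of the
       # mean of logit(hm) and logit(sm), rates given as fractions.
       x = (logit(hm) + logit(sm)) / 2.0
       return 100.0 / (1.0 + math.exp(-x))

   # Example: hm% = 1.0 and sm% = 5.0 yield lam% of about 2.25.
   print(lam_percent(0.010, 0.050))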