Date: Sat, 19 May 2007 09:24:07 -0400
From: "Gordon V. Cormack" <gvcormac@uwaterloo.ca>
To: trecspam@nist.gov
Subject: TREC 2007 Spam Track

We are still in the process of finalizing the TREC 2007 Spam Track
guidelines.  This is an interim report.  We should have the final
guidelines by the end of June.  In the meantime, please feel 
free to post queries or comments to this list.

The primary tasks will use the same toolkit and data format as
last year; i.e. the TREC spam filter evaluation toolkit available
here:

    http://plg.uwaterloo.ca/~gvcormac/spam/

The active learning task will use a different version of the 
toolkit, to be available shortly.

The three tasks are:

    On-line filtering with immediate feedback.  
        - exactly the same task as for TREC 2005 and 2006

    On-line filtering with delayed/incomplete/noisy feedback.
        - same tools as for the TREC 2006 delayed feedback task,
          but the test data may not contain "train" commands
          for every message, and some of the "train" commands
          may be wrong so as to simulate user underreporting
          and user error.

          Emphasis will be placed on correct classification
          of a large number of messages with no feedback.
          (For example, there may be no feedback for the
          last half of the messages.)

    On-line filtering with active learning.
        - different tools and task from TREC 2006 active learning.
          Filters will perform on-line classification as for the
          other two tasks, but will be allowed to query the
          true class (ham or spam) of a fraction of the messages,
          chosen by the filter.  This will be effected by an
          additional command "query" added to the toolkit:

               query <message>

          where <message> is a message previously classified
          by the filter.   
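
To illustrate the active learning task, here is a rough Python sketch
of a filter loop that classifies every message and then spends a
limited number of queries on the messages it is least sure about.
It is not the toolkit interface, which is not yet released.  The names
run_active_filter, score, query_oracle, learn, max_queries and band
are hypothetical, as is the choice to query messages whose score is
close to 0.5; query_oracle merely stands in for the "query <message>"
command.

```python
def run_active_filter(messages, score, query_oracle, learn,
                      max_queries, band=0.2):
    """Classify each message in order, querying a limited number of them.

    messages     -- message identifiers in stream order
    score        -- hypothetical callable: message -> spamminess in [0, 1]
    query_oracle -- hypothetical stand-in for "query <message>";
                    returns "ham" or "spam"
    learn        -- hypothetical callable: (message, label) -> None
    max_queries  -- assumed query allowance for the whole run
    band         -- query only if the score is within band of 0.5
    """
    verdicts = []
    queried = {}
    remaining = max_queries
    for msg in messages:
        s = score(msg)
        # The message is classified first; only then may it be queried.
        verdicts.append((msg, "spam" if s >= 0.5 else "ham"))
        if remaining > 0 and abs(s - 0.5) <= band:
            label = query_oracle(msg)
            queried[msg] = label
            learn(msg, label)
            remaining -= 1
    return verdicts, queried
```

A filter could choose its queries in many other ways; spending them
on the least certain messages is only one plausible policy.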

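For the incomplete/noisy feedback task, the following Python sketch
shows one way such feedback could be simulated from a fully labelled
message stream.  It is an illustration only, not the procedure that
will be used to build the test data.  The function name and the
parameters report_rate, error_rate and cutoff are hypothetical, and
their default values are made up.  Delay is not modelled here.

```python
import random


def simulate_feedback(gold, report_rate=0.7, error_rate=0.05,
                      cutoff=0.5, seed=0):
    """Return the (message, label) feedback events a filter might see.

    gold        -- list of (message, true_label) pairs in stream order,
                   with labels "ham" or "spam"
    report_rate -- assumed probability that the user reports a message
    error_rate  -- assumed probability that a reported label is wrong
    cutoff      -- fraction of the stream after which no feedback is given
    """
    rng = random.Random(seed)
    last_with_feedback = int(len(gold) * cutoff)
    events = []
    for msg, label in gold[:last_with_feedback]:
        if rng.random() >= report_rate:
            continue  # underreporting: no feedback for this message
        if rng.random() < error_rate:
            # user error: the reported label is flipped
            label = "ham" if label == "spam" else "spam"
        events.append((msg, label))
    return events
```

With cutoff=0.5 the last half of the messages receives no feedback
at all, as in the example given for that task.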

Filters will be submitted to NIST (date to be determined, but
probably mid-July) after which a public corpus will be released.
Results on the public corpus will also be submitted to NIST
(probably late August).
 
-- 
Gordon V. Cormack     CS Dept, University of Waterloo, Canada N2L 3G1
gvcormack@uwaterloo.ca            http://cormack.uwaterloo.ca/cormack