About the 2005 TREC Public Spam Corpus

The 2005 TREC Public Spam Corpus (trec05p-1) contains 92,189 email messages, with a chronological index labelling each as spam or ham (i.e. legitimate email). Of these, 52,790 messages are labelled spam and 39,399 are labelled ham.
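
As an illustration of how such a label index might be tallied, here is a minimal Python sketch. It assumes each index line has the form "<label> <relative path>" (for example "spam ../data/000/001"). That layout, the sample lines, and the function name count_labels are assumptions made for this example and are not stated on this page, so check the index file in your own copy of the corpus before relying on it.

```python
from collections import Counter
from typing import Iterable


def count_labels(index_lines: Iterable[str]) -> Counter:
    """Tally labels from index lines of the ASSUMED form '<label> <path>'."""
    counts = Counter()
    for line in index_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        label, _path = parts
        counts[label.lower()] += 1
    return counts


# Hypothetical sample lines, listed in chronological order like the index.
sample = [
    "ham ../data/000/000",
    "spam ../data/000/001",
    "spam ../data/000/002",
]

counts = count_labels(sample)
spam_share = counts["spam"] / sum(counts.values())
print(counts["spam"], counts["ham"], round(spam_share, 3))
```

Run against the full index, a tally like this should reproduce the totals given above: 52,790 spam out of 92,189 messages, or roughly 57% spam.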

The corpus was created for the TREC Spam Evaluation Track using an iterative adjudication process.

The corpus is available for free download subject to a usage agreement.

The TREC 2006 and TREC 2007 corpora are available as well.