About the 2005 TREC Public Spam Corpus
The 2005 TREC Public Spam Corpus (trec05p-1) contains 92,189 email
messages, with a chronological index labelling each as spam
or ham (i.e. legitimate email).
Of these, 52,790 messages are labelled spam and 39,399 are labelled ham.
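As an illustration of how such a label index might be tallied, here is a minimal Python sketch. It assumes each index line holds a label (spam or ham) followed by a message path, separated by whitespace. That layout, the sample paths, and the helper name count_labels are assumptions made for this example; check the index file shipped with the corpus for the actual format.

```python
from collections import Counter
from typing import Iterable


def count_labels(index_lines: Iterable[str]) -> Counter:
    """Tally labels from index lines assumed to look like '<label> <path>'."""
    counts: Counter = Counter()
    for line in index_lines:
        parts = line.split()
        if len(parts) != 2:
            # Skip blank or malformed lines rather than guess at their meaning.
            continue
        label, _path = parts
        counts[label.lower()] += 1
    return counts


# Hypothetical sample lines; the paths are made up for illustration.
sample = [
    "ham ../data/000/000",
    "spam ../data/000/001",
    "spam ../data/000/002",
]
sample_counts = count_labels(sample)
print(sample_counts["spam"], sample_counts["ham"])
```

If the assumed layout holds, running count_labels over the full index should give totals that match the figures above: 52,790 spam plus 39,399 ham, for 92,189 messages in all.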
The corpus was created for the TREC Spam Evaluation Track
using an iterative adjudication process.
The corpus is available for free download subject to a
usage agreement.
The TREC 2006 and
TREC 2007 corpora are available as well.