From trecspam@nist.gov Mon May 29 23:28:09 2006
Subject: Active Learning Task

I have added "active.cpp" to the Evaluation Toolkit web page:

    http://plg.uwaterloo.ca/~gvcormac/jig

active.cpp is an active learning shell that replaces run.sh in the jig. That is, you compile and run it instead of run.sh. It drives the filter through the normal filter interface: initialize, train, classify, and finalize. The shell may therefore be used with any TREC filter, whether yours or one of the standard ones.

active.cpp selects a sequence of messages from the first 90% of the corpus as "teach me" examples. The filter is trained on the correct classification for these, and only these, examples. active.cpp chooses which "teach me" examples to train the filter on, and in what order. I have implemented something really dumb (random selection), which you can no doubt improve on by modifying or replacing active.cpp.

From time to time (after 100, 200, 400, etc. "teach me" examples) the filter is asked to classify the remaining 10% of the corpus, one message at a time, in order. In the end, the output file will contain the concatenated results of all these classification runs. Each line of each run will contain "teach=nnn", where nnn is the number of "teach me" examples seen before that classification run. You can therefore split the file into one file per run for evaluation purposes using "grep".

To submit your active learning entry, include your own version of "active.cpp" and any other filter components you require in a tar file, and submit it in the same way as your filter. Once the public data is released, run your entry on the data and submit the result file.

Note: although the corpus contains the true label for every message, only the labels of the selected "teach me" examples should ever be used.

This is a pilot exercise, so please let me know if you uncover any problems.
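The selection scheme described above can be sketched roughly as follows. This is an illustration only, not the code in active.cpp: the function names (poolSize, teachOrder, checkpoints) are hypothetical, the integer rounding of the 90% split is an assumption, and so is the reading of "100, 200, 400, etc." as a doubling schedule.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Number of messages in the "teach me" pool: the first 90% of the corpus.
// Assumption: the split rounds down; the remaining messages are held out.
std::size_t poolSize(std::size_t corpusSize) {
    return corpusSize * 9 / 10;
}

// Random selection: a shuffled list of pool indices giving the order in
// which messages would be handed to the filter's train step.
// A smarter strategy would replace only this function.
std::vector<std::size_t> teachOrder(std::size_t corpusSize, unsigned seed) {
    std::vector<std::size_t> order(poolSize(corpusSize));
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);
    return order;
}

// Numbers of "teach me" examples after which the held-out 10% is classified.
// Assumption: the schedule doubles (100, 200, 400, 800, ...) and stops once
// it would exceed the pool size.
std::vector<std::size_t> checkpoints(std::size_t pool, std::size_t first = 100) {
    assert(first > 0);
    std::vector<std::size_t> out;
    for (std::size_t n = first; n <= pool; n *= 2) {
        out.push_back(n);
        if (n > pool / 2) break;  // the next doubling would pass the pool size
    }
    return out;
}
```

Under these assumptions, a 1000-message corpus gives a pool of 900 messages and classification runs after 100, 200, 400, and 800 "teach me" examples. The held-out messages (indices 900 to 999 here) never appear in the teaching order, so their labels are never used for training.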