From Mon May 29 23:28:09 2006

Subject: Active Learning Task

I have added "active.cpp" to the Evaluation Toolkit
web page:

active.cpp is an active learning shell that replaces in the jig.  That is, you compile and run it
instead of  It uses the filter through
the normal filter interface: initialize, train,
classify, and finalize.  Therefore, the shell may
be used with any TREC filter -- yours or one of
the standard ones.
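
As a rough illustration of why the shell is filter-agnostic, here is a
minimal sketch of the four operations viewed as a C++ interface.  The
class names (Filter, MajorityFilter), the Labeled struct, the run_once
helper and all signatures are assumptions made for this example; they are
not taken from active.cpp or from the jig.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical in-process view of the four filter operations.
class Filter {
public:
    virtual ~Filter() = default;
    virtual void initialize() = 0;
    virtual void train(const std::string& message, bool is_spam) = 0;
    virtual bool classify(const std::string& message) = 0;  // true = spam
    virtual void finalize() = 0;
};

// Toy filter for illustration only: it ignores the message text and
// predicts spam when it has been trained on more spam than ham.
class MajorityFilter : public Filter {
public:
    void initialize() override { spam_ = 0; ham_ = 0; }
    void train(const std::string&, bool is_spam) override {
        if (is_spam) ++spam_; else ++ham_;
    }
    bool classify(const std::string&) override { return spam_ > ham_; }
    void finalize() override {}
private:
    int spam_ = 0;
    int ham_ = 0;
};

// A labeled message (hypothetical representation).
struct Labeled {
    std::string text;
    bool is_spam;
};

// Shell-side driver: it only ever calls the four operations, so any
// Filter implementation can be plugged in unchanged.
std::vector<bool> run_once(Filter& filter,
                           const std::vector<Labeled>& teach,
                           const std::vector<std::string>& test) {
    filter.initialize();
    for (const auto& example : teach) {
        filter.train(example.text, example.is_spam);
    }
    std::vector<bool> verdicts;
    for (const auto& message : test) {
        verdicts.push_back(filter.classify(message));
    }
    filter.finalize();
    return verdicts;
}
```

Because run_once depends only on the abstract Filter, swapping
MajorityFilter for a real filter would not require changes to the driver.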

active.cpp selects a sequence of messages from the
first 90% of the corpus as "teach me" examples.
That is, the filter is trained on the correct
classification for these, and only these, examples.

active.cpp gets to choose which "teach me" examples
to train the filter on, and in what order.  I have
implemented something really dumb -- random
selection -- which you can no doubt improve on by
modifying or replacing active.cpp.
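
For reference, random selection over the first 90% of the corpus can be
sketched in a few lines.  This is not the code in active.cpp; the function
name random_teach_order, the seed parameter and the use of message indices
are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: returns the indices of the first 90% of a corpus
// of corpus_size messages, shuffled into a random order.  The shell would
// then train on these "teach me" candidates one at a time, in this order.
std::vector<std::size_t> random_teach_order(std::size_t corpus_size,
                                            unsigned seed) {
    const std::size_t pool_size = corpus_size * 9 / 10;  // first 90%
    std::vector<std::size_t> order(pool_size);
    std::iota(order.begin(), order.end(), std::size_t{0});  // 0, 1, 2, ...
    std::mt19937 rng(seed);  // fixed seed makes a run repeatable
    std::shuffle(order.begin(), order.end(), rng);
    return order;
}
```

A smarter strategy would replace the shuffle with some ranking of the
candidates, for example by how uncertain the filter currently is about
each one, while still drawing only from the first 90%.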

From time to time (after 100, 200, 400, etc.
"teach me" examples) the filter is asked to
classify the remaining 10% of the corpus,
one-at-a-time, in order. In the end the output
file will contain the concatenated results of all
these classification attempts.  Each line of
each attempt will contain "teach=nnn" -- the
number of "teach me" examples prior to the
classification run.  You can therefore split the
output into one file per run for evaluation
purposes using "grep".
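
If you would rather do the split in code, the following sketch groups
result lines by their "teach=nnn" value.  The function names teach_count
and split_by_teach are made up for this example, and the only thing it
assumes about the line format is that "teach=" is followed directly by
decimal digits.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Returns the number that follows "teach=" in a result line,
// or -1 if the line has no such field.
long teach_count(const std::string& line) {
    const std::string key = "teach=";
    const std::size_t pos = line.find(key);
    if (pos == std::string::npos) return -1;
    std::size_t i = pos + key.size();
    if (i >= line.size() ||
        !std::isdigit(static_cast<unsigned char>(line[i]))) {
        return -1;
    }
    long n = 0;
    while (i < line.size() &&
           std::isdigit(static_cast<unsigned char>(line[i]))) {
        n = n * 10 + (line[i] - '0');
        ++i;
    }
    return n;
}

// Groups result lines by classification run, keyed by teach count.
// Lines without a "teach=" field are skipped.
std::map<long, std::vector<std::string>> split_by_teach(std::istream& in) {
    std::map<long, std::vector<std::string>> runs;
    std::string line;
    while (std::getline(in, line)) {
        const long n = teach_count(line);
        if (n >= 0) runs[n].push_back(line);
    }
    return runs;
}
```

Parsing the number avoids one pitfall of a plain substring match: the
pattern teach=100 also matches lines containing teach=1000.  With grep,
the -w option sidesteps the same problem.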

To submit your active learning entry, include
your own version of "active.cpp" and any other
filter components you require in a tar file and
submit it in the same way as your filter.

Once the public data is released, run your entry
on the data and submit the result file.

Note:  although the corpus contains the true label
for every message, only the labels of the selected
"teach me" examples should ever be used.

This is a pilot exercise, so please let me know if
you uncover any problems.