TREC 2007 Online Active Learning Task

This year, we are including an online active learning task.  In
this setting, ham/spam labels for training are available to the
filter only upon request, and each run has a fixed allowance of
label requests.  The goal is for the filter to classify effectively
using a limited number of ground-truth labels.

In this scenario, as the filter classifies a message it also 
determines whether or not to make a label request for that message.

Active learning is implemented by a new version of the "run.cpp"
program supplied with the toolkit.  A beta release of this
program is available at

  plg.uwaterloo.ca/~gvcormac/spam/run.activeLearning.cpp

The only modification to the previous filter interface is that the
./classify command is now invoked as:

./classify [filename] [remaining label allowance] [remaining messages]

The new parameter 'remaining label allowance' tells the
classifier how many more label requests it is entitled to
make.  The parameter 'remaining messages' tells the
classifier the number of messages remaining to be
classified (including the current message).
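
As an illustration of the new calling convention, the following is a
minimal sketch of how a filter written in Python might read the three
arguments.  The names ClassifyArgs and parse_classify_args are
hypothetical, and the sketch assumes both counts arrive as decimal
integers, which the description above does not state explicitly.

```python
from typing import NamedTuple, Sequence


class ClassifyArgs(NamedTuple):
    filename: str             # path of the message to classify
    remaining_allowance: int  # label requests the filter may still make
    remaining_messages: int   # messages left, counting the current one


def parse_classify_args(argv: Sequence[str]) -> ClassifyArgs:
    """Parse an argv list shaped like the ./classify call above.

    argv[0] is the program name, as in sys.argv.  Assumes both counts
    are decimal integers (an assumption, not taken from the toolkit).
    """
    if len(argv) != 4:
        raise ValueError(
            "expected: classify filename allowance remaining_messages"
        )
    return ClassifyArgs(argv[1], int(argv[2]), int(argv[3]))
```

A filter would typically call parse_classify_args(sys.argv) once at
startup and base its label request on the two counts.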

The ./classify command should output:

class=[classification] score=[score] tfile=[tfile] labelReq=[label request]
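
The output line can be assembled with a small helper such as the one
sketched below.  The function name format_result is hypothetical, and
the example values (the class string, the score, and the tfile path)
are placeholders rather than values prescribed by the toolkit.

```python
def format_result(classification: str, score: float,
                  tfile: str, label_req: str) -> str:
    """Build one output line in the four-field format shown above."""
    return (
        f"class={classification} score={score} "
        f"tfile={tfile} labelReq={label_req}"
    )


# Placeholder values, for illustration only.
line = format_result("spam", 0.93, "/tmp/train1", "labelN")
```

The filter would then print this line on standard output, as with the
three-field output of earlier years.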


The filter's three label request options are:

noRequest -- Makes no label request.

labelN -- Requests a label.  If no label is available (due to exhausted
	allowance), then no training is performed.

labelB -- Requests a label.  If no label is available (due to exhausted
	allowance), then bootstrap training is performed using the
	filter's prediction as the 'true' label for training.
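
The effect of the three options can be summarized in a short sketch.
This is not the code in run.cpp; the function name resolve_request
and the action names are hypothetical, and the sketch assumes that
each granted request reduces the allowance by one while an ungranted
request costs nothing.

```python
def resolve_request(label_req: str, remaining_allowance: int):
    """Return (training_action, new_allowance) for one message.

    training_action is one of:
      "none"       -- no training is performed
      "true_label" -- train on the ground-truth label
      "bootstrap"  -- train on the filter's own prediction
    """
    if label_req == "noRequest":
        return "none", remaining_allowance
    if label_req not in ("labelN", "labelB"):
        raise ValueError(f"unknown label request: {label_req}")
    if remaining_allowance > 0:
        # A label is available: both labelN and labelB train on it.
        return "true_label", remaining_allowance - 1
    # Allowance exhausted: only labelB falls back to bootstrap training.
    if label_req == "labelB":
        return "bootstrap", remaining_allowance
    return "none", remaining_allowance
```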


A naive solution to this problem would be to have the filter make
a labelN request for every message.  This would request labels and
train normally for the first N messages, where N is the label allowance,
and then would not update for the remainder of the run.  
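
To make the behavior of the naive solution concrete, the sketch below
counts how many messages such a filter actually trains on.  The
function name count_trained_naive is hypothetical, and the sketch
assumes one unit of allowance is consumed per granted request.

```python
def count_trained_naive(num_messages: int, allowance: int) -> int:
    """Count training updates for a filter that always sends labelN."""
    trained = 0
    for _ in range(num_messages):
        if allowance > 0:
            # Label granted: train on it and spend one request.
            allowance -= 1
            trained += 1
        # Otherwise labelN yields no label and no training occurs.
    return trained
```

Under these assumptions, a run of 1000 messages with an allowance of
100 trains on the first 100 messages only, and the remaining 900 are
classified by a filter that no longer updates.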

The testing jig remains backward compatible with filters from prior
years: if a filter's output specifies no label request, the jig
defaults to the naive approach (labelN for every message).  This
allows prior filters to run on this task without modification.
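
The default can be pictured as follows.  This is an illustrative
sketch only, not the parsing code used by run.cpp; the function name
parse_filter_output is hypothetical, and the sketch assumes the
fields are separated by whitespace and contain no embedded spaces.

```python
def parse_filter_output(line: str) -> dict:
    """Split a filter output line into its key=value fields.

    A missing labelReq field is treated as labelN, mirroring the
    backward-compatible default described above.
    """
    fields = dict(
        token.split("=", 1) for token in line.split() if "=" in token
    )
    fields.setdefault("labelReq", "labelN")
    return fields
```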

Tests will be performed with several different label allowances,
such as N={100, 500, 1000, 5000, 10000, all}.