TREC 2007 Online Active Learning Task

This year, we are including an online active learning task. In this setting, ham/spam labels for training are available to the filter only upon request, and each run has a fixed allowance of label requests. The goal is for the filter to classify effectively with a limited amount of ground truth labeling. As the filter classifies each message, it also decides whether to request a label for that message.

Active learning is implemented by a new version of the "run.cpp" program supplied with the toolkit. A beta release of this program is available at plg.uwaterloo.ca/~gvcormac/spam/run.activeLearning.cpp

The only modification to the previous filter interface is that the ./classify command is now called as:

    ./classify [filename] [remaining label allowance] [remaining messages]

The new parameter 'remaining label allowance' tells the classifier how many more label requests it is entitled to make. The parameter 'remaining messages' tells the classifier the number of messages remaining to be classified (including the current message).

The ./classify command should output:

    class=[classification] score=[score] tfile=[tfile] labelReq=[label request]

The filter's three label request options are:

    noRequest -- Makes no label request.
    labelN -- Requests a label. If no label is available (because the allowance is exhausted), no training is performed.
    labelB -- Requests a label. If no label is available (because the allowance is exhausted), bootstrap training is performed using the filter's own prediction as the 'true' label.

A naive solution to this problem would be to have the filter make a labelN request for every message. The filter would then receive labels and train normally for the first N messages, where N is the label allowance, and would not update for the remainder of the run.
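
As an illustration of the interface described above, the following Python sketch shows one way a filter might decide on a label request and format its output line. It is not part of the toolkit: the function names choose_request and format_output, the margin parameter, and the assumption that the score is a spam probability in [0, 1] (with 0.5 meaning maximal uncertainty) are all hypothetical choices made for this example.

```python
def choose_request(p_spam, remaining_allowance, remaining_messages, margin=0.2):
    """Hypothetical policy: return 'noRequest' or 'labelN'.

    p_spam is ASSUMED to be a spam probability in [0, 1]; the task
    itself does not prescribe a score range.
    """
    # No allowance left: a request could not be honored anyway.
    if remaining_allowance <= 0:
        return "noRequest"
    # Enough allowance to label every remaining message: always ask.
    if remaining_allowance >= remaining_messages:
        return "labelN"
    # Otherwise spend a request only when the prediction is uncertain.
    if abs(p_spam - 0.5) < margin:
        return "labelN"
    return "noRequest"


def format_output(p_spam, tfile, request):
    """Build the single output line expected from ./classify."""
    cls = "spam" if p_spam >= 0.5 else "ham"
    return f"class={cls} score={p_spam:.4f} tfile={tfile} labelReq={request}"


if __name__ == "__main__":
    # Example: an uncertain message with 10 requests left and 100 messages to go.
    req = choose_request(0.55, remaining_allowance=10, remaining_messages=100)
    print(format_output(0.55, "msg.t", req))
```

A policy of this kind saves requests for messages near the decision boundary rather than spending the whole allowance on the first N messages; whether that helps in practice depends on the filter and the corpus.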
The testing jig remains backward compatible with filters from prior years: if a filter specifies no label request, the jig defaults to the naive approach (labelN for every message). This allows prior filters to run on this task without modification. Tests will be performed with several different label allowances, such as N={100, 500, 1000, 5000, 10000, all}.
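
To make the request semantics and the default behavior concrete, here is a small Python simulation of how a jig could account for the allowance. It is only a sketch based on the description in this document, not the actual run.cpp logic; the function name simulate and the action names 'train', 'bootstrap', and 'none' are invented for the example.

```python
def simulate(requests, allowance):
    """Simulate allowance accounting for a sequence of label requests.

    requests: one entry per message, either 'noRequest', 'labelN',
              'labelB', or None (filter printed no labelReq field).
    Returns one action per message:
      'train'     -- a true label was supplied and used for training
      'bootstrap' -- no label left; filter trains on its own prediction
      'none'      -- no training takes place
    """
    actions = []
    remaining = allowance
    for req in requests:
        if req is None:
            req = "labelN"  # default for filters from prior years
        if req == "noRequest":
            actions.append("none")
        elif remaining > 0:
            remaining -= 1  # each honored request consumes one label
            actions.append("train")
        elif req == "labelB":
            actions.append("bootstrap")
        else:
            actions.append("none")
    return actions


if __name__ == "__main__":
    # A legacy filter (no labelReq output) with an allowance of 2:
    print(simulate([None] * 5, allowance=2))
```

With an allowance of 2, the legacy filter above trains on the first two messages and then stops updating, which is the naive behavior described earlier.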