TREC 2016 Total Recall Track

Guidelines:  May 31, 2016

Task Overview

The TREC 2016 Total Recall Track shares the same objectives and overall architecture as the TREC 2015 Total Recall Track.  Participants should familiarize themselves with the TREC 2015 Total Recall Track Overview, currently available to teams who have registered with NIST to participate in TREC 2016, using the login credentials for "active participants" supplied by NIST.

The overall task is unchanged from 2015:  Track participants will implement automatic or semi-automatic methods to identify as many relevant documents as possible, with as little review effort as possible, from document collections containing as many as 2.2 million documents.  Participating systems will be run against an automated assessment server, which is unchanged from 2015 (except for the addition of new datasets).  An open-source (GPL) baseline model implementation (BMI) of an automated participant system is available for download.  Participants may use or modify BMI, or may implement their own system -- automated or manual -- from scratch.

Participants familiar with the 2015 guidelines and supplemental guidelines will find the following differences from the 2015 Track:

Task Operation

The document collection, information need (topic), and an automated relevance assessor will be supplied to participants via an on-line server. After downloading the collection and information need, participants must identify documents from the collection and submit them (in batches whose size is determined by the participant) to the on-line relevance assessor.  Every document submitted to the assessor is scored, and the primary assessment of relevance is returned immediately to the participant, for each document in each batch, as it is submitted.  To accomplish this, the Total Recall coordinators are using collections in which every document has been pre-labeled as relevant or not and the automated assessor merely provides that label to the participant.

Participants have two objectives:

  1. To submit as many documents containing relevant information as possible, while submitting as few documents as possible, to the automated relevance assessor.  Submission continues indefinitely, and is evaluated in terms of how many relevant documents are found, as a function of the number of documents submitted.
  2. To "call their shot" to indicate, without actually stopping, the point at which it would be reasonable to stop, because the effort to review more documents would be disproportionate to the value of any further relevant documents that might be found.

Motivating Applications

The Total Recall Track addresses the needs of searchers who want to find out everything about X, for some X. Typical examples include:

Evaluation Measures

There are many possible definitions for "as many documents containing relevant information as possible" and "as few documents as possible." The Total Recall Track will report traditional as well as novel measures to weigh the tradeoff between information found and effort expended.

Rank-based measures will include recall-precision curves, gain curves, and recall evaluated at aR+b documents submitted, for all combinations of a = {1, 2, 4} and b = {0, 100, 1000}, where R is the number of relevant documents in the collection for the topic.
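
For concreteness, the shell sketch below shows one way such recall@aR+b values could be computed from a submission log.  It is not the official evaluation tool: the file names run.ranked (submitted docids, in submission order) and qrels.relevant (all relevant docids for the topic) are assumptions made for illustration.

  # Hypothetical sketch, not the official evaluation code: compute recall at
  # each aR+b cutoff.  run.ranked lists submitted docids in submission order;
  # qrels.relevant lists all relevant docids (both file names are assumed).
  R=$(wc -l < qrels.relevant)
  for a in 1 2 4; do
    for b in 0 100 1000; do
      CUTOFF=$((a * R + b))
      FOUND=$(head -n "$CUTOFF" run.ranked | sort -u | join - <(sort qrels.relevant) | wc -l)
      echo "recall@${a}R+${b} = $FOUND/$R"
    done
  done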

Set-based measures, evaluated at the point at which "call your shot" is indicated, will include Recall and Precision, as well as aggregate measures such as F1 and other utility measures (to be announced) that balance recall with review effort.
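
As a reminder of how the aggregate measure is formed, the fragment below computes Precision, Recall, and F1 at the call-your-shot point.  The variable names are assumptions made for illustration: SUBMITTED is the number of documents submitted up to that point, FOUND the number of relevant documents among them, and R the total number of relevant documents for the topic.

  # Illustrative only: Precision, Recall, and F1 at the call-your-shot point.
  # SUBMITTED, FOUND, and R are assumed to be set as described above.
  PREC=$(echo "scale=4; $FOUND / $SUBMITTED" | bc)
  REC=$(echo "scale=4; $FOUND / $R" | bc)
  F1=$(echo "scale=4; 2 * $PREC * $REC / ($PREC + $REC)" | bc)
  echo "Precision=$PREC Recall=$REC F1=$F1"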

Measures taking into account document importance, sub-topic coverage, and alternate relevance assessments will also be computed.

Summary recall/precision/effort results will be available to participants at the end of the evaluation period; detailed results will be presented to participants at TREC in November, and to the public in the TREC proceedings, to be published in early 2017.

"At Home" vs. "Sandbox" Evaluation

Practice collections and topics are available now to all registered TREC 2016 participants. A baseline model implementation (BMI) of a fully automated approach is available now for experimental purposes.  TREC participants may access the test collections immediately, via the server, using their registered TREC Group Identifier; participants may also access the TREC 2015 collections athome1, athome2, and athome3 by submitting the necessary data access agreements (see below).

For the 2016 "At Home" task, one new collection, athome4, will be available to participants via the Internet, subject to the execution of the TREC 2016 Total Recall usage agreement.  For this collection, participants will run their own systems, and access the automated assessor via the Internet.  No prior experimentation or practice on athome4 is permitted; all runs will be logged and reported in the TREC 2016 proceedings.

Participants must declare each run to be either "automatic," meaning that no manual intervention was used once the collection was downloaded, or "manual," meaning that manual intervention -- whether parameter tweaking, searching, or full-scale document review -- was involved.  If multiple runs are conducted, every run must be independent; under no circumstances may information learned from one run be used in any other. If documents are manually reviewed, the same documents must also be submitted to the assessment server, at the time they are reviewed.  At Home participants will be required to complete a short questionnaire describing the nature and quantity of the manual effort involved in each run.

For the "Sandbox" task, the server for the two collections will be available only within a firewalled platform with no Internet access.  Participants wishing to evaluate their systems on these datasets must submit a fully automated solution, which the Track coordinators will execute as a virtual machine within a restricted environment.

The baseline model implementation (BMI) supplied by the 2015 Total Recall Track is suitable for "Sandbox" as well as automatic "At Home" participation, and participants are free to modify it as they see fit, subject to the GNU General Public License (GPL v3).

Participants may submit their own virtual machine, perhaps containing proprietary software.  In this case, participants must warrant that they have the right to use the software in this way, and the Track coordinators will in turn warrant that the submission will be used only for the purpose of evaluation within the sandbox.

Potential Strategies

Anticipated Logistics and Timeline

Details: Automatic At-Home Participation

Each participant may conduct up to six automatic experiments, full or limited, with each experiment applying a particular fully automated method to the athome4 (or athome4subset) test collection.  Participants should use a meaningful name (of their own choosing) for each experiment, and enter that name as the "RUNNAME" in the Baseline Model Implementation (BMI) configuration file, as the ":alias" parameter in the API, or as "Run Alias" when using the manual Web interface.
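
For example, a BMI configuration file for one such experiment might contain a line like the following (the run name is illustrative, and all other required settings are omitted):

  # Illustrative fragment of a BMI configuration file: only the RUNNAME line is
  # shown, with an example value of the participant's choosing.
  RUNNAME=myGroup-athome4-run1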

NOTE:  Once an athome test run is created, its results will become part of the official TREC 2016 record.  It is not possible to start over or to expunge a run.

Automatic experiments may interact with the assessment server either directly using its API, or using the code provided in the BMI.
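
The only API route quoted in these guidelines is the call-your-shot POST shown in the next section; the line below merely illustrates the general shape of a direct interaction using curl, with a hypothetical /judge/... path for submitting docids.  Consult the assessment server documentation (or the BMI source) for the actual routes and response format.

  # Hypothetical illustration only -- this is NOT the documented route; see the
  # assessment server documentation or the BMI source for the real API.  The
  # general pattern is an HTTP request naming the group ($LOGIN), the topic, and
  # the docids being submitted; the server replies with a relevance label per docid.
  curl -X POST "$TRSERVER/judge/$LOGIN/$TOPIC/docid0001:docid0002"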

Participants must certify that, for each automatic experiment:

  1. No modification or configuration of the participant's system was done after a run (either automatic or manual) was created for any athome test; and
  2. No modification or configuration of the participant's system was done after any member of the participant group became aware of the topics or contents of documents used for either the Total Recall or Dynamic Domain tasks.

In other words, automatic experiments must use software that, without human intervention, downloads the dataset and conducts the task end to end.

Details: Call your Shot

If you download BMI on or after May 24, 2016, it will include the default "call your shot" rule, whose implementation is shown below.  You may modify the call-your-shot rule (and any other aspect of BMI) as you wish.

To implement the default rule for "call your shot" in a version of BMI downloaded prior to May 24, 2016, please modify the BMI implementation as follows:

  # NDUN counts the documents submitted so far; L is the current batch size,
  # which grows by roughly 10% per iteration.
  NDUN=$((NDUN+L))
  L=$((L+(L+9)/10))
  # Call the shot when the number of submitted documents not listed in
  # prel.$TOPIC reaches 1000 plus half the number of lines in prel.$TOPIC
  # (skipped if the shot has already been called for this topic).
  if [ "$TOPIC" != "$REASONABLE" -a "`sort new*.$TOPIC | join -v1 - prel.$TOPIC | wc -l`" -ge $((1000+`cat prel.$TOPIC | wc -l`/2)) ] ; then
    curl -X POST "$TRSERVER/judge/shot/$LOGIN/$TOPIC/reasonable"
    echo "Called shot REASONABLE for topic $TOPIC at $NDUN" >> $LOG.$LOGIN
    REASONABLE="$TOPIC"
  fi

Details: Manual At-Home Participation

Each group may conduct one Manual At-Home experiment (whether or not they also conduct Automatic At-Home experiments).  Participants conducting both manual and automatic experiments must ensure that the software to conduct their automatic experiments is frozen prior to creating any manual run.

Participants are required to track the nature and quantity of any manual effort, and to submit this information before the end of the At-Home phase.

The coordinators envision that manual participants may engage in some or all of the following activities:

  1. Dataset-specific processing, formatting, or indexing;
  2. Topic-specific searching within the dataset;
  3. Consultation of external resources such as the Web, or individuals familiar with the subject matter of the topics;
  4. Manual review of documents.

Participants are required to report the nature of these activities, to estimate the number of hours spent, on average, per topic, and to report the number of documents reviewed, per topic.  Participants are required to submit all manually reviewed documents to the assessment server, so that they may be accounted for as "review effort."

NOTE: Manual participants, whether or not they manually review documents, may still avail themselves of assessments through the assessment server, using the TREC-supplied "Manual" interface, or using the API or BMI.

At-Home Usage Agreements, and GroupID activation

Each participant will be assigned an extended Group ID, which must be activated in order to conduct At-Home experiments.  The GroupID will have the form GGG.XXXX where GGG is the GroupID used for practice, and XXXX is a randomly generated suffix.

To gain access to athome1 (for testing purposes) and athome4 or athome4subset (for submission), participants must sign the "TREC Total Recall Usage Agreement" and return a pdf of the signed agreement to the TREC Total Recall coordinators.

To gain access to athome2 and athome3 (for testing purposes), participants must submit the "TREC Dynamic Domain Usage Agreement" to NIST and forward the email confirming NIST's acceptance of that agreement to the Total Recall coordinators.

NOTE:  Participants do not need to download the Dynamic Domain datasets to participate in the Total Recall 2016 Track; but if they want to use them for testing purposes, they need to obtain permission.

Participant Questionnaire

For each experiment, participants will be required to respond to a questionnaire containing questions such as the following:

  1. Assigned TREC Group ID
  2. Name of experiment [Each experiment should have a separate name, which should be used consistently as "RUNNAME" for BMI users, ":alias" for API users, or "Run Alias" for manual Web interface users.  A participating group may submit up to six automatic experiments and one manual experiment.]
  3. Manual or Automatic? 
  4. Please give a brief description of the hypothesis and methods employed in this experiment, with particular emphasis on how it differs from other experiments for which you are submitting results.

Sandbox Datasets and Topics

Sandbox submissions will be run by the TREC Total Recall coordinators (or their delegates) on private datasets.  One of the datasets that will be used consists of 2.2M email messages from the administrations of two senior elected officials, which have previously been classified according to six topics of interest, not unlike the athome4 collections.

The second dataset will consist of 800,000 Twitter "tweets," classified according to four topics of interest.

Further details on Sandbox submission requirements will be available prior to the Sandbox submission deadline of September 7, 2016.

Unofficial Runs

Once participants have completed their experiments, there will be a facility for them to download a log of their submissions, as well as the official relevance assessments.  Tools that compute various summary evaluation results will be provided.  Participants may use this information to conduct unofficial experiments exploring "what if?" scenarios.

The TREC 2016 Workshop

Participants who conduct at least one experiment (At-Home automatic, At-Home manual, or Sandbox) are eligible to attend the TREC 2016 workshop in November, to have a paper included in the TREC 2016 workbook, and to have a paper included in the final TREC 2016 proceedings.  Participants may also present a poster at TREC, and may be invited to speak.