The TREC 2016 Total Recall Track shares the same objectives and
overall architecture as the TREC 2015 Total Recall Track.
Participants should familiarize themselves with the TREC
2015 Total Recall Track Overview, currently available to
teams who have registered
with NIST to participate in TREC 2016, using the login
credentials for "active participants" supplied by NIST.
The overall task is unchanged from 2015: Track participants
will implement automatic or semi-automatic methods to identify as
many relevant documents as possible, with as little review effort
as possible, from document collections containing as many as 2.2
million documents. Participating systems will be run against
an automated assessment server, which is unchanged from 2015
(except for the addition of new datasets). An open-source
(GPL) baseline model implementation (BMI) of an automated
participant system is available for
download. Participants may use or modify BMI, or may
implement their own system -- automated or manual -- from scratch.
Participants familiar with the 2015
guidelines and supplemental
guidelines will find the following differences from the
2015 Track:
Reviewed ≥ 1.5 × Relevant_Reviewed + 1000, where Reviewed is the total number of documents submitted to the assessment server, and Relevant_Reviewed is the number of such documents that are labeled relevant by the primary assessor.
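For example, if 2,000 of the documents submitted for a topic have been labeled relevant, this threshold is reached once 1.5 × 2,000 + 1,000 = 4,000 documents in total have been submitted for that topic.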
The document collection, information need (topic), and an
automated relevance assessor will be supplied to participants via
an on-line server. After downloading the collection and
information need, participants must identify documents from the
collection and submit them (in batches whose size is determined by
the participant) to the on-line relevance assessor. Every
document submitted to the assessor is scored, and the primary
assessment of relevance is returned immediately to the
participant, for each document in each batch, as it is
submitted. To accomplish this, the Total Recall coordinators
are using collections in which every document has been pre-labeled
as relevant or not and the automated assessor merely provides that
label to the participant.
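As an illustration only, the interaction can be sketched in shell, in the style of BMI. The endpoint path, server URL, and file names below are assumptions made for this sketch, not the documented API; consult the assessment-server API documentation and the BMI source for the actual routes and response format.
TRSERVER=http://example.org    # assessment server base URL (placeholder)
LOGIN=GGG.XXXX                 # extended GroupID (see below)
TOPIC=topic1                   # topic identifier (placeholder)
# Submit each document identifier listed in batch.txt and record the label returned.
# The /judge/... path is an assumed, illustrative route, not the documented API.
while read DOCID ; do
  LABEL=`curl -s "$TRSERVER/judge/$LOGIN/$TOPIC/$DOCID"`
  echo "$DOCID $LABEL" >> labels.$TOPIC
done < batch.txt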
Participants have two objectives: (1) to find as many of the relevant documents as possible, with as little review effort as possible; and (2) to "call their shot" -- that is, to indicate the point in their submissions at which they believe a reasonable result has been achieved.
Set-based measures, evaluated at the point at which "call your shot" is indicated, will include Recall and Precision, as well as aggregate measures such as F1 and other utility measures, to be announced, that balance recall against review effort.
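For reference, using the notation above for Reviewed and Relevant_Reviewed, and writing Relevant_Total for the total number of relevant documents in the collection (a symbol introduced here only for illustration), the standard set-based measures are:
Recall = Relevant_Reviewed / Relevant_Total
Precision = Relevant_Reviewed / Reviewed
F1 = 2 × Precision × Recall / (Precision + Recall)
each computed over the documents submitted up to the point at which the shot is called.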
Measures taking into account document importance, sub-topic coverage, and alternate relevance assessments will also be computed.
Summary recall/precision/effort results will be available to participants at the end of the evaluation period; detailed results will be presented to participants at TREC in November, and to the public in the TREC proceedings, to be published in early 2017.
For the 2016 "At Home" task, one new collection, athome4, will be available to participants via the Internet, subject to the execution of the TREC 2016 Total Recall usage agreement. For this collection, participants will run their own systems, and access the automated assessor via the Internet. No prior experimentation or practice on athome4 is permitted; all runs will be logged and reported in the TREC 2016 proceedings.
Participants must declare each run to be either "automatic,"
meaning that no manual intervention was used once the collection
was downloaded, or "manual," meaning that manual intervention --
whether parameter tweaking, searching, or full-scale document
review -- was involved. If multiple runs are conducted,
every run must be independent; under no circumstances may
information learned from one run be used in any other. If
documents are manually reviewed, the same documents must also be
submitted to the assessment server, at the time they are
reviewed. At Home participants will be required to complete
a short questionnaire describing the nature and quantity of the
manual effort involved in each run.
For the "Sandbox" task, the server for the two collections will be available only within a firewalled platform with no Internet access. Participants wishing to evaluate their systems on these datasets must submit a fully automated solution, which the Track coordinators will execute as a virtual machine within a restricted environment.
The baseline model implementation (BMI) supplied by the 2015 Total Recall Track is suitable for "Sandbox" as well as automatic "At Home" participation, and participants are free to modify it as they see fit, subject to the GNU General Public License (GPL v3).
Participants may submit their own virtual machine, perhaps containing proprietary software. In this case, participants must warrant that they have the right to use the software in this way, and the Track coordinators will in turn warrant that the submission will be used only for the purpose of evaluation within the sandbox.
Each participant may conduct up to six automatic experiments, full or limited, with each
experiment applying a particular fully automated method to athome4 (or to athome4subset).
Participants should use a meaningful name (of their own choosing)
for each experiment, and enter that name as the "RUNNAME" in the Baseline
Model Implementation (BMI) configuration file, or as the
":alias" parameter in the API, or as "Run
Alias" when using the manual Web interface.
NOTE: Once an athome test run is created, its results will become part of the official TREC 2016 record. It is not possible to start over or to expunge a run.
Automatic experiments may interact with the assessment server either directly using its API, or using the code provided in the BMI.
Participants must certify, for each automatic experiment, that it was conducted without
manual intervention of any kind. In other words, automatic experiments must use software
that, without human intervention, downloads the dataset and conducts the task end to end.
If you download BMI on or after May 24, 2016, it will include the
default "call your shot" rule, whose implementation is shown
below. You may modify the call-your-shot rule (and any other
aspect of BMI) as you wish.
To implement the default rule for "call your shot" in a version
of BMI downloaded prior to May 24, 2016, please modify the BMI
implementation as follows:
# Existing BMI lines: NDUN accumulates the number of documents submitted so far,
# and L, the batch size, grows by roughly 10% per iteration.
NDUN=$((NDUN+L))
L=$((L+(L+9)/10))
# Call the shot once the count of submitted documents judged not relevant
# (those in new*.$TOPIC but absent from prel.$TOPIC) reaches 1000 + Relevant_Reviewed/2,
# which is equivalent to the rule Reviewed ≥ 1.5 × Relevant_Reviewed + 1000 stated above.
if [ "$TOPIC" != "$REASONABLE" -a "`sort new*.$TOPIC | join -v1 - prel.$TOPIC | wc -l`" -ge $((1000+`cat prel.$TOPIC | wc -l`/2)) ] ; then
  curl -X POST "$TRSERVER/judge/shot/$LOGIN/$TOPIC/reasonable"
  echo "Called shot REASONABLE for topic $TOPIC at $NDUN" >> $LOG.$LOGIN
  REASONABLE="$TOPIC"
fi
Each group may conduct one Manual At-Home experiment (whether or not they also conduct Automatic At-Home experiments). Participants conducting both manual and automatic experiments must ensure that the software to conduct their automatic experiments is frozen prior to creating any manual run.
Participants are required to track the nature and quantity of any manual effort, and to submit this information before the end of the At-Home phase.
The coordinators envision that manual participants may engage in some or all of the manual activities noted above, such as parameter tweaking, searching, or full-scale document review.
Participants are required to report the nature of these
activities, to estimate the number of hours spent, on average, per
topic, and to report the number of documents reviewed, per
topic. Participants are required to submit all manually
reviewed documents to the assessment server, so that they may be
accounted for as "review effort."
NOTE: Manual participants, whether or not they manually review documents, may still avail themselves of assessments through the assessment server, using the TREC-supplied "Manual" interface, or using the API or BMI.
Each participant will be assigned an extended GroupID, which must be activated in order to conduct At-Home experiments. The GroupID will have the form GGG.XXXX, where GGG is the GroupID used for practice and XXXX is a randomly generated suffix.
To gain access to athome1 (for testing purposes) and athome4 or athome4subset (for submission), participants must sign the "TREC Total Recall Usage Agreement" and return a pdf of the signed agreement to the TREC Total Recall coordinators.
To gain access to athome2 and athome3 (for testing purposes), participants must submit the "TREC Dynamic Domain Usage Agreement" to NIST and forward the email confirming NIST's acceptance of that agreement to the Total Recall coordinators.
NOTE: Participants do not need to
download the Dynamic Domain datasets to participate in the Total
Recall 2016 Track; but if they want to use them for testing
purposes, they need to obtain permission.
For each experiment, participants will be required to respond to
a questionnaire about how the run was conducted, including, for example,
the nature and quantity of any manual effort involved.
Sandbox submissions will be run by the TREC Total Recall
coordinators (or their delegates) on private datasets. One
of the datasets that will be used consists of 2.2M email messages
from the administrations of two senior elected officials, which
have previously been classified according to six topics of
interest, not unlike the athome4 collections.
The second dataset will consist of 800,000 Twitter "tweets,"
classified according to four topics of interest.
Further details on Sandbox submission requirements will be
available prior to the Sandbox submission deadline of September 7,
2016.
Once participants have completed their experiments, there will be a facility for them to download a log of their submissions, as well as the official relevance assessments. Tools that compute various summary evaluation results will be provided. Participants may use this information to conduct unofficial experiments exploring "what if?" scenarios.
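As a sketch of such an unofficial analysis (the file names below are hypothetical; the actual formats will be those of the downloadable submission log and assessment files), set-based recall and precision for a run can be computed with standard tools:
# run.docids:     document identifiers submitted in the run, one per line (hypothetical)
# qrels.relevant: identifiers of all documents assessed relevant (hypothetical)
sort -u run.docids > run.sorted
sort -u qrels.relevant > rel.sorted
FOUND=`join run.sorted rel.sorted | wc -l`
TOTALREL=`wc -l < rel.sorted`
SUBMITTED=`wc -l < run.sorted`
R=`echo "scale=4; $FOUND / $TOTALREL" | bc`
P=`echo "scale=4; $FOUND / $SUBMITTED" | bc`
echo "recall=$R precision=$P"
Applying the same computation to the first N lines of run.docids (e.g., via head -n N) gives recall and precision at a review effort of N documents.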
Participants who conduct at least one experiment (At-Home automatic, At-Home manual, or Sandbox) are eligible to attend the TREC 2016 workshop in November, to have a paper included in the TREC 2016 workbook, and to have a paper included in the final TREC 2016 proceedings. Participants may also present a poster at TREC, and may be invited to speak.