These guidelines expand on the Initial
Draft Guidelines of May 3, 2015. Questions concerning
the Total Recall Track may be directed to, and will be answered
in, the Total
Recall Discussion Group.
The At-Home task will be divided into three "tests" (as defined in the guidelines) named athome1, athome2, and athome3. Each of the At-Home tests has ten topics and a single dataset, for a total of 30 topics. The three datasets contain 290,000, 450,000, and 900,000 documents, respectively.
Participants may conduct one or more full experiments, each of
which consists of three runs -- one for each of the athome1, athome2,
and athome3 tests. Alternatively, groups with limited
resources may conduct limited experiments, each consisting of a
single run on the athome1 test.
Access to the At-Home datasets, topics, and relevance assessments will be through the Assessment Server, which has been available and will continue to be available throughout the At-Home task.
NOTE: the Assessment Server will be down on Sunday, June 28, in order to configure it with the At-Home tests.
Each participant may conduct up to six automatic full or limited
experiments, with each experiment applying a particular fully
automated method to the athome1 test and, in the case of a full
experiment, also to the athome2 and athome3 tests. Participants should
use a meaningful name (of their own choice) for each experiment,
and enter that name as the "RUNNAME" in the Baseline
Model Implementation ("BMI") configuration file, as the
":alias" parameter in the API, or as "Run
Alias" when using the manual Web interface.
NOTE: once an athome test run is created, its results will become part of the official TREC record. It is not possible to start over or to expunge a run.
Automatic experiments may interact with the assessment server either directly using its API, or using the code provided in the BMI.
Participants must certify that each automatic experiment was conducted entirely by software, without human intervention: the software must download the dataset and conduct the task end to end on its own.
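As a rough illustration of such an end-to-end loop, the Python sketch below submits batches of documents for a topic and learns from the returned assessments. The server address, endpoint name, and JSON fields are hypothetical placeholders, and "ranker" stands for whatever learning method a participant uses; the actual interface is defined by the assessment server's API documentation and the BMI code.

    # Minimal sketch of a fully automated run loop.  The URL, endpoint,
    # and JSON fields below are placeholders -- consult the assessment
    # server's API documentation (or the BMI source) for the real interface.
    import requests

    SERVER = "http://assessment-server.example/api"   # placeholder address
    ALIAS = "myGroup-baselineRun"                      # run alias / experiment name

    def run_topic(topic_id, ranker):
        """Iteratively submit documents and learn from returned assessments."""
        judged = {}                                    # docid -> relevance label
        while not ranker.done(judged):
            batch = ranker.next_batch(judged)          # e.g., top unjudged documents
            resp = requests.post(f"{SERVER}/judge",    # hypothetical endpoint
                                 json={"alias": ALIAS,
                                       "topic": topic_id,
                                       "docs": batch})
            judged.update(resp.json()["assessments"])  # hypothetical response field
            ranker.update(judged)                      # retrain on the new labels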
Each group may conduct one Manual At-Home experiment (whether or not they also conduct Automatic At-Home experiments). Participants conducting both manual and automatic experiments must ensure that the software to conduct their automatic experiments is frozen prior to creating any manual run.
Participants are required to track the nature and quantity of manual effort, and to submit this information before the end of the At-Home phase.
The coordinators envision that manual participants may engage in a variety of activities.
Participants are asked to report the general nature of these activities, to estimate the average number of hours spent per topic, and to report the number of documents reviewed per topic.
NOTE: manual participants, whether or not they manually review documents, may still avail themselves of assessments through the assessment server, using the TREC-supplied "Manual" interface, or using the API or BMI.
Each participant will be assigned an extended Group ID, which must be activated in order to conduct At-Home experiments. The Group ID will have the form GGG.XXXX, where GGG is the Group ID used for practice and XXXX is a randomly generated suffix.
To gain access to athome1, participants must sign the "TREC Total Recall Usage Agreement" and return a PDF of the signed agreement to the TREC Total Recall coordinators.
To gain access to athome2 and athome3, participants must submit the "TREC Dynamic Domain Usage Agreement" to NIST and forward the email confirming NIST's acceptance of that agreement to the Total Recall coordinators.
NOTE: Participants do not need to download
the Dynamic Domain datasets to participate in Total Recall; but
they do need to obtain permission, as some of the Total Recall
datasets are derived from the Dynamic Domain datasets. These
derivative datasets will be supplied automatically to authorized
participants by the Total Recall server.
For each experiment, participants will be required to respond to
a questionnaire.
Sandbox submissions will be run by the TREC Total Recall coordinators (or their delegates) on private datasets. One of the datasets that will be used consists of 400,000 email messages from the administration of a senior elected official, which have previously been classified according to statutory criteria by a professional archivist.
Other datasets and topics include test collections that are
available to, but cannot be disseminated by, the TREC Total Recall
coordinators.
Further details on Sandbox submission requirements will be
available prior to the Sandbox submission deadline of September 1.
Once participants have completed their experiments, there will be a facility for them to download a log of their submissions, as well as the official relevance assessments. Tools that compute various summary evaluation results will be provided. Participants may use this information to conduct unofficial experiments exploring "what if?" scenarios.
The TREC 2015 Total Recall Track will report a number of evaluation measures that reflect how completely the relevant documents are found (i.e., completeness), as a function of the number of documents submitted to the assessment server (i.e., effort).
Generally, these measures may be grouped into "Rank measures" and "Set measures." Rank measures reflect completeness for various effort values; for example, as a gain curve, or as a summary measure such as "effort to achieve 80% recall."
Set measures, on the other hand, reflect completeness and effort at a fixed level of effort, specified by the participant's system during the run. Three such fixed levels of effort will be used: "70% (estimated) recall," "80% (estimated) recall," and "best effort." Participating runs use the "call your shot" interface to indicate the points at which they estimate each of these levels has been reached.
Participants who do not use the "call your shot" interface will receive no score for the set-based measures, but will be scored according to the rank-based measures.
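As a rough sketch of how these measures relate a run's submission log to the official assessments, the following Python functions compute recall at a given effort, the effort needed to reach a target recall (a rank measure), and the completeness and effort at the point where a run "called its shot" (set measures). The function and variable names are illustrative only; official results will be computed by the tools provided by the track.

    # "submitted" is the list of document IDs in the order they were sent to
    # the assessment server for one topic; "relevant" is the set of relevant
    # document IDs from the official assessments; "shot" is the (1-based)
    # position at which the run called its shot.

    def recall_at_effort(submitted, relevant, effort):
        found = sum(1 for d in submitted[:effort] if d in relevant)
        return found / len(relevant)

    def effort_to_reach(submitted, relevant, target=0.8):
        found = 0
        for i, d in enumerate(submitted, start=1):
            if d in relevant:
                found += 1
            if found / len(relevant) >= target:
                return i                 # rank measure: effort to achieve target recall
        return None                      # target recall never reached

    def set_measures(submitted, relevant, shot):
        return {"recall": recall_at_effort(submitted, relevant, shot),
                "effort": shot}          # completeness and effort at the called shot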
A number of different "completeness" measures will be reported. The most obvious measure of completeness is total recall -- the fraction of relevant documents that have been submitted to the assessment server.
To mitigate known shortcomings of recall, facet-based recall will also be reported, for various facets. A facet is an identifiable subpopulation of documents; for example, highly relevant documents, documents reflecting a particular subtopic, documents of a particular type, documents from a particular subcollection, etc.
The objective is to achieve a high level of completeness on each facet, regardless of how the facets may be defined.
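A corresponding sketch of facet-based recall, assuming the facet definitions are supplied as sets of relevant document IDs (the actual facet definitions are determined by the coordinators):

    # "facets" maps a facet name to the set of relevant document IDs in that
    # facet; recall is computed separately over each facet.

    def facet_recall(submitted, facets, effort):
        seen = set(submitted[:effort])
        return {name: len(seen & docs) / len(docs)
                for name, docs in facets.items() if docs}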
Participants who conduct at least one experiment (At-Home automatic, At-Home manual, or Sandbox) are eligible to attend the TREC workshop in November, to have a paper included in the TREC workbook, and to have a paper included in the final TREC proceedings. Participants may present a poster at TREC, and may be invited to speak.