TREC 2005 Terabyte Track Guidelines


Timetable

       Documents available:                now
       Efficiency topics released:         July 1, 2005
       Efficiency results due at NIST:     July 8, 2005 (11:59pm EDT)
       Adhoc topics released:              July 11, 2005
       Named page finding topics released: July 18, 2005
       Adhoc results due at NIST:          August 1, 2005 (11:59pm EDT)
       Named page finding results due:     August 8, 2005 (11:59pm EDT)
       Conference notebook papers due:     late October, 2005
       TREC 2005 conference:               November 15-18, 2005

Overview

The primary goal of the Terabyte Track is to develop an evaluation methodology for terabyte-scale document collections. In addition, we are interested in efficiency and scalability issues, which can be studied more easily in the context of a larger collection. Again this year, we are using a 426GB collection of Web data from the gov domain for all tasks. While this collection is less than a full terabyte in size, it is considerably larger than the collections used in previous TREC tracks. In future years, we hope to expand the collection using data from other sources.

Again this year, the main track task is classic adhoc retrieval. All participants are expected to submit at least one run for this task. In addition, there are two optional tasks: an efficiency task and a named page finding task.

Collection

All tasks in this year's track will use a collection of Web data crawled from Web sites in the gov domain during early 2004. This collection ("GOV2") contains a large proportion of the crawlable pages in gov, including HTML and text pages, plus the extracted text of PDF, Word, and PostScript files. The collection is 426GB in size and contains 25 million documents.

For TREC 2004, the collection was distributed by CSIRO in Australia. From TREC 2005 onwards, the collection is available from the University of Glasgow. The collection has not changed in any way. If you participated in the track last year and obtained a copy of the GOV2 collection from CSIRO, you do not need to obtain a new copy from the University of Glasgow.

Topics and Queries

New topics for all three tasks will be released by NIST according to the timetable above. For the main adhoc task, queries may be created automatically or manually from these topic statements. For the efficiency and named page finding tasks, queries must be created automatically. Automatic methods are those in which there is no human intervention at any stage, and manual methods are everything else.

Adhoc Task

An adhoc task in TREC investigates the performance of systems that search a static set of documents using previously-unseen topics. For each topic, participants create a query and submit a ranking of the top documents for that topic (10,000 for this task). NIST will create and assess 50 new topics for the task. Last year's topics and relevance judgments may be used for training.

For most runs, you may use any or all of the topic fields when creating queries from the topic statements. For this task only, you may submit both automatic and manual runs. Each group submitting any automatic run must submit an automatic run that uses just the title field of the topic statement. Manual runs are strongly encouraged, since these runs often add relevant documents to the evaluation pool that are not found by automatic systems using current technology.

An experimental run consists of the top 10,000 documents for each topic. Groups may submit up to four runs for the adhoc task. At least one run will be judged by NIST assessors; NIST may judge more than one run per group depending upon available assessor time. During the submission process you will be asked to rank your submissions in the order that you want them judged. If you give conflicting rankings across your set of runs, NIST will choose the run to assess arbitrarily. The judgments will be on a three-way scale of "not relevant", "relevant", and "highly relevant".

The format for submissions is given in a separate section below. Each topic must have at least one document retrieved for it. Provided you have at least one document, you may return fewer than 10,000 documents for a topic, though note that the standard evaluation measures used in TREC count empty ranks as not relevant. You cannot hurt your score, and could conceivably improve it for these measures, by returning 10,000 documents per topic.

In addition to the top 10,000 documents, we will be collecting information about each system and each run, including hardware characteristics and performance measurements such as total query processing time. Details are given in a separate section below. Be sure to record the required information when you generate your experimental runs, since it will be requested on the submission form.

For query processing time, report the time to return the top 20 documents, not the time to return the top 10,000. It is acceptable to execute your system twice for each query, once to generate the top 10,000 documents and once to measure the execution time for the top 20, provided that the top 20 results are the same in both cases.
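
As an illustration of this two-pass approach, the following Python sketch assumes a hypothetical search(query, k) function standing in for your own system's retrieval call; it generates the full 10,000-document ranking, separately times a top-20 retrieval, and checks that the two agree on the first 20 documents.

    import time

    # Illustrative two-pass timing, assuming a hypothetical search(query, k)
    # that returns a ranked list of (docno, score) pairs.
    def timed_query(search, query):
        full = search(query, k=10000)        # ranking submitted for evaluation

        start = time.perf_counter()
        top20 = search(query, k=20)          # separately timed retrieval
        elapsed = time.perf_counter() - start

        # The timed top 20 must match the head of the submitted ranking.
        assert [d for d, _ in top20] == [d for d, _ in full[:20]]
        return full, elapsed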

Efficiency Task

The efficiency task extends the adhoc task. It is intended to provide a vehicle for discussing and comparing efficiency and scalability issues in IR systems, using a more controlled methodology for measuring query processing times. Since the hardware used by each group may vary from desktop PCs to supercomputers, invalidating any direct comparison between groups, participants are encouraged to use their runs to compare techniques within their own systems or to compare the performance of their systems to that of public domain systems.

Ten days before the new topics are released for the adhoc task, we will release a large set of efficiency test topics (10,000 to 50,000), which have been mined from query logs of an operational search engine. The title fields from the new adhoc topics will be seeded into this topic set, but will not be distinguished in any way. Queries must be created automatically from these topics; manual runs are not permitted for this task.

Participants will execute the entire topic set, reporting the top-20 results for each query and the total query processing time for the full set. Query processing time includes reading the topics and writing the final submission file. Topics should be processed sequentially, in the order they appear in the topic file. To measure effectiveness, we will extract the results corresponding to the new adhoc topics and add these into the evaluation pool for the adhoc task.
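
The loop below is a minimal sketch of such a run, again assuming a hypothetical search(query, k) function and a made-up topic file format with one "id:query" pair per line; the timed region covers reading the topics, processing them in file order, and writing the submission file.

    import time

    # Sketch of an efficiency run; search(query, k) and the "id:query" topic
    # format are assumptions, not official interfaces.
    def efficiency_run(search, topic_path, out_path, tag="example1"):
        start = time.perf_counter()                  # timing includes all I/O
        with open(topic_path) as topics, open(out_path, "w") as out:
            for line in topics:                      # sequential, in file order
                topic_id, query = line.rstrip("\n").split(":", 1)
                for rank, (docno, score) in enumerate(search(query, k=20), 1):
                    out.write(f"{topic_id} Q0 {docno} {rank} {score} {tag}\n")
        return time.perf_counter() - start           # total query processing time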

Groups may submit up to four runs. At least one run will be judged by NIST assessors. Each experimental run consists of the top 20 documents for each topic, along with associated performance and system information.

Named Page Finding Task

Users sometimes search for a page by name. In such cases, an effective search system will return that page at or near rank one. In many cases there is only one correct answer. In other cases, any document from a small set of "near duplicates" is correct.

Systems will be compared on the basis of the rank of the first correct answer. Reported measures will include mean reciprocal rank of first correct answer and success rate at N for N = 1, 5 and 10. Success rate is defined as the percentage of cases in which the correct answer or equivalent URL occurred in the first N documents.
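
The following sketch shows how these measures reduce to simple arithmetic over the rank of the first correct answer for each topic (None when no correct answer is returned); the function names are illustrative only.

    # Illustrative computation of the named page finding measures from the
    # rank of the first correct answer per topic (None if none was returned).
    def mean_reciprocal_rank(first_ranks):
        return sum(1.0 / r for r in first_ranks if r) / len(first_ranks)

    def success_at(first_ranks, n):
        hits = sum(1 for r in first_ranks if r and r <= n)
        return 100.0 * hits / len(first_ranks)

    # Example: first correct answers at ranks 1, 3, 12, and one not found.
    ranks = [1, 3, 12, None]
    print(mean_reciprocal_rank(ranks))   # (1 + 1/3 + 1/12) / 4 = 0.354...
    print(success_at(ranks, 10))         # 50.0: two of four topics in the top 10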

Roughly 150 new topics will be created for this task. Groups may submit up to four runs. A run consists of the top 1000 documents for each topic. For each run, groups should record and report the system characteristics described below. As with the other tasks, query processing time should be reported for the top 20 documents. No manual or interactive query modification is permitted in this task.

Submissions

For all tasks, the submission form requires each group to report the following details about their hardware configuration and system performance: 1) percentage of document collection indexed, 2) indexing time in minutes, 3) total query processing time (top 20 documents), 4) number of processors, 5) total RAM, 6) size of on-disk file structures, 7) hardware cost, and 8) year of purchase.

For the number of processors, report the total number of CPUs in the system. For example, if your system is a cluster of eight dual-processor machines, you would report 16. For the hardware cost, provide an estimate in US dollars of the cost at the time of purchase.

Some groups may subset the collection before indexing, removing selected pages to reduce its size. The submission form asks for the fraction of pages indexed. If you did not subset the collection before indexing, report 100%.

The submission form will also collect basic query processing information for each run, including the topic fields from which the query was derived, as well as the use of link information and document structure.

Submission Formats

All runs must be compressed (gzip or bzip2).

For all tasks, a submission consists of a single ASCII text file in the format used for most TREC submissions, which we repeat here for convenience. White space is used to separate columns. The width of the columns is not important, but it is important to have exactly six columns per line with at least one space between them. A minimal format check is sketched after the field descriptions below.

       630 Q0 ZF08-175-870  1 4238 prise1
       630 Q0 ZF08-306-044  2 4223 prise1
       630 Q0 ZF09-477-757  3 4207 prise1
       630 Q0 ZF08-312-422  4 4194 prise1
       630 Q0 ZF08-013-262  5 4189 prise1
          etc.

where:

  • the first column is the topic number.
  • the second column is the query number within that topic. This is currently unused and should always be Q0.
  • the third column is the official document number of the retrieved document and is the number found in the "docno" field of the document.
  • the fourth column is the rank at which the document is retrieved, and the fifth column shows the score (integer or floating point) that generated the ranking. This score MUST be in descending (non-increasing) order and is important to include so that we can handle tied scores (for a given run) in a uniform fashion; the evaluation routines rank documents from these scores, not from your ranks. If you want the precise ranking you submit to be evaluated, the SCORES must reflect that ranking.
  • the sixth column is called the "run tag" and should be a unique identifier for your group AND for the method used. That is, each run should have a different tag that identifies the group and the method that produced the run. Please change the tag from year to year, since we often compare across years (for graphs and such) and having the same name show up for both years is confusing. Also, run tags must contain 12 or fewer letters and numbers, with *NO* punctuation, to facilitate labeling graphs with the tags.
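
The sketch below is a rough sanity check of a run file against this format; it is not an official validation tool, and the function name is arbitrary.

    import re

    # Rough sanity check of a run file against the six-column format above.
    def check_run(path):
        tag_re = re.compile(r"^[A-Za-z0-9]{1,12}$")  # 12 or fewer letters/digits
        last_score = {}                              # last score seen per topic
        with open(path) as run:
            for lineno, line in enumerate(run, 1):
                # Unpacking fails if the line does not have exactly six columns.
                topic, q0, docno, rank, score, tag = line.split()
                assert q0 == "Q0", f"line {lineno}: second column must be Q0"
                assert tag_re.match(tag), f"line {lineno}: bad run tag {tag!r}"
                score = float(score)
                if topic in last_score:              # non-increasing within a topic
                    assert score <= last_score[topic], f"line {lineno}: score increases"
                last_score[topic] = score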


Last updated: Tuesday, 07-June-05
Date created: Wednesday, 01-June-05
claclarke@plg.uwaterloo.ca