Baseline Model Implementation for Automatic Participation in the TREC 2015 Total Recall Track
The Baseline Model Implementation ("BMI") is free software (licence: GPL v3) that participants
may use to automate the TREC 2015 Total Recall Track. BMI uses the AutoTAR
Continuous Active Learning (CAL) method to fully automate
the task required of Total Recall Track participants. The goal for
participants is to achieve better effectiveness than BMI, either by modifying
it, or by implementing their own solutions from scratch.
Overview of BMI
To use BMI, participants must:
- Download and install the free VirtualBox software from Oracle.
- Download and install the free BMI virtual machine and run scripts.
- [optionally] configure the BMI run scripts.
- [optionally] modify the BMI virtual machine.
- Run the BMI virtual machine using VirtualBox.
- Track progress using logs and/or the Web.
- Repeat until desired results are achieved.
Installing VirtualBox
Installers for Linux, Mac OS X, and Windows are available from Oracle.
Debian and Debian-based Linux distributions provide a "virtualbox" package that may be installed
using a package manager or the command line "sudo apt-get install virtualbox".
Be aware that VirtualBox will run the 64-bit BMI virtual machine only if the BIOS
"Virtualization Technology" and "VT-d" features are enabled. It appears that many
desktop-class machines disable these features by default. If you don't enable them,
BMI will install but fail to boot.
BMI requires 2GB of RAM to run and may consume up to 100GB of disk space for full runs.
The host computer on which you install VirtualBox should have substantially more than this.
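A quick pre-flight check of the host can catch a shortfall before a long run. The following is a minimal sketch, not part of BMI: the function names are hypothetical, and the thresholds are simply the 2GB and 100GB figures quoted above. The memory check reads /proc/meminfo and is therefore Linux-only; the disk check uses the POSIX "df -Pk" output format.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; not part of the BMI distribution.

# Succeeds if $1 (available amount) is at least $2 (required amount).
# Both arguments are whole numbers in the same unit.
has_at_least() {
    [ "$1" -ge "$2" ]
}

# Prints free space, in whole GB, on the filesystem holding directory $1.
free_disk_gb() {
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Prints total RAM in whole GB (Linux only: reads /proc/meminfo).
total_ram_gb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
}

# Prints a warning for each minimum that the host fails to meet.
# $1 is the directory in which VirtualBox keeps its disk images.
check_host() {
    has_at_least "$(free_disk_gb "$1")" 100 \
        || echo "warning: less than 100GB free under $1"
    has_at_least "$(total_ram_gb)" 2 \
        || echo "warning: less than 2GB of RAM"
}
```

For example, check_host "$HOME" prints nothing when both minimums are met. Passing this check only shows that the minimums are satisfied, not that the host has the recommended headroom.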
Installing BMI
On Linux, ensure that either wget or curl is installed, open a terminal, and type
one of these commands:
- curl https://plg.uwaterloo.ca/~gvcormac/trecvm/vminstall1.sh | bash
- wget -O - https://plg.uwaterloo.ca/~gvcormac/trecvm/vminstall1.sh | bash
On Mac OS X, open a terminal and type the following command:
- curl https://plg.uwaterloo.ca/~gvcormac/trecvm/vminstall.sh | bash
On Windows download and run the installer at
On all systems, the installer will create a virtual machine named TRECTR,
and will install a folder "vmscripts" in the VirtualBox home folder.
The name of the VirtualBox home folder is reported by the installer,
and can also be determined using VirtualBox.
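Once the installer has created the TRECTR machine, it can presumably also be started and stopped from a terminal with VirtualBox's VBoxManage tool instead of the graphical interface. The sketch below assumes that VBoxManage is on the PATH and that a headless start suits your setup; the wrapper function names and the DRY_RUN switch are hypothetical conveniences, not part of BMI.

```shell
#!/usr/bin/env bash
# Hypothetical wrappers around VBoxManage; not part of the BMI distribution.

VM_NAME="TRECTR"   # name given to the virtual machine by the BMI installer

# Runs VBoxManage with the given arguments, or only prints the command
# when DRY_RUN=1 is set in the environment.
vbox() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "VBoxManage $*"
    else
        VBoxManage "$@"
    fi
}

# Boots the VM without opening a console window.
start_vm() {
    vbox startvm "$VM_NAME" --type headless
}

# Powers the VM off immediately, without a guest shutdown.
stop_vm() {
    vbox controlvm "$VM_NAME" poweroff
}
```

Running "DRY_RUN=1 start_vm" shows the command that would be issued without touching VirtualBox.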
Configuring BMI
Configuring BMI Parameters
In the folder "vmscripts" you will find the file "TREC_Config.txt", which you
may edit to supply your group name, the name of the test you wish to
run, and the name of the TREC Total Recall server. The default contents
of "TREC_Config.txt" are:
TRUSER=test
TRSERVER=quaid.uwaterloo.ca:33333
TRTEST=trivial
TRRUN=testrun
To perform simple testing, you do not need to change any of these settings.
When you receive a participant ID from TREC, you should replace
the value of TRUSER with that ID. (The "test" ID is restricted to
smaller tests.)
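The file can be edited by hand, but when switching settings often it may be convenient to script the change. The following sketch assumes only the KEY=value layout shown above; the helper names and the example ID "mygroup" are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for editing TREC_Config.txt; not part of BMI.

# Prints the value of key $2 in config file $1 (nothing if the key is absent).
get_config() {
    sed -n "s/^$2=//p" "$1"
}

# Sets key $2 to value $3 in config file $1, replacing the existing line
# or appending a new one. Assumes the value contains no '|', '&' or
# backslash. Writes through a temporary file so that it behaves the same
# with GNU and BSD sed.
set_config() {
    local file="$1" key="$2" value="$3" tmp
    tmp=$(mktemp) || return 1
    if grep -q "^$key=" "$file"; then
        sed "s|^$key=.*|$key=$value|" "$file" > "$tmp"
    else
        cat "$file" > "$tmp"
        echo "$key=$value" >> "$tmp"
    fi
    cat "$tmp" > "$file" && rm -f "$tmp"
}
```

For example, set_config TREC_Config.txt TRUSER mygroup replaces the TRUSER line and leaves the other three settings untouched.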
Barring unforeseen circumstances, it should never be necessary to
change TRSERVER. TRSERVER is the name of the TREC server from which
BMI fetches the datasets, topics, and relevance assessments. The interface
to the TREC server is documented here.
TRTEST may (at the time of writing) be one of: "trivial", "test",
and "bigtest".
- "trivial" uses 7 topics with 2 datasets containing
30 documents each, and is useful to see that things are working.
On typical hardware, "trivial" takes several minutes to run.
- "test" uses 7 topics with 2 datasets containing about 20,000
documents each, and is therefore more useful for evaluating retrieval
effectiveness. On typical hardware, "test" takes about 30 minutes.
- "bigtest" uses 2 topics with one dataset containing about 750,000 documents.
On typical hardware, "bigtest" takes about 5 hours.
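A mistyped TRTEST value is cheap to catch before booting the VM. The sketch below checks the setting against the three names listed above; the function names are hypothetical, and the list would need updating if the server offers further tests.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check for the TRTEST setting; not part of BMI.

# Succeeds if $1 is one of the test names documented at the time of writing.
is_known_test() {
    case "$1" in
        trivial|test|bigtest) return 0 ;;
        *) return 1 ;;
    esac
}

# Reads TRTEST from config file $1 and reports whether it is recognised.
check_trtest() {
    local name
    name=$(sed -n 's/^TRTEST=//p' "$1")
    if is_known_test "$name"; then
        echo "TRTEST=$name is a known test"
    else
        echo "TRTEST=$name is not a known test" >&2
        return 1
    fi
}
```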
Configuring BMI Scripts
The folder "vmscripts/cmd.dir" is shared with the BMI virtual
machine. Immediately after the BMI VM boots, it executes
"vmscripts/cmd.dir/start" which is a bash script. This script
can execute any command installed on the BMI VM (which is
Debian Linux with developer tools installed). The script can
also execute other scripts or (Debian) executable files
contained in vmscripts.
The BMI VM is configured with a 2GB root drive and a 1TB
scratch drive mounted as /tmp. All drives are reinitialized
every time the BMI VM is rebooted.
Log files and other persistent information may be written
to files in the vmscripts folder.
The scripts run as the user named "user".
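Because everything outside the shared folder is lost at reboot, a script in cmd.dir will usually want to keep its log there. The fragment below is a hypothetical illustration and does not reproduce the shipped start script: it assumes that the directory containing the script is the shared folder as seen from inside the VM, and the log file name "run.log" is made up.

```shell
#!/usr/bin/env bash
# Hypothetical logging fragment for a script kept in vmscripts/cmd.dir.

# Directory holding this script; assumed to be the shared folder as
# mounted inside the VM, so that files written here survive a reboot.
SHARED_DIR=${SHARED_DIR:-$(cd "$(dirname -- "$0")" && pwd)}
LOG_FILE="$SHARED_DIR/run.log"   # hypothetical log file name

# Appends a timestamped message to the persistent log.
log() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$LOG_FILE"
}

# Runs a command, logging its start and its exit status, and returns
# that same exit status to the caller.
run_logged() {
    local status
    log "start: $*"
    "$@"
    status=$?
    log "exit $status: $*"
    return "$status"
}
```

A script could then wrap each long step, for example run_logged make, and the host can follow progress by watching run.log in the vmscripts folder. Anything large or temporary still belongs on the scratch drive under /tmp.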
Configuring the BMI VM
The password for "user" is "user" and the password for "root"
is "root". You can sign on to the VM while it is running. However,
since the disks are reinitialized at every boot, you will not
be able to permanently install software. You can compile and/or
install software into the shared folder.
Participants who need to modify the operating system (or to use an
entirely different operating system, such as Windows) will
need to create a new root disk (possibly by using
VirtualBox's "clonehd" to copy the BMI root disk). However,
the shared folder and scratch configurations should be preserved,
and the VM must read and honor the contents of the
TREC_Config.txt file in the shared folder.
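For a replacement system that runs bash, honoring TREC_Config.txt could be as simple as the sketch below. It assumes only the four KEY=value settings shown earlier; the function name is hypothetical, and how the stock BMI scripts read the file is not documented here. Carriage returns are stripped in case the file was edited on a Windows host.

```shell
#!/usr/bin/env bash
# Hypothetical loader for TREC_Config.txt; not taken from the BMI scripts.

# Reads config file $1 and sets the shell variables TRUSER, TRSERVER,
# TRTEST and TRRUN. Lines with any other key are ignored, and the file
# is never executed as shell code.
load_trec_config() {
    local key value
    while IFS='=' read -r key value || [ -n "$key" ]; do
        # Drop a trailing carriage return left by Windows editors.
        value=$(printf '%s' "$value" | tr -d '\r')
        case "$key" in
            TRUSER)   TRUSER=$value ;;
            TRSERVER) TRSERVER=$value ;;
            TRTEST)   TRTEST=$value ;;
            TRRUN)    TRRUN=$value ;;
        esac
    done < "$1"
}
```

Parsing the file line by line, rather than sourcing it, means a stray character in the configuration cannot run arbitrary commands inside the VM.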