Context grep

Charles L. A. Clarke and Gordon V. Cormack
Department of Computer Science
University of Waterloo, Waterloo, Canada

Technical report CS-96-41
November 29, 1996

ABSTRACT

The Unix grep(1) utility searches a text file for a pattern defined by a regular expression and prints the lines containing the pattern. Unfortunately, a line of text is not always the appropriate unit for search and retrieval. Our solution is to treat a newline as an ordinary character and to allow pattern matching across the entire file. Several issues must be addressed to make this free text searching useful. In particular, the standard approach of taking the "leftmost longest match" tends to select inappropriately large fragments of the text. Instead we follow the principle of always taking the shortest match and allow these matches to overlap but not nest.

The resulting tool is novel, expressive and simple. Matches can be reported across lines and multiple matches can be reported within a line. Appropriate structure may be imposed by using a regular expression to define a search universe. Elements may then be selected from this universe by matching with a second regular expression.

The current release of the search tool is available at

    ftp://plg.uwaterloo.ca/pub/mt/cgrep

A detailed man page for the program is included as part of the distribution.
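
The shortest-match principle and the two-expression search described in the abstract can be illustrated with a small sketch. It is written in Python with the standard re module and is not taken from cgrep itself: the function names shortest_matches and select_from_universe are hypothetical, the brute-force scan is chosen for clarity rather than speed, and empty matches are ignored as a simplifying assumption. A match is kept only if no other match lies inside it, so reported matches may overlap but never nest.

```python
import re


def shortest_matches(pattern, text):
    """Return (start, end) spans of non-empty matches that contain no other match.

    Hypothetical helper illustrating the shortest-match principle; this is a
    brute-force sketch, not the algorithm used by cgrep.
    """
    # DOTALL lets "." match a newline, so a match may cross line boundaries.
    regex = re.compile(pattern, re.DOTALL)
    n = len(text)

    # For each start position, find the shortest non-empty match beginning there.
    candidates = []
    for start in range(n):
        for end in range(start + 1, n + 1):
            if regex.fullmatch(text, start, end):
                candidates.append((start, end))
                break

    # Scan right to left and drop any span that encloses a match starting later.
    kept = []
    min_end = n + 1
    for start, end in reversed(candidates):
        if end < min_end:
            kept.append((start, end))
            min_end = end
    kept.reverse()
    return kept


def select_from_universe(universe_pattern, query_pattern, text):
    """Return the universe elements that contain a match of the query pattern.

    Assumption: "selection" is modelled here as a plain search inside each
    element; the abstract does not spell out the exact selection rule.
    """
    query = re.compile(query_pattern, re.DOTALL)
    elements = [text[s:e] for s, e in shortest_matches(universe_pattern, text)]
    return [element for element in elements if query.search(element)]


if __name__ == "__main__":
    text = "begin one\nend begin two end"
    # A leftmost-longest match of begin.*end would span the whole text;
    # the shortest-match rule reports the two smaller fragments instead.
    for start, end in shortest_matches(r"begin.*end", text):
        print(repr(text[start:end]))

    # Universe: paragraph-like elements; query: elements mentioning "blue".
    print(select_from_universe(r"<p>.*</p>", r"blue", "<p>red</p><p>blue</p>"))
```

In this sketch the first pattern yields two fragments, one of which crosses a newline, where a leftmost-longest matcher would return the entire text as a single match. The second call shows one expression defining the universe and a second expression selecting elements from it.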