CS246 [Winter 2010] Assignment 6.
Version V0.1
5 Mar 2010
Version V0.2 23 Mar 2010
Clarified what .cpp files to turn in.
Now required to use provided .h files. Corrected spelling of tokenstream
to tokenlist. Clarified that
tokenlist.cpp is to be written.
Version V0.3 23 Mar 2010
Documented that token.cpp should be turned in.
Version V0.4 26 Mar 2010 The start-up files now contain a version of binrel.cpp which you can use (notably
if you do not have binrel.cpp from previous assignments.)
Version V0.5 27 Mar 2010
Flexibility
on binrel.h and binrel.cpp. You can use
your own version of binrel.h and binrel.cpp or the ones in the start-up files. If you use your own version, they must be
essentially the same, in terms of their operation, as the ones from any previous
assignments.
Overview. In this
assignment you are to write an interpreter that executes programs written in a
simple language called the BR (Binary Relational) language that supports
operations on binary relations. Each
binary relation is a set of pairs of strings.
Following is an example program written in BR. (This language does essentially the same
thing as the relational calculator in a previous assignment.)
As will be
explained below, you are also to write an “unscanner” (explained below).
Purposes. This
assignment gives practice in using STL maps and vectors; also re-use of a class (binrel); also inheritance (in the token class); also program composition (building “br” out
of “scan” and “go”, described below).
The BR language. Here is
an example program in the BR language.
See the comments for explanations.
(BR does not support comments.)
var love binrel; //
Declare the “love” binary relation variable; set it to empty
insert (love, “Bob”, “Jane”); //
Insert a pair into the “love” relation
insert (love, “Jane”, “Fred”); //
So now Bob loves Jane and Jane loves Fred
var like binrel; //
Declare “like”
insert (like, “Bob”, “George”); //
Insert pair into “like”
insert (like, “Bob”, “Jane”); //
So now Bob likes George and Jane
var likeOrLove binrel; //
Declare “likeOrLove”
likeOrLove = like + love; //
“likeOrLove” becomes the union of “like”
and “love”
output likeOrLove; //
Print the “likeOrLove” relation
var attract binrel; //
Declare the “attract” relation
attract = like – like * love;
// Assignment to “attract”. Note
that “-” is done after “*“
output attract; //
X attracts Y == X likes Y unless X likes someone who loves Y
A variable
must be declared before being used (DBU).
A newly declared variable is implicitly set to the empty relation
(containing no pairs). A variable (with a
particular name) should not be redeclared.
Variables
are assigned values using the assignment operator “=”.
The insert
statement takes as arguments (in parentheses) a binary relation variable and two
quoted strings (surrounded by double quotes), containing no escaped characters,
and inserts the two as a pair into the variable.
The
right-hand side of an assignment is an expression that consists of variables
such as “like” and “love” combined using the operators +, - and * (union,
subtraction and composition i.e. join) and grouped using parenthesis, “(“ and
“)”. Like languages such as C++, the “*”
operator has higher precedence that “+” and ”-“. These two, “+” and “-“ are evaluated from
left to right.
The
”output” statement has one argument (not in parentheses) which is a binary
relation variable.
Scanner.
To make
things easier, you will be given a scanner named “scan” (which you are to use)
which reads BR programs and breaks them up into tokens. When the scanner reads the above program, it
outputs the following token stream (not including the comments).
x:var // Identifier token var
x:love // Identifier token love
x:binrel
; // Semicolon token
x:insert
( // Left parenthesis token
s:love
,
“:Bob // String token Bob
... ... .. //
etc, more tokens here ending with the following
x:output
x:attract
;
E // End of file token, represented as capital E
As illustrated above, each line output by the scanner represents one token. Each token can be simple in which case it represents its own value, such as semicolor “;” or left parenthesis “(“. The token can also be compound, as in x:var (x means it is an identifier and “var” is its value) or “:Bob which represents a string whose value is “Bob”. In a compound token, a colon separates the token type (such as “x”) from its value (such as “var”). The final token E is not directly derived from the program, but rather is added to the end of token stream by the scanner for convenience. We assume that strings do not contain escapes, i.e., no use of backslashes. As output by the scanner, each token (include the end of file token E) is on a separate line, that is, each ends with a newline character. The language has no comments, so no // comments.
The scanner can be used for languages other than BR. As a result, some tokens, such as >>, may be recognized by the scanner, but not accepted by your interpreter. This does not cause problems. When >> is (erroneously) part of the BR program, it is scanned without error message, but it will cause a syntax error in the interpreter.
The “go” program. You are to write a C++ program called “go” which reads a token stream
(as illustrated above) for an BR program from the standard input and executes
it, producing output on the standard output stream for each BR “output”
statement.
BR grammar. The grammar for the BR language is as follows:
program ::=
statements
statements ::= { declaration | insert | assignment |
output }
declaration::= x:var x x:binrel ”;”
insert ::=
x:insert ”(” x ”,” stringconst ”,” stringconst ”)” ”;”
assignment ::= x ”=” expression ”;”
output ::=
x:output x “;”
expression ::= term { (”+” | ”-” ) term }
term ::=
factor { ”*” factor }
factor ::= x
| ”(” expression ”)”
In the
grammar, the notation “x:ccc” for
example, x:var means the identifier var.
Similarly for x:insert
etc. By itself x means any identifier.
In the
grammar, the notation “stringconst” means a (quoted) string constant, e.g.,
“Hello”.
The
notation in the grammar { … } means zero
or more replications while | means a choice.
Use recursive descent. Your interpreter should be structured as a recursive descent
parser. For discussion of this see:
http://en.wikipedia.org/wiki/Recursive_descent_parser
You should
use one function for each non-terminal: program,
statements, declaration, insert, assignment, output, expression, term and factor.
Note that expression, term and factor are recursive.
Besides parsing the program, you will also interpret it on the fly. This means using the “go” program to allocate
(during declarations), assign (during assignments) and retrieve (during expressions
and outputs) values of variables (always binary relations). The output goes to the terminal (to standard
output). The expression, term and factor
functions should return a binrel.
Unscanner. If your “go” program (or your “br” script) receives the flag –u then it
should forego interpreting and instead should “unscan” the token stream, as
will not be explained. For example,
suppose the token stream begins as:
x:var
x:love
x:binrel
;
x:insert
(
s:love
,
“:Bob
etc
When this is “unscanned”, the result is:
var love binrel ;
insert ( love , “Bob” , etc
In other words, each token from the token stream is turned back into its
original representation as written in a source BR program, although the white
space may vary. Each token, except
semicolon “;’ is to be followed by a blank, while semicolon is to be followed
instead by a newline character. When the
unscanner reads the eof token “E”, unscanning stops with no more output
(without outputting the “E”).
Your unscanner can be useful for checking to see if the scanner and the
reading parts of the interpreter are working correctly.
Your unscanner should be “idempotent” meaning that if it repeatedly scans and unscans a program,
each program as unscanned (except the original program) will be identical
character for character including white space.
Implementation.
Your “go” program should read the token stream into a vector of pointers
to token
objects. It should call the module “tokenlist”
to do this reading. After all the tokens
have been read in by “tokenlist”, “go” should either execute the program (by
calling a function called exec) or, if there is a “-u” flag, it should call a
program named “unscan” to unscan the program.
The execution or scanning should fetch
tokens, one at a time, by calling the appropriate function in “tokenlist”.
You are to write go.cpp, exec.cpp, tokenlist.cpp and
unscan.cpp.
You are to use your binrel class from
a previous assignment. (UPDATE: You can instead use the version of
binrel.cpp in the start-up files).
Use a C++ STL map to represent the variables in the BR program as a
pair, where the name of the variable is the pair’s key and pair’s target is a
binrel object.
You will be provided with a set of files; see http://plg.uwaterloo.ca/~holt/cs/246/2010/asgn06/br
You can copy a file such as “xxx” to your directory as follows
curl
http://plg.uwaterloo.ca/~holt/cs/246/2010/asgn06/br/xxx > xxx
See additional instructions in the file README in that directory.
The provided files will include “tokennames.h”. Use token names such as plusToken defined in
that file rather than literal constants such as ‘+’.
The “br” shell script. You are to create a shell script named br (make it executable with chmod +x br) which pipes the output of scan (provided to you) into go (which you are to write). This command should optionally accept the flag –u (for “unscan”). If this flag is present, your program should “unscan” the token stream (explained below), instead of interpreting the stream. The “br” script can be done this way.
./scan | ./go
$1 # With optional –u flag
Dot h files.
That directory will contain a number of “.h” files, which you should
use. (But
you can use your own binrel.cpp file).
IO Files. Your program should use only
standard input and output (and error output); you should not use named files.
Timing. Test the speed of your
implementation of BR (time to scan and execute), using the “time” shell
script. That is, you are to determine
the time to execute your “br” script on the files rand*.br which are provided to you.
Produce a table which gives for each rand*.br the “user” time for its execution.
Hints.
See Buhr slides page 121 (S2.8.1) (and following pages) for description
of IO.
You may want to use “noskipws” as a basis for reading the input
character by character. See Buhr slide
124. See example: http://www.student.cs.uwaterloo.ca/~cs246/current/code_examples/commandline_arguments.cc Note that ctl-D typed at the terminal
produces EOF. Consider using the “fail”
function to detect EOF.
See Buhr slides page 142 (S2.10) for description of reading command line
arguments. You are to test for the “–u”
argument, to see if you should do “unscanning” instead of interpretation.
You may need to use a dynamic_cast to convert a pointer to a token to
a pointer to a complex token; see Buhr slide 227 (S2.17.8).
You are to use an STL “map” to
store binary relations and to look them up based on their declared names. See Buhr notes page 238 (S2.18.1.2). You can use the “count” member of a map to
determine if a particular variable name exists in the map.
To produce output from the BR “output” statement, call the output
function of the token class. The
function should output the “source” characters for the token (but without any
white space around the source token).
Any error messages you produce should start with the word ERROR, should
fit on a single line, and should be output to the error stream.
Generally, you do not need to try to recover from an error; you can just
abort.
You should turn in: