CSC246 [Winter 2009] Asgn01

CS246 [Winter 2010] Assignment 6.

Version V0.1 5 Mar 2010

Version V0.2 23 Mar 2010 Clarified what .cpp files to turn in. Now required to use provided .h files. Corrected spelling of tokenstream to tokenlist. Clarified that tokenlist.cpp is to be written.

Version V0.3 23 Mar 2010 Documented that token.cpp should be turned in.

Version V0.4 26 Mar 2010 The start-up files now contain a version of binrel.cpp which you can use (notably if you do not have binrel.cpp from previous assignments.)

Version V0.5 27 Mar 2010 Flexibility on binrel.h and binrel.cpp. You can use your own version of binrel.h and binrel.cpp or the ones in the start-up files. If you use your own version, they must be essentially the same, in terms of their operation, as the ones from any previous assignments.

Overview. In this assignment you are to write an interpreter that executes programs written in a simple language called the BR (Binary Relational) language that supports operations on binary relations. Each binary relation is a set of pairs of strings. Following is an example program written in BR. (This language does essentially the same thing as the relational calculator in a previous assignment.)

As will be explained below, you are also to write an “unscanner” (explained below).

Purposes. This assignment gives practice in using STL maps and vectors; also re-use of a class (binrel); also inheritance (in the token class); also program composition (building “br” out of “scan” and “go”, described below).

The BR language. Here is an example program in the BR language. See the comments for explanations. (BR does not support comments.)

var love binrel; // Declare the “love” binary relation variable; set it to empty

insert (love, “Bob”, “Jane”); // Insert a pair into the “love” relation

insert (love, “Jane”, “Fred”); // So now Bob loves Jane and Jane loves Fred

var like binrel; // Declare “like”

insert (like, “Bob”, “George”); // Insert pair into “like”

insert (like, “Bob”, “Jane”); // So now Bob likes George and Jane

var likeOrLove binrel; // Declare “likeOrLove”

likeOrLove = like + love; // “likeOrLove” becomes the union of “like” and “love”

output likeOrLove; // Print the “likeOrLove” relation

var attract binrel; // Declare the “attract” relation

attract = like – like * love; // Assignment to “attract”. Note that “-” is done after “*“

output attract; // X attracts Y == X likes Y unless X likes someone who loves Y

A variable must be declared before being used (DBU). A newly declared variable is implicitly set to the empty relation (containing no pairs). A variable (with a particular name) should not be redeclared.

Variables are assigned values using the assignment operator “=”.

The insert statement takes as arguments (in parentheses) a binary relation variable and two quoted strings (surrounded by double quotes), containing no escaped characters, and inserts the two as a pair into the variable.

The right-hand side of an assignment is an expression that consists of variables such as “like” and “love” combined using the operators +, - and * (union, subtraction and composition i.e. join) and grouped using parenthesis, “(“ and “)”. Like languages such as C++, the “*” operator has higher precedence that “+” and ”-“. These two, “+” and “-“ are evaluated from left to right.

The ”output” statement has one argument (not in parentheses) which is a binary relation variable.

Scanner. To make things easier, you will be given a scanner named “scan” (which you are to use) which reads BR programs and breaks them up into tokens. When the scanner reads the above program, it outputs the following token stream (not including the comments).

x:var // Identifier token var

x:love // Identifier token love

x:binrel

; // Semicolon token

x:insert

( // Left parenthesis token

s:love

“:Bob // String token Bob

... ... .. // etc, more tokens here ending with the following

x:output

x:attract

;

E // End of file token, represented as capital E

As illustrated above, each line output by the scanner represents one token. Each token can be simple in which case it represents its own value, such as semicolor “;” or left parenthesis “(“. The token can also be compound, as in x:var (x means it is an identifier and “var” is its value) or “:Bob which represents a string whose value is “Bob”. In a compound token, a colon separates the token type (such as “x”) from its value (such as “var”). The final token E is not directly derived from the program, but rather is added to the end of token stream by the scanner for convenience. We assume that strings do not contain escapes, i.e., no use of backslashes. As output by the scanner, each token (include the end of file token E) is on a separate line, that is, each ends with a newline character. The language has no comments, so no // comments.

The scanner can be used for languages other than BR. As a result, some tokens, such as >>, may be recognized by the scanner, but not accepted by your interpreter. This does not cause problems. When >> is (erroneously) part of the BR program, it is scanned without error message, but it will cause a syntax error in the interpreter.

The “go” program. You are to write a C++ program called “go” which reads a token stream (as illustrated above) for an BR program from the standard input and executes it, producing output on the standard output stream for each BR “output” statement.

BR grammar. The grammar for the BR language is as follows:

program ::= statements

statements ::= { declaration | insert | assignment | output }

declaration::= x:var x x:binrel ”;”

insert ::= x:insert ”(” x ”,” stringconst ”,” stringconst ”)” ”;”

assignment ::= x ”=” expression ”;”

output ::= x:output x “;”

expression ::= term { (”+” | ”-” ) term }

term ::= factor { ”*” factor }

factor ::= x | ”(” expression ”)”

In the grammar, the notation “x:ccc” for example, x:var means the identifier var. Similarly for x:insert etc. By itself x means any identifier.

In the grammar, the notation “stringconst” means a (quoted) string constant, e.g., “Hello”.

The notation in the grammar { … } means zero or more replications while | means a choice.

Use recursive descent. Your interpreter should be structured as a recursive descent parser. For discussion of this see:

http://en.wikipedia.org/wiki/Recursive_descent_parser

You should use one function for each non-terminal: program, statements, declaration, insert, assignment, output, expression, term and factor. Note that expression, term and factor are recursive. Besides parsing the program, you will also interpret it on the fly. This means using the “go” program to allocate (during declarations), assign (during assignments) and retrieve (during expressions and outputs) values of variables (always binary relations). The output goes to the terminal (to standard output). The expression, term and factor functions should return a binrel.

Unscanner. If your “go” program (or your “br” script) receives the flag –u then it should forego interpreting and instead should “unscan” the token stream, as will not be explained. For example, suppose the token stream begins as:

x:var

x:love

x:binrel

;

x:insert

(

s:love

“:Bob

etc

When this is “unscanned”, the result is:

var love binrel ;

insert ( love , “Bob” , etc

In other words, each token from the token stream is turned back into its original representation as written in a source BR program, although the white space may vary. Each token, except semicolon “;’ is to be followed by a blank, while semicolon is to be followed instead by a newline character. When the unscanner reads the eof token “E”, unscanning stops with no more output (without outputting the “E”).

Your unscanner can be useful for checking to see if the scanner and the reading parts of the interpreter are working correctly.

Your unscanner should be “idempotent” meaning that if it repeatedly scans and unscans a program, each program as unscanned (except the original program) will be identical character for character including white space.

Implementation. Your “go” program should read the token stream into a vector of pointers to token objects. It should call the module “tokenlist” to do this reading. After all the tokens have been read in by “tokenlist”, “go” should either execute the program (by calling a function called exec) or, if there is a “-u” flag, it should call a program named “unscan” to unscan the program. The execution or scanning should fetch tokens, one at a time, by calling the appropriate function in “tokenlist”.

You are to write go.cpp, exec.cpp, tokenlist.cpp and unscan.cpp.

You are to use your binrel class from a previous assignment. (UPDATE: You can instead use the version of binrel.cpp in the start-up files).

Use a C++ STL map to represent the variables in the BR program as a pair, where the name of the variable is the pair’s key and pair’s target is a binrel object.

You will be provided with a set of files; see http://plg.uwaterloo.ca/~holt/cs/246/2010/asgn06/br

You can copy a file such as “xxx” to your directory as follows

curl http://plg.uwaterloo.ca/~holt/cs/246/2010/asgn06/br/xxx > xxx

See additional instructions in the file README in that directory.

The provided files will include “tokennames.h”. Use token names such as plusToken defined in that file rather than literal constants such as ‘+’.

The “br” shell script. You are to create a shell script named br (make it executable with chmod +x br) which pipes the output of scan (provided to you) into go (which you are to write). This command should optionally accept the flag –u (for “unscan”). If this flag is present, your program should “unscan” the token stream (explained below), instead of interpreting the stream. The “br” script can be done this way.

./scan | ./go $1 # With optional –u flag

Dot h files. That directory will contain a number of “.h” files, which you should use. (But you can use your own binrel.cpp file).

IO Files. Your program should use only standard input and output (and error output); you should not use named files.

Timing. Test the speed of your implementation of BR (time to scan and execute), using the “time” shell script. That is, you are to determine the time to execute your “br” script on the files rand*.br which are provided to you. Produce a table which gives for each rand*.br the “user” time for its execution.

Hints.

See Buhr slides page 121 (S2.8.1) (and following pages) for description of IO.

You may want to use “noskipws” as a basis for reading the input character by character. See Buhr slide 124. See example: http://www.student.cs.uwaterloo.ca/~cs246/current/code_examples/commandline_arguments.cc Note that ctl-D typed at the terminal produces EOF. Consider using the “fail” function to detect EOF.

See Buhr slides page 142 (S2.10) for description of reading command line arguments. You are to test for the “–u” argument, to see if you should do “unscanning” instead of interpretation.

You may need to use a dynamic_cast to convert a pointer to a token to a pointer to a complex token; see Buhr slide 227 (S2.17.8).

You are to use an STL “map” to store binary relations and to look them up based on their declared names. See Buhr notes page 238 (S2.18.1.2). You can use the “count” member of a map to determine if a particular variable name exists in the map.

To produce output from the BR “output” statement, call the output function of the token class. The function should output the “source” characters for the token (but without any white space around the source token).

Any error messages you produce should start with the word ERROR, should fit on a single line, and should be output to the error stream.

Generally, you do not need to try to recover from an error; you can just abort.

You should turn in:

Turn in your .cpp files used to implement “go”. These are: go.cpp, exec.cpp, tokenlist.cpp and unscan.cpp. Also turn in binrel.cpp (this can be the version from the start-up files) and token.cpp from previous assignments, updated if needed. Also turn in binrel.h (this can be the version from the start-up files). Your program should match the provided makefile. Turn in a README file, described in the following.
Turn in a list in your README file of any features of BR that you did not correctly implement.
Turn in a list in your README file of all those provided testing files (test*.br, demo*.br, eg*.br) which your implementation did not correctly execute.
Turn in a table in your README file in which you give your times to execute “br” on rand*.br