Introduction to the Grok Language

Ric Holt,  5 May 2002

 

 

Abstract.  This note introduces Grok, a notation for manipulating binary relations.  Grok has an interpreter, which can be considered to be a relational calculator.  This interpreter has been used extensively for analyzing factbases produced by parsers that extract information from source programs. 

 

Background and Overview.  The initial version of Grok was created by the author in 1995.  It has evolved to become a language for manipulating factbases.  Grok operates at the level of a relational database, in that operators generally apply across entire relations, and not just to single entities.  The Grok interpreter has been optimized to handle large factbases (up to several hundred thousands of facts).  It keeps all of its data structures in memory.  It is written in the Turing language. 

The term grok means to understand, which is appropriate in that the Grok language is used to help understand large-scale software.  This term was coined in the book Stranger in a Strange Land, by Robert A. Heinlein.

This report consists of five parts.  Part 1 introduces Grok’s set and relation operators using an example involving Garfield the cat.  These operators are summarized in Part 2.

Part 3 shows how Grok can be used to interactively query factbases.  It uses an example to introduce schemas (also called schemes) that characterize the form of factbases.  The reader may want to consult the related report on TA (the Tuple Attribute languages) when reading this part.  Part 4 shows how scripts written in Grok can be used to query and transform factbases.  Finally, Part 5 summarizes the features of Grok.

 

 

 

Table of Contents

 

Part 1. Introduction to Grok’s Operators

Part 2. Summary of Grok’s Operators

Part 3. Grok Queries, Factbases and Schemas

Part 4. Writing Grok Scripts

Part 5. Summary of Grok Features

 


Part 1.

Introduction to Grok’s Operators

 

We will introduce Grok’s set and relation operators using an example in which there are two cats (Garfield and Fluffy), two mice (Mickey and Nancy) and two pieces of cheese (Roquefort and Swiss).  We will begin by defining sets for these cats, mice and cheese.

Set constructors.  In the Grok language we write

cat := {“Garfield”, “Fluffy”}

to construct the set variable named cat containing the strings Garfield and Fluffy.  Similarly, we can create set variables mouse and cheese:

mouse := {“Mickey”, “Nancy”}

cheese := {“Roquefort”, “Swiss”}

Set union, intersection and subtraction.  We can combine sets.  For example, the Grok statement

animal := cat + mouse

creates the set called animal which is the union of the cat and mouse sets.  As a result, animal contains the four strings Garfield, Fluffy, Mickey and Nancy.  Similarly, we can take the union of the mouse and cheese sets to create the set of items that are food:

food := mouse + cheese

As a result, the food set will contain Mickey, Nancy, Swiss and Roquefort.

Intersecting animals with food, we can find the set of animals that are food:

animalsWhichAreFood := animals ^ food

These animals are Mickey and Nancy.  We can use set subtraction to find those animals, which are not food:

            animalsWhichAreNotFood := animalsfood

These animals are Garfield and Fluffy.

Set cardinality.  We can ask how many items are food:

            howManyThingsAreFood :=  # food

This creates a new scalar (string) variable called howManyThingsAreFood, which is set to 4.  This statement uses the cardinality operator # to return the size of the set

            {“Mickey”, “Nancy”, “Swiss”, “Roquefort”}. 

Command line statements.  When Grok is used as a command line interpreter, you can directly enter statements, such as the example Grok statements we have given so far.  You can type an expression or value, and the Grok interpreter will evaluate it and print it.  For example, if you have already entered the above statements and you type

            mouse

then Grok will print back to you

            Mickey

            Nancy

Sets are not ordered.  Since ordering is immaterial in sets, we do not know if Grok will output Mickey before or after Nancy.

Script language.  Grok is also a script language, so you can write a Grok program, consisting of Grok statements, to be executed by the Grok interpreter.

Set comparisons.  We can compare sets, for example:

            mouse <= food

This determines whether mouse is a subset of food.   Both elements of mouse (Mickey and Nancy), are elements of food, so the Grok interpreter will output true.

The Grok set comparison operators are given in Figure 1.1.

 

==

equal, which can also be written as a single equal sign

~=

not equal

<=

subset

>=

superset

<

strict subset

>

strict superset

Figure 1.1.  Grok Comparison Operators

 

Summary of set operators.  This section has illustrated how sets are constructed using the notation {item1, item2, …}, how these sets can be combined using the union (+), intersection (^), and subtraction (-) operators, and how they can be compared, using <=, =, etc. As well, we introduced the cardinality operator (#). 

 

Relations in Grok

 

We will now introduce Grok’s relations and relational operators.  In Grok, a relation is a set of ordered pairs.  For example, a relation might consist of the two pairs (Garfield, Fluffy) and (Mickey, Nancy).  This relation might be the love relation, signifying that Garfield loves Fluffy and  Mickey loves Nancy. 

Cross product.  We can use the cross product operator X (capital letter X) to create relations, for example:

            chase := cat X mouse

This can be considered to mean that cats chase mice.  As defined by this statement, the chase relation consists of these four pairs

            Garfield          Mickey

            Garfield          Nancy

            Fluffy              Mickey

            Fluffy              Nancy

In other words, both cats chase both mice.  In general, the cross product of sets S and T produces every pair (x, y) such that x is in S and y is in T.

We can create the eat relation this way

            eat := chase  +  mouse X cheese

This creates the eat relation to be the same as  chase, but with the addition (union) of the cross product of mice and cheese.  As a result, the eat relation will contain these eight pairs:

            Garfield          Mickey

            Garfield          Nancy

            Fluffy              Mickey

            Fluffy              Nancy

            Mickey            Swiss

            Mickey            Roquefort

            Nancy              Swiss

            Nancy              Roquefort

We can interpret this to mean that cats eat mice and mice eat cheese.

Operator precedence.  Grok uses the traditional precedence of operators from mathematics, namely, additions (as well unions and subtractions) are done after multiplications (as well as intersections and cross products).  For example, in the statement eat := chase  +  mouse X cheese, the cross product X is done before union (+).

Relation cardinality.  We can use the cardinality operator # to ask how many pairs are in a relation:

            howManyPairs := # chase

This will create the string variable howManyPairs and will assign it the value 4. 

RSF.  We sometimes represent the facts of a collected set of relations as triples of the form (R, x, y), where relation R contains the pair (x, y).  For example, the  chase and eat relations, taken together, can be represented this way:

chase               Garfield                Mickey

chase               Garfield                Nancy

chase               Fluffy                    Mickey

chase               Fluffy                    Nancy

eat                   Garfield                Mickey

eat                   Garfield                Nancy

eat                   Fluffy                    Mickey

eat                   Fluffy                    Nancy

eat                   Mickey                  Swiss

eat                   Mickey                  Roquefort

eat                   Nancy                    Swiss

eat                   Nancy                    Roquefort

This list notation is called RSF (Rigi Standard Format), because it was developed for use in the Rigi reverse engineering tool.  We sometimes call a triple a fact, and we sometimes call these lists factbases.  Ordering is immaterial in relations, so we can change the ordering of the lines in this list without changing the meaning of the list. 

Set projectors.  We can use the set projector operator, written as a dot,  to determine how a set is related to another set.  For example, we can determine what Mickey eats, by asking the question: How is the set {“Mickey”} related, via eat,  to another set:

            mickeyEatsWhat :=  {“Mickey”} . eat

This statement assigns mickeyEatsWhat the set containing Swiss and Roquefort. Similarly, we can ask what eats Mickey:

            whatEatsMickey := eat . {“Mickey”}

This statement assigns whatEatsMickey the set containing Garfield and Fluffy.  In the general case, when s is a set and R is a relation, the form  s . R produces the set  t such that if s contains x and R contains (x, y) then t contains y.  The form R . s produces the set t such that if R contains (x, y) and s contains y then t contains x.

Domains and ranges.  The dom (domain) of a relation is the set of all beginnings (left ends) of pairs in a relation.  For example

            eater := dom eat

This statement assigns eater the set of things that eat.  This set consists of Garfield, Fluffy, Mickey and Nancy.  The statement

            food :=  rng eat

assigns food  the set of things that are eaten.  This set consists of Mickey, Nancy, Swiss and Roquefort.

The sink of a relation R is the set of items that are in R’s range but not in its domain.  We can compute the sink for the eat relation as follows:

            bottomOfFoodChain := rng eat – dom eat

This assigns  bottomOfFoodChain the set containing Swiss and Roquefort.  The source of a relation R is the set of items that are in the relation’s domain, but not in its range, as in:

            topOfFoodChain :=  dom eat – rng eat

This assigns topOfFoodChain the set containing Garfield and Fluffy.

Relation union, intersection and subtraction.  There are a number of Grok operators that combine relations.  For example, we can intersect the eat and chase relations to tell us about both eating and chasing:

            bothEatAndChase :=  eat ^ chase

This statement assigns bothEatAndChase  this set of pairs:

            Garfield          Mickey

            Garfield          Nancy

            Fluffy              Mickey

            Fluffy              Nancy

This means that Garfield both chases and eats Mickey, that Garfield both eats and chases Nancy, etc.  Relation intersection (^), returns the sets of pairs that are common to two relations.  Analogously, relation union (+), returns all pairs that exist in either of the two relations.

Subtraction (-) is another relation operator, for example,

            eatButNotChase := eatchase

This assigns eatButNotChase those pairs in eat that are not in chase.  As a result, eatButNotChase records that fact that mice eat but do not chase cheese.

Similarly, we can determine the relation for chasing but not eating:

            chaseButNotEat := chaseeat

It turns out that this is the empty relation, because in our example, all chased things (the mice) are eaten (by the cats). 

Empty relation.  We could as well have written

            chaseButNotEat  := EMPTYREL

EMPTYREL is Grok notation for the relation containing no pairs.  Another way to accomplish the same thing is to write the cross product of two empty sets:

            ChaseButNotEat  :=  { }  X  { }

Relation comparisons.  It turns out that the relation bothEatAndChase and chase are identical, which we could check using comparison of relations, by writing:

            bothEatAndChase == chase

This will return the value true.  We can compare relations using the comparison operators that apply to sets, namely, ==, ~=, >=, <=. > and <.  Each relation is actually a set of pairs, and these operators have the same meanings as they do for proper sets.

Relation composition and transitive closure.  We will now introduce the composition operator (o).  Relation R1 composed with relation R2 produces a new relation by a join in which each result pair is formed by attaching the right end of a pair in R1 to the left end of a pair in R2. Consider this example,

            secondOrderEat :=  eat  o  eat

As a result, secondOrderEat will contain pairs such as (Garfield, Swiss) because Garfield eats something which eats Swiss.  In general, if R1 contains (x, y) and R2 contains (y, z), then the result of R1 o R2 will contain (x, z)

The transitive closure of a relation R composes R with itself repeatedly, for example:

            anyOrderEat := eat +

We can think of the result as “eats at any level of indirection”.  The relation anyOrderEat will contain the pair that are in eat, such as (Garfield, Mickey) and (Mickey, Swiss), as well as second order edges such as (Garfield, Swiss).  In this example, there are no higher level indirections of eating, due to the simple nature of our eat relation.

We can also compute the reflexive transitive closure of a relation R, written as R*, which is the same as transitive closure, with the addition of self loops, (x, x), on all items in the factbase. 

Set id (identity) operator.  A set of self loops, on a particular set of nodes, can be computed this way:

            isMouse := id (mouse)

This assigns isMouse  this set of edges:

            Mickey            Mickey

            Nancy              Nancy

In general, if set s contains x, then id s contains (x, x).

The id  (identity) operator, and its self loops, are handy for restricting a relation to start or to end with a given set of items.  For example, we can select the pairs in the eat relation that are eaten by mice as follows:

            eatByMouse:= id (mouse)  o  eat

The resulting eatByMouse relation contains only the pairs corresponding to eating by mice.  This composition (o) is computed by starting with the self loops in id (mouse).  This guarantees that each result pair starts with a mouse.  These self loops are “joined” with pairs from eat.  The resulting pairs corresponding to eating by mice.

The identity of all nodes in the current Grok factbase is written as ID.

Relation inverse.  Another relation operation is the inverse (or reverse) operator, for example:

            chasedBy := inv chase

For each pair (x, y) in chase, this produces the pair (y, x) in chasedBy.  So if Garfield chases Mickey then Mickey is chasedBy Garfield.

 

 


Part 2.

Summary of Grok’s Operators

 

This section summarizes the Grok operators that were introduced using the example involving Garfield et al.

Union, intersection, subtraction, comparison and cardinality. Using the example of the cats, mice and cheese, we have introduced the basic set and relation operators of Grok.  These include the following five operations given in Figure 2.1.

 

Operator

Form

Union

x + y

Intersection

x ^ y

Subtraction

x - y

Comparison

x = y, x <= y, …

Cardinality

# x

Figure 2.1.  Operators used with sets and with relations

 

The two operands (x and y in above examples) of union, intersection, subtraction and comparison must both be sets or else both be relations.

Set and relation constructors.  Sets can be created using a set constructor, for example:

            { x, y, z }

The empty set can be written as

            { }

or as

            EMPTYSET

Relations can be created using cross products, for example:

            { a, b }  X  { x, y, z }

The result of the cross produce of sets s and t,  s X t, contains all pairs (x, y) such that x is in s and y is in t.

A collection of relations can be represented in RSF format, i.e., as a list of triples.  RSF format is used for files, but cannot be used in a Grok command.

The empty relation can be written as

            EMPTYREL

or as

                  {} X {}

Domain and range.  The domain (set of initial nodes) and range (set of final nodes) of relation R are written as:

            dom R

            rng R

Set projectors.  A set s can be combined with relation R in a set projector using the form

            s . R                

or

            R . s

The form  s . R produces the set  t such that if s contains x and R contains (x, y) then t contains y.  The form R . s produces the set t such that if R contains (x, y) and s contains y then t contains x.

Identity relations.  The identity relation on set s is written as:

            id s

If s contains node x then id s contains the pair (x, x).  The identity relation over all nodes in relations in Grok’s current factbase is written as:

            ID

Relation inverse.  The inverse (or reverse) of relation R is written as
                  inv R

If R contains pair (x, y) then its inverse contains (y, x).

Relation composition.  The composition of relations R1 and R2 is written as:

            R1 o R2

If R1 contains the pair (x, y) and R2 contains the pair (y, z), then their composition is a relation that contains the pair (x, z).  (Composition is much like a “join” in a conventional relational database.)

Transitive closure. The (non-reflexive) transitive closure of relation R is written as:

            R +

The transitive closure consists the union of R composed with itself  any number of times:

            R +   =   R   +   (R o R)  +  (R o R o R)  + 

The symmetric transitive closure is written as

            R *

The value of R* is the same as R+ with the addition of self loops (ID):

            R *   =   ID   +   R+

 


 Part 3.

Grok Queries, Factbases and Schemas

 

 

This section shows how the Grok interpreter can be used as an interactive query engine.  Before doing this, we will introduce an example C program and will show how facts about that program can be represented as a factbase.  We will then give example Grok queries that probe these facts.

 

 

The EG Example: A Very Simple C Program

 

We will use a very simple C program, called EG, as an example. Our C program consists of three files: main.c, putint.c and putint.h. The first, main.c, sets a global variable to 5 and calls the putint function with a parameter of 10. The putint.h file declares a global variable to be an integer (as external) and gives the header of the putint function (as external). The putint.c program gives the body of the putint function, which computes the product of its parameter and the global variable and prints the result. These files are listed in Figure 5.1.

 

File main.c:

      #include "putint.h"
      
      void main(void){
          global_var=5;
          putint(10);
      }

File putint.h:

      #ifndef _PUTINT_H_
      #define _PUTINT_H_
      
      extern int global_var;
      extern int putint(int param);
      
      #endif

File putint.c:

      #include "putint.h"
      
      int global_var;
      
      int putint(int param) {
          int local_var;
          local_var=global_var*param;
      
          printf("The number is:%d\n",local_var);
      }

 

Figure 5.1. EG: A Very Simple C Program

 

Facts Extracted from EG

 

The PBS project, at the Universities of Toronto and Waterloo, developed a special purpose version of the GCC compiler front end called CFX (C Fact Extractor), which translates C programs into graphs.  In the case of the EG program the translation produces the graph as the stream of triples shown in Figure 5.2. (To be technically correct, we should say that CFX produces tables, which are translated into the stream by a program called fbgen.  To keep things simple, we will consider that CFX directly produces the stream.) CFX stores this output stream as RSF triples in a file called factbase.rsf. 

 
funcdef main.c main
defloc main main.c:3
include main.c putint.h
linkcall main putint
linkref main global_var
macrodef *Initialization* __GCC_NEW_VARARGS__
macrodef *Initialization* __GNUC_MINOR__
macrodef *Initialization* __GNUC__
macrodef *Initialization* __sparc
macrodef *Initialization* __sparc__
macrodef *Initialization* __sun
macrodef *Initialization* __sun__
macrodef *Initialization* __unix
macrodef *Initialization* __unix__
macrodef *Initialization* sparc
macrodef *Initialization* sun
macrodef *Initialization* unix
funcdcl putint.h putint
dclloc putint putint.h:5
vardcl putint.h global_var
vardclloc global_var putint.h:4
macrodef putint.h PUTINT_H
funcdef putint.c putint
defloc putint putint.c:4
vardef putint.c global_var
vardefloc global_var putint.c:3
include putint.c putint.h
include putint.c putint.h

librarycall putint printf

 

Figure 5.2. The EG Program, as Translated by CFX to RSF Triples

 

As a reader, you do not need to concern yourself with the details of this stream.  Our purpose in giving this stream is to illustrate the kinds of streams that Grok deals with.  In this stream, the triple

   funcdcl F p

means file F contains a function declaration (function header) for p.  The triple

     funcdef F p

means file F contains a function definition (function body) for p.  
 

CFX Schema

 
Figure 5.3 gives the list of the relations produced by CFX, in the form of a TA schema, with brief explanations of each relation.  (Aside: in the last part of the schema, the string file:lineNum has been placed in quotes.  This is done because the Grok interpreter puts quotes around strings containing colons.)

 

 

SCHEME       TUPLE     :

 

funcdef      file      function        // function is defined in file

funcdcl      file      function        // function declared in file

funcdcl      file      libraryfunction // libraryfunction declared in file

vardef       file      variable        // var defined in file

vardcl       file      variable        // var declared in file

macrodef     file      macro           // macro defined in file

typedef      file      type            // type defined in file

structdef    file      struct          // struct defined in file

uniondef     file      union           // union defined in file

enumdef      file      enum            // enum defined in file

include      file      file            // file 1 includes file 2

sourcecall   function  function        // function 1 calls function 2

                                       // ( function 2 definition is

                                       //   in same compilation unit

                                       //           OR

                                       //   is included in same compilation

                                       //   unit as function 1 )

sourceref    function  variable        // function references variable

                                       //  ( variable definition is

                                       //    in same compilation unit

                                       //           OR

                                       //    is included in same compilation

                                       //    unit as variable )

linkcall     function  function        // function 1 calls function 2

                                       //  ( function 2's definition

                                       //    must be linked to )

linkref      function  variable        // function references variable

                                       //  ( variable definition must

                                       //    be linked to )

usemacro     file      macro           // file uses macro

usetype      file      type            // file uses type

usestruct    file      struct          // file uses struct

useunion     file      union           // file uses union

useenum      file      enum            // file uses enum

librarycall  function  libraryfunction // function calls libraryfunction

                                       //  ( a libraryfunction is a func

                                       //    whose definition does not

                                       //    exist in the analysed code )

 

// Attribute tuples extracted by CFX

 

defloc       function  “file:lineNum”        // func definition location

dclloc       function  “file:lineNum”        // func declaration location

deflocend    function  “file:lineNum”        // func definition end location

vardefloc    variable  “file:lineNum”        // var definition location

vardclloc    variable  “file:lineNum”        // var declaration location

 

macrodefloc  macro     “file:lineNum”        // macro definition location

typedefloc   type      “file:lineNum”        // type definition location

enumdefloc   enum      “file:lineNum”        // enum definition location

structdefloc struct    “file:lineNum”        // struct definition location

uniondefloc  union     “file:lineNum”        // union definition location

 

enumdeflocend   enum      “file:lineNum”     // enum definition end location

structdeflocend struct    “file:lineNum”     // struct definition end location

uniondeflocend  union     “file:lineNum”     // union definition end location

 

Figure 5.3. The CFX Schema (Also Called the PBS Low Level Schema)

 
The notation for the schema given in Figure 5.3 has, almost, the form of a TA scheme (schema).  However, the form of this schema is a bit dated (CFX and this schema were created before TA was defined), in that the attributes are written as relations.  For example, this line in the schema
    defloc function     “file:lineNum”
would be expected to be written in TA as
      function { defloc = “file:lineNum” }
However, we can treat attributes as relations with no harm done, so we will not be concerned with this difference.
 
CFX Schema as an Entity/Relation Diagram
 
We can draw a schema as a diagram.  For example, the CFX schema is shown as a diagram in Figure 5.4.  This diagram has the form (almost) of an Entity/Relation Diagram; E/R Diagrams are commonly used in the design of relational databases.
 

 

Figure 5.4. The CFX Schema as a Diagram

 

Querying EG’s CFX Factbase

 

We can use the stream shown in Figure 5.2 as the basis for answering questions about the EG program.  In particular, we can use Grok to ask queries about the EG program.  To do this, we can start up the Grok interpreter (the command for Grok is grok or grok.x).  Our first command to Grok is:

   getdb “factbase.rsf”

This reads the stream into Grok’s internal memory.  (Aside: In interactive mode, Grok is forgiving of syntactic details, so when we are typing this interactively, we can omit the quotation marks and type simply getdb factbase.rsf.)

We can ask Grok to list the entire database by typing

   listdb

When we do this, Grok produces a list of triples much like that in Figure 5.2.  However, its ordering of triples may be different, because the order of relations is immaterial to Grok. 

We can determine the number of triples by typing

     dbsize

Grok will output the number 29, which is the count of the triples in the list. 

For unknown reasons, the CFX extractor sometimes produces repeated copies of triples.  In Figure 5.2, the factbase.rsf stream contains two copies of the line

     include putint.c putint.h

which appear near the bottom of the list.   We can delete duplicate lines by typing

            deldups

After we have typed deldups, when we again type dbsize, Grok will tell us that there are only 28 triples, because the one of the duplicated copies was deleted.

 

Querying about File Inclusions 

 

Perhaps we want to check to see what includes were in the EG program.  To do this, we can type

   include

This asks Grok to list all the pairs in the include relation, so it outputs:

     main.c   putint.h

     putint.c putint.h

These two pairs tell us that both main.c and putint.c include putint.h. 

We can ask the question: What files does main.c include, by typing:

     main.c . include

This form is a forward set projector.  As a result, Grok will type

     putint.h

In this example, we have cut down on the amount of we needed to type by taking advantage of Grok’s forgiving nature.  Grok’s set projectors actually expect to receive a set value.  To be perfectly correct, and to avoid Grok error messages, we should really have written:

     { “main.c” } . include

But when we are working interactively, we often use these shortcuts. 

We can determine what files include putint.h by typing:

     include . putint.h

This form is a backward set projector, and again we are minimizing the amount we need to type.  As a result, Grok will type

     main.c

     putint.c

 

Querying about Function Calls

 

We might want to know more about calls in the EG program.  We could start out by asking the names of all relations, by typing

   relnames

As a result, Grok will output the list of names of relations:

       funcdef

       defloc

       include

       linkcall

       linkref

       macrodef

       funcdcl

       dclloc

       vardcl

       vardclloc

       vardef

       vardefloc

       librarycall

From this list, we can surmise that the fact base contains two kinds of calls: linkcall and librarycall.   From the schema (Figure 5.3) we know that a linkcall is a call that is linked from one user compilation to another, while a librarycall is a call to a library function.  To inspect all linkcalls, we type

     linkcall

As a result Grok outputs

     main     putint

To find out about library calls, we type

     librarycall

and Grok outputs

     putint   printf

We have determined that the only linked function call is from main to putint, and the only call to a library is from putint to printf.

We may want to know what things putint is related to.  To find out, we type

   putint :

It is helpful to think of the colon operator (:) as two eyeballs, looking forward from putint to see what it is related to.  As a result, Grok will output:

dclloc              putint.h:5

defloc              putint.c:4

librarycall         printf

From looking at the comments in the CFX schema, we can surmise from this that putint is declared on line 5 of putint.h and defined on line 4 of putint.c.  We can also re-affirm that putint calls the library function printf.

We might want to know what depends on the putint function.  To find out we type

   : putint

It is helpful to think of this as two eyeballs looking at putint.  As a result, Grok will output:

     main     linkcall

     putint.h funcdcl

     putint.c funcdef

This means that function main calls putint.  It also means that putint is declared in putint.h and is defined in putint.c.

 

The C488 Compiler Example

 

As our next example, we will consider a larger program.  This program is a compiler, written in C, for a Pascal-like language.  It has a traditional compiler structure, with a scanner, a parser, a semantics pass and a code generator.  We have run C488 through CFX and have obtained the factbase for it.  This factbase, factbase.rsf, for C488 is much larger than the factbase for EG.  Further, we have run this factbase though the PBS pipeline and as a result, we have created a landscape (structural diagram) for it.  You may want to view it on the web by this URL:

http://swag.uwaterloo.ca/pbs/examples/C488/V1/index.html

If you navigate around through this landscape, you will discover that the C488 compiler consists of a number of files, such as parser.c, parser.h, codgen.c, codegen.h, globalvars.h, tokendef.h, etc.  You can observe that the only relations shown in the diagram are call, reference, implementby and otherdep.  As you may recall, the CFX schema has many more relations than this.  Why does the diagram have fewer and different relations?  The answer is that a number of Grok scripts manipulated the factbase, combining and renaming relations, to make the diagrams easier to comprehend.  Another thing to notice is that the diagram contains higher level entities, such as the semantics subsystem, which group together a number of files.  In particular, the semantics subsystem groups these files:  semantics.h, semantics.c  and  semanticext.h. 

When landscape diagrams are inspected, it is common to observe surprising dependencies.  For example, in the C488 top view, there is a function call from the codegen subsystem to the semantics subsystem.    Why should the code generator call the semantic analysis part?  It seems that semantic analysis should be completed before code generation starts, so such a call would be wrong or maybe just peculiar.  We want to know what actual function call(s) are causing this high level interaction from the code generator to semantics.

Since this kind of question happens repeatedly, we will give it a name: the Where-Did-This-Arrow-Come-From problem.  Ideally, we would just press a button on the landscape diagram display, and we would get the answer.  The reason there is no button is that there has been considerable abstraction and simplification from the source program to the CFX factbase to the stream used as a basis for drawing the landscape diagrams.  However, by using the factbase and Grok, we can work our way backward to answer this question.

 

The Where-Did-This-Arrow-Come-From Problem

 

We will consider the following question.

Q1: Where did this arrow from subsystem S to subsystem T come from?

As an example, we will consider the call arrow from S, the code generator subsystem, to T, the semantics subsystem.  We will answer this question in two steps.

Step 1. Using the landscape diagram,  we will partly answer the question by finding out what files in system S call what files in subsystem T.  Given that we have discovered that file F in S calls file G in T, we are now ready for the easier question:

Q2: Where did the arrow from file F to file G come from? 

Step 2.  We will answer question Q2 by using Grok to query the program’s CFX factbase.

Aside: This example uses a factbase created by the CFX fact extractor. If we were using a different fact extractor, such as CPPX, we would need to modify our approach, but the same general method would apply.

In Step 1, we can use the diagrammatic querying facilities of the landscape drawing program to file out what file(s) in the code generator call what file(s) in semantic analysis.  By concentrating first on the codegen subsystem, we can look into its internal diagram to determine that the only file in it that makes calls is codegen.c.  We can conclude that F (the file in S) is codegen.c.  Then we can look inside the semantics subsystem, where we can immediately discover that the only file that is called is semantics.c.  So we can conclude that G (the file in T) is semantics.c.  In a more complex example, we would find more than one file F that calls more than one file G.  Also, in a more complex example, we may have to spend some time diving deeper into the landscape diagrams to locate the files that are causing the interaction.  At any rate, once files F and G are isolated, using the diagrams, we are ready to go to step 2.

To carry out Step 2, we use the program’s factbase.rsf file, which was created by CFX.  This factbase contains interactions at the level of functions and global variables.  After starting up Grok, we type this command to load the factbase:

   getdb factbase.rsf

Once this is loaded, it is a good idea to check its size (using dbsize) and to take a look at the names of its relations (using relnames),  just to gain confidence that what we are inspecting makes sense.  It is also important to understand the schema reasonably well; it is recommended that you refer to the CFX schema (Figures 5.3 and 5.4).  From the schema, we can tell that the funcdef relation records when file F defines function P.  We can find the names of all functions declared in file codegen.c by typing:

     P := codegen.c . funcdef

To check that F seems reasonable, we can find out how  many functions are in it, by typing:

     # P

When we do this, Grok tells us that P contains 21 items.  Since P is not large, it is a good idea to list these items, by typing:

     P

As a result, Grok outputs all the functions defined in codegen.c:

    BBS_Pop

    BBS_Push

    Code

    Code31

    Code32

    Code46

    Code47

    FBS_Pop

    FBS_Push

    FBS_Swap

    LES_Pop

    LES_Push

    Pop_FuncName

    Pop_arrayName

    Push_FuncName

    Push_arrayName

    Return_Func_NumArgs

    allocate_storage

    codegenFinalize

    codegenInitialize

    generateCode

This list seems reasonable, i.e., it is not surprising that these functions are part of codegen.c. 

We will next find all functions in semantics.c.  In querying to find these functions, we will ask for all functions that  are either declared or defined in semantics.c, by typing:

   Q := semantics.c . (funcdef + funcdcl)

We could have written simply

   Q := semantics.c . funcdef

However, there is the possibility that semantics.c might contain a function header, without a corresponding body, and that the arrow we are trying to locate might be pointing to this header.  So we are using the first (the longer) query.  Once we have the value of Q, it is a good idea to see how big it is and to list the names of functions contained in it.  We have illustrated these operations for P, so we won’t show them for Q.

Now we are ready to give a query that asks for all linkcalls from functions in codegen.c (in P) to functions in semantics.c (in Q).  Here is the query:

id (P)  o  linkcall  o  id (Q)

This query makes use of the id function twice.  The first id produces self loops on each member of P.  These self loops are composed with linkcall so that the result includes only  those pairs in linkcall which start with a function in P.  These pairs are next composed with id (Q) so that the resulting relation contains only those pairs that end with a function in Q.  The result contains all those link calls from a function in codegen.c to a function in semantics.c.  As a result of this query, Grok outputs

    generateCode   ErrorHandler

From this we conclude that there is only one function, generateCode, in codegen.c, which calls a function in semantics.c.  The only such called function is ErrorHandler. 

To find out more about the generateCode function, we can type

     generateCode :

As a result, Grok outputs:

    defloc         codegen.c:365

    deflocend      codegen.c:869

    linkref        errorFile

    linkref        scanInteger

    linkref        traceCodeGen

    linkref        traceFile

    linkcall       ErrorHandler

    linkref        PrevScope

    linkref        ScopeLevel

    librarycall    fprintf

    librarycall    strlen

    dclloc         codegen.h:21

From this list, we can see that generateCode is declared (defloc) on line 21 of codegen.h and is defined (defloc) on line 365 of codegen.c  We could make a similar query for ErrorHandler, but instead we will specialize the query to give us the locations of ErrorHandler’s declaration and definition, by typing:

     ErrorHandler . (dclloc + defloc)

As a result of this query, Grok outputs:

    "semantics.c:138"

This output is surprising, in that it indicates that the Error Handler does not have both a definition and a declaration.  To find out more, we can limit our query to try to find only the declaration, by typing:

     ErrorHandler . defloc

Intriguingly, Grok produces no output for this query, indicating that ErrorHandler has no declaration.  We can conclude that ErrorHandler  has a definition (we could write a query to explicitly check this), but no declaration.

What we have learned is that ErrorHandler is programmed with a peculiar style, with a definition but no declaration.  While this usage is allowed by the C language, it a frowned upon as being bad style by many programmers.  This is particularly suspicious, because we have also discovered that this function is part of semantics.c, but is called from codegen.c.  To see how widely ErrorHandler  is used, we type:

   linkcall . ErrorHandler

As a result Grok outputs only

     generateCode

This tells us that ErrorHandler is called (outside of semantics.c, externally via linking) by only the generateCode function.  This is further evidence that ErrorHandler is used in a peculiar fashion, and perhaps should be moved to a new location, or perhaps that generateCode should be calling a function other than ErrorHandler. 

We will not explore this peculiarity further here.  Our goal was to show how to solve the Where-Did-This-Arrow-Come-From Problem.  To keep the explanation simple, we have used a straightforward example.  In more complex cases, it is easy to get lost in the details.  Keeping track of the queries, and avoiding mistakes when typing the queries, is tedious and can be error prone.  At the same time, it can also be rewarding, leading to discoveries of unexpected and sometimes incorrect parts of the program.  Indeed, it is intellectually interesting to be able to navigate through the structure of a piece of software.

 

Credits.  Gerry Kovan, with assistance from Ric Holt, created the CFX TA schema in Figure 5.3.  Ivan Bowman created the E/R Diagram for this schema, in Figure 5.4.


Part 4.

Writing Grok Scripts

 

The preceding section showed how the Grok interpreter can execute user command interactively.  We showed how this facility can be used to determine what functions in file codegen.c call what functions in file semantics.c.

While it is handy to execute Grok commands interpretively, it is also handy to bundle up a set of commands and run them as a script.  Figure 6.1 gives a script that solves this same Where-Did-This-Arrow-Come-From Problem.

 

 1  % Find call edges from P to Q

 2  getdb "factbase.rsf"

 3  P := {"codegen.c"} . funcdef

 4  countP := # P

 5  put "codegen.c contains ", countP, " function defs"

 6  Q := {"semantics.c"} . (funcdef + funcdcl)

 7  countQ := # Q

 8  put "semantics.c contains ", countQ, " function defs and dcls"

 9  put "Here are the linkcalls from codegen.c to semantics.c"

10  id (P)  o  linkcall  o  id (Q)

 

Figure 6.1. Grok Script comefrom.grk

 

The Grok script in Figure 6.1 contains roughly the same commands that were given in the previous section as interactive command.  (The line numbers in the figure are only for reference purposes, and so not actually appear in the script.)  In the figure, we have inserted a comment in line 1; as you can see, comments start with a percent % character. 

In line 2, we added quotation remarks around factbase.rsf.  In lines 3 and 5, we added quotation marks as well as set brackets { … } around the file names.  These additions make the commands syntactically correct.  In the interactive commands, we avoided some typing by taking syntactic shortcuts, which are, officially speaking, illegal, but since the interpreter is forgiving, we saved ourselves a few keystrokes.

Aside: While it is convenient to take shortcuts when typing interactive commands, you should avoid these and eliminate syntax errors in a Grok script.  Grok scripts are hard enough to write without getting confused by error messages.  Besides, error messages often help isolate serious bugs in the script.

Lines 5, 8 and 9 illustrate the use of the put statement.  It accepts sequences of items separated by commas.  These items can be strings, as well as names of variables.

We can run this script from the command line by typing

   grok  comefrom.grk

When we do this, we will get this output:

   codegen.c contains 21 file defs

   semantics.c contains 17 file defs and dcls

   Here are the linkcalls from codegen.c to semantics.c

     generateCode   ErrorHandler

There is another way to run this script, namely, we can give an interactive command to invoke it.  To do this, we can first start up the Grok interpreter by typing

     grok

Once Grok starts up, we can give this command:

     exec comefrom.grk

This runs the script and produces the same output as has just been shown.  When the script finishes running, Grok will still in interactive mode, so we can continue giving it commands.  For example, we could type this simple command:

     P

This command lists the items contained in set P (these are the functions defined in codegen.c).  More generally, using the exec command in interactive mode is helpful because it allows us to inspect the values of relations and variables that have been computed by the script.  The exec command can also be used in a script, to all another script as a subroutine.  For example, the command exec comefrom.grk  could appear in a script. 

 

Using Arguments in a Script

 

Our script solves one particular Where-Did-This-Arrow-Come-From Problem.  However, it works only for two particular files, codegen.c and semantics.c and only for a factbase called factbase.rsf.  We can generalize it to handle any files and any factbase by using arguments, as is illustrated in Figure 6.2.

 

 1  % Find call edges from P to Q

 2  if $n ~= 3 then  % Are there 3 arguments?

 3     put "Usage:  grok comefrom.grk factbaseRsf fromFileName toFileName"

 4     put "Finds linkcalls from fromFileName to toFileNamec "

 5     put "Author: Ric Holt,   Date: 22 Apr 2002"

 6     quit

 7  end if

 8 

 9  factbaseRsf := $1

10  fromFileName := $2

11  toFileName := $3

12 

13  getdb factbaseRsf

14  P := { fromFileName } . funcdef

15  countP := # P

16  put fromFileName, " contains ", countP, " function defs"

17  Q := { toFileName } . (funcdef + funcdcl)

18  countQ := # Q

19  put fromFileName, " contains ", countQ, " function defs and dcls"

20  put "Here are the linkcalls from ", fromFileName, " to ", toFileName

21  id (P)  o  linkcall  o  id (Q)

 

Figure 6.2. Grok Script comefrom.grk Rewritten with Arguments

 

In the Grok script, the notation $n (line 2) gives the number of arguments, and $1, $2, $3, etc. (lines 9 –11) give the values of arguments 1, 2, 3, etc.  Grok’s form of if statement is illustrated by lines 2 to 7.  The quit statement (line 6) aborts the script.

The script can be run by giving the command

     grok comefrom.grk factbase.rsf codegen.c semantics.c

or in interactive mode by typing

     exec comefrom.grk factbase.rsf codegen.c semantics.c

Other file names and another name of a factbase can be given as arguments. 

If a script is used often, it is good to create a shell script that invokes it, in order to simplify use of the script.  For example, we might create a shell script named comefrom that consists of this line:

   grok comefrom.grk $@

The form $@ matches all the arguments passed to Grok. This shell script allows us to use comefrom by just typing comefrom factbase.rsf codegen.c semantics.c.    For this to be generally useful, the shell script should provide an absolute path (or a path determined by a shell variable) so that comefrom.grk can be found from any directory.

 

Invoking a Script from a Library

 

This exec command can be used in another Grok script.  When this is done, care must be taken to be sure that the script, comefrom.grk in our example, can be located by Grok.  By default, Grok will expect to find the script in the local directory.  The SwagKit project, at the University of Waterloo, has a library of commonly used Grok scripts.  These are stored in a directory, which is located by a shell variable called SWAGKIT_PATH.  Figure 6.3 gives an example of using this shell variable.

 

execPath := getenv "SWAGKIT_PATH"

linkProcedure := execPath cat "/lib/pipeline/" cat "lift.grp"

exec linkProcedure

 

Figure 6.3.  Using a Shell Variable to Locate a Grok Script

 

In the figure, the Grok built-in function getenv fetches the value of the SWAGKIT_PATH shell variable.  This is assigned to string variable execPath.  The complete path to the Grok script is built up by catenating execPath, /lib/pipeline/ and lift.grp.  If SWAGKIT_PATH has the value /home/holt/dev_swagkit (as it currently does for the author of this note), the complete path for the Grok script will be

     /home/holt/dev_swagkit/lib/pipeline/lift.grp

(Note: The suffix .grp is used by convention to label a Grok procedure.)  If you build up a library of Grok scripts, you can use a shell variable to locate your scripts, as is done for SwagKit.

 

Abstractions by Grok Scripts

 

A common use of Grok scripts is to read a factbase and to simplify or abstract it.  One way of doing this is to combine sets of similar relations into a single relation.  Figure 6.4 gives a script, named cfxadddep.grk, which combines all dependencies in a CFX factbase into a single relation called dep.  The factbase, with dep added to it, is written out to a new factbase.

 

 1  % Add dependency 'dep' to CFX fact base.

 2  if $n ~= 2 then  % Are there 2 arguments?

 3      put "Usage:  grok cfxadddep.grk factbaseRsf newFactBaseRsf"

 4      put "Combines all dependencies into a single dependency 'dep'"

 5      put "Author: Ric Holt,   Date: 22 Apr 2002"

 6      quit

 7  end if

 8 

 9  factbaseRsf := $1

10  newFactbaseRsf := $2

11 

12  cfxDep := { "include", "sourcecall", "sourceref", "linkcall", \

13     "linkref", "usemacro", "usetype", "usestruct", "useunion", \

14     "useenum", "librarycall" }

15 

16  for d in cfxDep

17      $ d := EMPTYREL

18  end for

19 

20  getdb factbaseRsf

21 

22  dep := EMPTYREL

23  for d in cfxDep

24      dep := dep + $ d

25  end for

26 

27  putdb newFactbaseRsf

 

Figure 6.4.  Combining Dependency Relations

 

Lines 12-14 of the script create the set cfxDep, which contains the names of all dependencies in the CFX schema (see Figure 5.3 and 5.4).  Note that a backslash \ has been used at the ends of lines 12 and 13 to extend the command to multiple lines.

Lines 16-18 and 23-25 are examples of Grok for statements.  These iterate across elements of a set of strings.  In each of these, the loop is executes with d assigned each of the values contained in set cfxDep.  We know that the iterations will assign d each value from the set, but we are not guaranteed what the ordering will be.

Line 17 contains the statement

   $ d := EMPTYREL

The dollar sign $ means that the value of d is to be treated as a relation or variable.  For example, when d has the value sourcecall, this statement will effectively be

   sourcecall := EMPTYREL

The reason that we have assigned the value EMPTYREL is a bit subtle.  This assignment acts like a “declaration” of the relation.  This “declaration” needs to precede the statement getdb factbaseRsf, so that, if the factbase does not contain any pairs in the relation, then the relation still exists (is declared), even though it is empty.  Without this “declaration”, if there were no pairs in the relation, the relation would not exist, as far as the Grok interpreter is concerned.  This is in turn would cause problems in line 24, in which the dep relation is constructed from all the dependency relations.  If d has a value (such as sourcecall) in line 24 which isn’t the name of a relation, the Grok interpreter will not know how to execute the line.  To avoid this difficulty, any relation being read from an RSF or TA stream, which might not have any pairs in the stream, should be assigned EMPTYREL before reading the stream.

Combining dependencies into “higher level” dependencies is a common way to abstract information in a graph.  Another way is to eliminate low-level entities, such as statements, or even functions within files.  When this is done, the dependencies from the low level entities are usually lifted (also called raised) to the level of their parent nodes.  Grok scripts to do this lifting are part of the PBS and SwagKit pipelines that prepare graphs for visualization.

 

Other Grok Features

 

We have introduced key features of Grok by means of examples.  We will finish by mentioning a couple more features. 

It is common to need to delete some relations or some sets of nodes from a factbase.  For example, in the script in Figure 6.4, after adding the dep relation we might want to delete all the relations, which were combined into dep.   This can be done as follows

for d in cfxDep

     delrel $ d

end for

This loop uses the delrel command to delete relations from the factbase.  To remove a set of nodes from the factbase, whose names are in a set nodesToDelete, we can use this statement

   delset nodesToDelete

This command removes all triples in the factbase that reference nodes in nodesToDelete.

When debugging a Grok script, the following command can be very helpful:

   option echo

This tells the Grok interpreter to trace the script, outputting each command as it is executed.  The command

     option timing

outputs the time taken to execute each Grok command.

We will now give a final observation about debugging Grok scripts.  Grok does not have a compiler, so there is no direct way to see if your script is syntactically legal.  However, the following trick works reasonably well: run the script with dummy values for the arguments.  This makes Grok run the script (with some bogus messages about bad arguments), but more importantly, gives messages about other syntax errors.  You should fix such errors before testing your script on real data.

There are a number of other commands, which are summarized in the following section.  This concludes our introduction to Grok.

 

 

 

 

 

 

 

 

 


Part 5.

Summary of Grok Features

 

1. Grok input/output commands

 

Grok input commands:

   getdb F              Read RSF factbase from file F

   adddb F              Add new tuples to data base from file F

   getta F              Read TA factbase from file F

   addta F              Add more TA to date base from file F

   getset s [F [D]]     Read  set s [from file F [in dir D]]

 

Grok output commands:

   putdb F              Write RSF data base to file F

   relToFile R F        Write rel R to file F

   appendRelToFile R F  Append rel R to file F

   putta F              Write TA factbase to file F

   putset s [F [D]]     Write set s [to file F [in dir D]]

   put msg {, msg}      Print msgs; each is a string or a variable name

 

The file format for relations for get/put is ASCII triples, eg,

      R a b   where R relates a to b (this format is called 'RSF').

The file format for sets is blank-separated strings.

Beware that the read/write (compressed) format isn't portable.

 

When TA is read in, attributes are treated as relations (with @_

added as a prefix).  The relations and attributes at the scheme level

have $_ prepended to their names.

 

2. Grok relation operators:

   R1 + R2              Union of relations

   R1 - R2              Difference of relations

   R1 ^ R2              Intersection of relations

   R1 o R2              Relational composition [was previously '*']

   inv R                Inverse of R

   s . R                Project set s through rel R [was *]

   R . s                Project set s through rel R backwards [was *]

   R+                   Transitive closure of R

   R*                   Reflective transitive closure of R

   R1 cmp R2            Compare: cmp is == [or =], ~=, <, <=, >, >=

   EMPTYREL             The empty relation

   # R                  Cardinality (number of tuples) of R

   dom R                Domain of R (a set)

   rng R                Range of R (a set)

   ent R                Range + domain of R (all entities in R)

   id s                 Identity relation on set s

   DOM                  Domain of all relations in factbase (a set)

   RNG                  Range of all relations in factbase (a set)

   ENT                  Range + domain of all relations (a set)

   ID                   The identity relation (based on ENT)

   head R               Head of R, eg (x y) in R produces ((x R y) head y)

   tail R               Tail of R, eg (x y) in R produces (x tail (x R y))

   rel2tripleset R      Make R into a set, eg (x y) produces (x R y)

   tripl2relname        Given '(x R y)' produce 'R'

 

 

3. Other relation commands:

   relnames             Set of the names of relations

   delrel R             Remove R's tuples factbase

   listdb               Print triples in factbase

   sample R [b [n] ]    Print sample from R.  Begin with

                          number b (default 1) & print n (10)

   slice s              Remove tuples that reference other than s

   wslice s             Remove tuples that do not reference s

   delset s             Remove tuples from factbase that reference s

   delsetset s t        Remove tuples with source from s and target from t

   deldups              Delete any duplicate tuples in factbase

   dbsize               Number of tuples in factbase

   cleardb              Delete all relations

 

4. Grok set operators:

   s1 + s2              Union of sets

   s1 - s2              Difference of sets

   s1 ^ s2              Intersection of sets

   s1 cmp s2            Compare: cmp is == [or =], ~=, <, <=, >, >=

   # s                  Cardinality (number of elements) of s

   s1 X s2              Cross product s1 X s2 (a relation)

   { e1, e2, ... }      Construct set containing e1, e2, ...

   EMPTYSET             The empty set.  You can use { } instead

   pick s               Pick a value (a string) from s

   prefix Q s           Set of items in s with prefix Q

   suffix s Q           Set of items in s with suffix Q

 

5. Other set commands:

   setnames             Print the names of set variables

   sample s [b [n] ]    Print sample from s.  Begin with

                          number b (default 1) & print n (10)

   clearsets            Delete all sets

   addprefix Q s        Add prefix Q to items in set s

   addsuffix s Q        Add suffix Q to items in set s

   addprefixrel Q s     Add prefix Q to rel names in set s

   addsuffixrel s Q     Add suffix Q to rel names in set s

   addprefixdb Q s      Add prefix Q to src & trg of rel names in set s

   addprefixsrc Q s     Add prefix Q to src of rel names in set s

   addsuffixsrc s Q     Add suffix Q to src of rel names in set s

   addprefixtrg Q s     Add prefix Q to trg of rel names in set s

   addsuffixtrg s Q     Add suffix Q to trg of rel names in set s

   delprefix Q s        Delete prefix Q to items in set s

   delsuffix s Q        Delete suffix Q to items in set s

   delprefixrel Q s     Delete prefix Q to rel names in set s

   delsuffixrel s Q     Delete suffix Q to rel names in set s

   delprefixsrc Q s     Delete prefix Q to src of rel names in set s

   delsuffixsrc s Q     Delete suffix Q to src of rel names in set s

   delprefixtrg Q s     Delete prefix Q to trg of rel names in set s

   delsuffixtrg s Q     Delete suffix Q to trg of rel names in set s

   duplicate s          Creates new set with elements e.dup for

                           each element e of s, with same

                           connecting relations as in s

 

6. Grok number operators:

   n1 + n2              Addition

   n1 - n2              Subtraction

   n1 * n2              Multiplication

   n1 / n2              Real division

   n1 div n2            Truncating division

   n1 mod n2            Modulo

   n1 cmp n2            Compare: cmp is == [or =], ~=, <, <=, >, >=

 

7. Grok string operators

   S1 cat S2            Catenate strings S1 and S2

   S1 cmp S2            Compare: cmp is == [or =], ~=, <, <=, >, >=

   s :                  Out edges from node s

   : s                  In edges to node s

 

8. Grok boolean operators:

   b1 and b2            Conjunction

   b1 or b2             Disjunction

   not b                Boolean negation

 

9. Levels of precedence are, from low to high (most binding):

   and

   or

   cmp                     (==, ~=, <, <=, >, >=)

   + - cat                 (additions)

   * X

   prefixOperators         (inv dom rng ent - not ~ # pick $)

   suffixOperators         (*, +, ie, transitive closure)

 

Parentheses (...) can be used to specify order of operations

 

10. Statements:

   x := expn            Assign expn to variable x (an identifier)

 $ x := expn            Assign expn to variable whose name is in x

       (The 'x = expn' form is considered to be obsolete.)

 

   if expn then        Cascaded 'if' statement, with optional

       statements      'elsif' and 'else' clauses

   {elsif expn then

       statements}

   [else

       statements]

   end if

 

   loop                Loop statement, stopped by 'exit'

       statements

   end loop

 

   for e in s          Repeat with each element e from set s

       statements           (exits are supported)

   end for

 

   exit                Exit from loop/for

   exit when expn      Exit from loop/for when expn is true

   return              Return from current Grok script

   quit                Quit (halt Grok)

 

11. Other Grok commands:

varnames                List of all relation, set and string variables

   $ str                String 'str' is taken as a variable's name

                        If value is integer i, evaluates to arg i

   $i                   Value of parameter i, i is 0, 1, 2 ...

   $n                   Number of parameters

   getenv s             Get environment value s

   sysexec s            Execute shell command s. Value is return code

   expn                 The value of expn (expression) is printed

   exec F p1 ...        Exec file F (Grok script) with parameters p1 ...

                             where t is each member of s

   randdb t r n         Generate t tuples with r relations and

                             n nodes.

   ?                    Print the help menu

   reset                Restart (clear relations and variables)

 

Use '\' in last column to extend commands across line boundaries

 

12. Setting Grok options:

     option Q    Set option Q

The options that can be set include:

     timing      Print time to complete each operation

     echo        Echo each command (useful for tracing scripts)

     debug       Interpret 'debug comand' as 'command' (default off)

     verbal      Print out extra info during commands

     progress    Show progress on large operations (verbose)

     optimize    Run fast (default)

     dumpsets    Dump all sets and set variables (sets.tu)

     dumpnames   Dump all stored names (storage.tu)

The first of these are turned off using 'no', as in 'option notiming'.

 

13. Limits on Sizes in Grok

 

Many/most data sizes are dynamically increased up to the maximum available memory.  The dynamically adjusted sizes are for:

       Triples

       Names

          (Along with internal tables using for sorting, etc.)

Sizes which are statically fixed are:

       Maximum length of names; strSize=255 bytes, can increase to 255 at most

       Maximum tokens on a source Grok line; maxTokens (can be increased)