Introduction to the Acacia System
=================================

by Michael Godfrey
October, 2001

(Disclaimer: I have no official relationship with the Acacia system except as a user.)

Acacia
======

The Acacia system is available for a variety of Unix platforms (Linux, SunOS, Solaris, and Irix). It is freely downloadable for non-commercial use; however, you must ask for a password to download it. Commercial trial licenses are also available.

Acacia consists of a command-line extractor, CCia, plus a database-like query engine. This engine supports command-line queries and form-based X Windows GUI queries, and produces textual and (automatically laid out) graphical output. The querying engine is called ciao, and the graphical tools are part of the GraphViz package. Both ciao and GraphViz are AT&T research projects in their own right.

The CCia extractor is basically the work of one person, Emden Gansner. CCia is based around the EDG front end for C++, which is a commercial product from a third party. For this reason, the availability of the Acacia toolsuite is somewhat restricted and its source code has not been released (AT&T has released the source code for some of its other research tools). Support is available if Emden likes you :-) It is not being actively developed, although bugfix releases are made periodically.

CCia
====

CCia works like a command-line compiler, supporting "separate compilation" and "linking" or "all-at-once" extractions. CCia is not a robust parser.

-- All required "include" files must be available to it.
-- Code must be compilable.
-- Embedded languages (SQL, asm, etc) are not supported.

The following standard compiler flags are supported: "-D" and "-U" for macros and "-I" for include files.

A typical all-at-once extraction looks like this:

    % ksh CCia [-D <macro>] [-U <macro>] [-I <dir>] *.cpp

Note that calls to CCia and the command-line query tools need to be wrapped in a call to ksh (Korn shell; Korn is another AT&T researcher) as they often take advantage of the flexibility that ksh offers.

Before using CCia, one must specify (via the CC environment variable) the name of the compiler that would normally be used to compile the code. This compiler must be available at run time to CCia. On the first invocation of CCia with a new compiler, CCia probes the compiler to determine various special settings, such as predefined macros and the location of system "include" files.

CCia supports both C and C++. The target language is inferred from (the name of) the compiler. For example, code that is legal C but not legal C++ should be handled by setting the CC environment variable to, say, gcc instead of g++.

The result of a successful extraction is a set of .A files (one for each .c/.cpp/.C/.cc file) plus two special files: entity.db and relationship.db. These are effectively plain text databases, but their contents need not be examined directly; instead, one uses the provided querying tools.

Extracted information
=====================

CCia extracts information about the major entities of a software system and the relationships between them. Entities include global (extern or static) standalone variables and functions, files, macros, types (including classes, enums, structs, and unions), and subparts of types (class methods, member variables, enum values, struct subparts). Relationships include "references" (variable references, function calls, parameter types), class inheritance, file inclusion, type containment, friendship, and template instantiation.

That is, CCia extracts information about entities at the "external declaration" level. Local variables and parameters are not modelled (tho references to types are recorded) and AST-level relationships are not modelled.

Command line queries may be made using a variety of tools, but because they are not well documented I have used the older deprecated commands "cdef" (for entity queries) and "cref" (for relationship queries).

CCia models information about the software system specified as well as any macros predefined by the compiler or command line and any system include files. The amount of predefined system information can be large; to constrain queries to the software system specified, it is often useful to require that the filename not begin with a slash (e.g. /usr/include, for system include files) and not be the null string (for predefined macros). For example, to find all macros in your system's code:

    % ksh cdef -u m - file!='' file!='/*'

Entity queries
==============

The format for using cdef (in the way I use it anyway) is this:

    % ksh cdef -u <kind> <name> [attr=val ...]

The "-u" flag means "give complete but unformatted output". There are 17 columns of semicolon-delimited output for entity queries; the schema is described in the appendix below.

The <kind> must be one of: f(ile), fu(nction), m(acro), v(ariable), or t(ype). The <name> is the name of the entity. '-' can be used as a "don't care" value, and regular expression matching can also be used. So to find all entities that start with "_debug", one would say this:

    % ksh cdef -u - '_debug*'

The attributes can also be constrained, once you understand the names of the attributes (see the schema below). To find all member variables of the class Player, one would say this:

    % ksh cdef -u v - ptype=Player

The Unix utility "cut -d ';'" can be used to prune columns that aren't of interest. To print the name (col. 2) and beginning line number (col. 7) of all function declarations in main.c, do this:

    % ksh cdef -u fu - def=dec file=main.c | cut -d ';' -f 2,7

Relationship queries
====================

The format for using cref is this:

    % ksh cref -u <kind1> <name1> <kind2> <name2> [attr=val ...]

The output for this is, roughly speaking,

    <entity 1 attributes>; <entity 2 attributes>; <relationship attributes>

(this is not quite right, but it's close; see the Appendix for precise details). That is, there are 42 columns of output: 18 for entity 1, 18 for entity 2, and 4 for attributes of the relationship itself, plus two apparently unused columns (19 and 38). The character "1" is appended to the attribute names of entity 1, "2" for entity 2. For example, to find all relationships whose second entity is in file main.C, one would say this:

    % ksh cref -u - - - - file2=main.C

To find all instances of inheritance relationships from entities in file main.C to any other classes, one would say this:

    % ksh cref -u - - - - tclass1=class file1=main.C rkind=inheritance

Summary
=======

CCia extracts information about "top-level" programming entities and their relationships. It is a tool built by and for Unix hackers; the Acacia philosophy includes judicious use of awk, cut, grep, etc to answer program understanding queries. It is fairly solid, tho it does not support all aspects of C++; partly this is because it is based on an older front end.

Appendix: Query output format
=============================

For each of the tables below, the number is the column number and the token following it is the official name of the attribute. For example, to print the file name and beginning line number of all method declarations of class Player, say this:

    % ksh cdef -u fu - def=dec ptype=Player | cut -d ';' -f 4,7

Note that some fields are unused (18) or uninteresting (id, col #1).

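When more than simple column pruning is needed, awk can stand in for cut. What follows is only a sketch, assuming the semicolon-delimited "cdef -u" layout given in this appendix (name in col. 2, file in col. 4, bline in col. 7). The function name cdef_locations and the sample row are hypothetical, made up for illustration; the row is not real Acacia output.

```shell
# Hypothetical helper (not part of Acacia): turn "cdef -u" rows read from
# stdin into "file:line name", a format most editors and grep users know.
cdef_locations() {
    # Column numbers follow the "cdef -u" schema: 2=name, 4=file, 7=bline
    awk -F ';' '{ printf "%s:%s %s\n", $4, $7, $2 }'
}

# Made-up sample row in the 18-column layout (NOT real Acacia output).
sample='053d4d71;getName;function;player.h;;;10;10;12;dec;00000000;;;pub;Player;;;'

printf '%s\n' "$sample" | cdef_locations
# prints: player.h:10 getName
```

With Acacia installed, the same filter would presumably sit at the end of a real query, e.g. "ksh cdef -u fu - def=dec ptype=Player | cdef_locations".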
"cdef -u" output ================ 1 id <8 digit hex uniqueID> 053d4d71; 2 name entry.h; 3 kind fi, fu, v, m, t, s function 4 file entry.h; 5 dtype ; 6 tclass enum, typdef, struct, class, union, template ; 7 bline 1; 8 hline 12; 9 eline 84; 10 def def, dec, (macro) undef def; 11 checksum 00000000; 12 pparam param type list (string) ; 13 tparam (string) ; 14 scope priv/pub/prot/extern/static/unspec ; 15 ptype name of parent type, if part of class/struct/union etc 16 spec static(s), const/enum val?(c), inline?(i), virtual(v), const inline function??(ic), inline virtual (iv) 17 signature if kind=type, how entity must be referred to by name eg struct s, enum C::t 18 selected "cref -u" output ================ 1 kind1 fi, fu, v, t, m, s file; 2 id1 053d4d71; 3 name1 entry.h; 4 kind1(?) (seems to always be the same as field 1) 5 file1 entry.h; 6 dtype1 7 tclass1 regular, enum, typdef, struct, class, union, template, '' for file ; 8 bline1 1; 9 hline1 last line of fcn header (0 is not a fcn/file) 12; 10 eline1 84; 11 def1 def, dec, undef def; 12 checksum1 00000000; 13 pparam1 fcn param type list (string) (char *, int) 14 tparam1 (string) ; 15 scope1 priv/pub/prot/extern/static/unspec ; 16 ptype1 name of parent structure (struct, class, etc) 17 spec1 static(s), const/enum val?(c), inline?(i), virtual(v), const inline function??(ic), inline virtual (iv) 18 sig1 if kind=type, how entity must be referred to by name eg struct s, enum C::t 19 ?? (unused?) 20 kind2 fi, fu, v, t, m, s file; 21 id2 053d4d71; 22 name2 entry.h; 23 kind2(?) (seems to always be the same as field 20) 24 file2 entry.h; 25 dtype2 26 tclass2 regular, enum, typdef, struct, class, union, template, '' for file ; 27 bline2 1; 28 hline2 last line of fcn header (0 if not a fcn/file) 12; often non-zero for files; not sure what this means. 
29 eline2 84; 30 def2 def, dec, undef def; 31 checksum2 00000000; 32 pparam2 fcn param type list (string) (char *, int) 33 tparam2 (string) ; 34 scope2 priv/pub/prot/extern/static/unspec ; 35 ptype2 name of parent structure (struct, class, etc) 36 spec2 static(s), const/enum val?(c), inline?(i), virtual(v), const inline function??(ic), inline virtual (iv) 37 sig2 if kind=type, how entity must be referred to by name eg struct s, enum C::t 38 ?? (unused?) 39 usage relation line nums 22.43.46; 40 rkind reference, inheritance, accadj(??), containment, friendship, typedef, instantiation; 41 ?? ?? ; 42 pkind private or protected if that kind of inheritance used (public not mentioned)
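
The 42-column "cref -u" layout is awkward to read by eye, so a small awk filter can reduce each row to the two entity names and the relationship kind. This is a sketch only, assuming the column positions listed above (name1 in col. 3, name2 in col. 22, rkind in col. 40). The function name cref_edges, the class names, and every value in the sample row are hypothetical; the actual attribute values printed by cref may look different.

```shell
# Hypothetical helper (not part of Acacia): summarise "cref -u" rows read
# from stdin as "name1 -> name2 [rkind]".
cref_edges() {
    # Columns per the "cref -u" schema: 3=name1, 22=name2, 40=rkind.
    # Rows with fewer than 40 fields are skipped.
    awk -F ';' 'NF >= 40 { printf "%s -> %s [%s]\n", $3, $22, $40 }'
}

# Made-up sample row (NOT real Acacia output), assembled piece by piece
# so that the column positions are easy to check.
e1='type;0a1b2c3d;Wizard;type;wizard.h;;class;5;0;20;def;00000000;;;unspec;;;'  # cols 1-18
e2='type;0a1b2c3e;Player;type;player.h;;class;3;0;30;def;00000000;;;unspec;;;'  # cols 20-37
rel='5;inheritance;;'                                                           # cols 39-42
sample="${e1};;${e2};;${rel}"   # the extra ";" pairs leave cols 19 and 38 empty

printf '%s\n' "$sample" | cref_edges
# prints: Wizard -> Player [inheritance]
```

With Acacia installed, the filter would presumably be fed by a real query such as "ksh cref -u - - - - rkind=inheritance | cref_edges".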