Call Graphs for Scala
Overview

As Scala gains popularity, there is growing interest in programming tools for it. Such tools often require call graphs. Applying existing call graph construction algorithms to the JVM bytecodes generated by the Scala compiler produces imprecise results due to type information being lost during compilation. Therefore, we propose adapting existing call graph construction algorithms, Name-Based Resolution (RA) and Rapid Type Analysis (RTA), for Scala. We have implemented 5 algorithms: RA, TCAnames, TCAbounds, TCAexpand, and TCAexpand-this. The algorithms are implemented as a Scala compiler plugin. Our experimental evaluation shows that handling of complex Scala constructs, e.g., traits and abstract type members, greatly improves the precision of the constructed call graphs. For more details on the algorithms, and a formal proof of soundness, please refer to our ECOOP'14 paper.

Downloads

  • scalabench.tar.gz: a self contained tarball that include all the necessary scripts, runnable JARs, required to replicate our experiments.
  • scalacg.tar.gz: the source code of our Scala compiler plugin. This tarball contains the source code of the implementation of all our algorithms, adn RTAwala. Additionally, it comes with two test suites: one for TCA-based algorithms, and another one for RA. The source code can be imported as an Eclipse Scala project (Note: you need the Scala-IDE plugin for Eclipse to properly identify this Scala project).
  • callgraph-plugin.jar: the Scala compiler plugin that contains the implementation of our algorithms. Additionally, it contains the ProBe tool that we use to compare and output call graphs.
  • walacg.jar: a runnable JAR for the implementation of RTAwala.
  • scalavm.ova: a VirtualBox appliance that is set up to replicate our experiments. It has scalabench.tar.gz available on the desktop. You do not need to worry about any prerequisites if you are using this virtual machine. Admin username: aec, password: ecoop14.

Prerequisites

These are the prerequisites for using our plugin with the Scala compiler.

  • Scala-2.10.2 If another Scala version is used, the plugin will crash.
  • Set the environment variable $SCALA_HOME (%SCALA_HOME% on Windows) to the location where you extract the Scala-2.10.2 tarball.
  • Set the environment variable $PATH to $PATH:$SCALA_HOME/bin (%PATH%;%SCALA_HOME%\bin on Windows).
  • Set the environment variable $JAVA_OPTS (%JAVA_OPTS% in Windows) to "-Xmx2g -XX:PermSize=512m -XX:MaxPermSize=512m". This will give a maximum of 2G of RAM for the plugin to run. This should be enough for small programs, test cases. Increase the value of -Xmx if Java runs out of memory. It is worth noting here that the provided virtual machine requires 16G of RAM, and has the Java option -Xmx set to 12G.
Additional prerequisites should be satisfied if you would like to use scalabench.tar.gz to replicate our experiments. You do not have to worry about these if you are using the scalavm.ova virtual machine provided above.
  • Bash: the scripts use the Bash unix shell to execute.
  • Ant: please refer to the official installation instructions to install ant on your machine.
  • Set the environment variable $ANT_OPTS to $JAVA_OPTS. If Java runs out of memory, adjust the environment variable $JAVA_OPTS accordingly.
  • walacg.jar includes a file named wala.properties that provides WALA with the location of the Java runtime directory. It's set by default to /usr/lib/jvm/default-java/jre/lib. If the Java runtime directory on your machine is different from this, then you need to update this file with the correct path. You can either, edit the file inside the walacg.jar archive directly, or create an empty file with the name wala.properties at the same directory of walacg.jar. Then, set the property "java_runtime_dir" in it to the Java runtime directory on your system. You then need to run the following command to update the properties file inside walacg.jar.
    $ jar uf walacg.jar wala.properties
  • LaTeX: this is only required if you would like to get a PDF output of our experimental results . The scripts used to replicate our experiments generate the output data in CSV format as well (tab separated). If you use LaTeX, please make sure you have the following LaTeX packages installed: authblk, fullpage, and multirow.

Usage

You can use our call-graph plugin to compile Scala code as follows, where <analysis> can be any value of: ra, tca-names, tca-bounds, tca-expand, or tca-expand-this, and defaults to tca-expand-this.

$ scalac -cp callgraph-plugin.jar -Xplugin:callgraph-plugin.jar [-P:callgraph:<analysis>] <code.scala>

The plugin will output the call graph in a file named callgraph.txt.gzip, along with a summary call graph (i.e., ignoring the edges within the library dependencies) in a file with the name callgraph-summary.txt.gzip. Both files can be viewed and queried by the ProBe tool included in callgraph-plugin.jar. To print out the edges of the call graph, run the following command:

$ java -cp callgraph-plugin.jar probe.CallGraphInfo callgraph.txt.gzip
For more information on how to use ProBe, please refer to this page.

Replicating our experiments

You can either download the virtual machine, or download the tarball scalabench.tar.gz (both in the downloads section above) and then satisfy all the prerequisites. The following tutorial does not depend on which path you decide to take.

You can run all analyses for one of our benchmark programs (see below) by executing the following commands:

$ cd <scalabench.tar.gz_extracted_location>
$ ./run <benchmark_name>
This will run each of our algorithms, RTAwala, and the regular Scala compiler on the given benchmark. The output of the plugin will be organized in sub-folders of scalabench, following the pattern: scalabench/dist/<analysis_name>/<benchmark_name>.

If you would like to run all analyses for all benchmark programs at once, you can use the scripts provided in scalabench.tar.gz as follows:

$ cd <scalabench.tar.gz_extracted_location>
$ ./run-all
If the script crashes due to memory-related issues, please set the environment variable $JAVA_OPTS with a higher value for the -Xmx option. If everything goes well, this process can take ~ 4.5 hours. So it might be a good idea to grab a cup of coffee, read a book, or attend to other items on your agenda while the scripts finishes running. The output of the script is organized in a similar fashion to analyzing just one benchmark. Next, the script checks if the generated call graphs satisfy the following sanity check:
TCAexpand-this ⊆ TCAexpand ⊆ TCAbounds ⊆ TCAnames ⊆ RA.

Finally, the script generates the data we present in our ECOOP'14 paper in both CSV format (under scalabench/csv) and LaTeX format (under scalabench/tex). If you have LaTeX installed, run the following script and the resulting PDF will be at scalabench/Results.pdf

$ cd <scalabench.tar.gz_extracted_location>
$ ./makelatex

Benchmark programs

Program Description Source

argot

a command-line argument parser for Scala

http://github.com/bmc/argot

ensime

an Emacs plugin that provides an enhanced Scala interactive mode

http://github.com/aemoncannon/ensime

fimpp

an interpreter for an imperative, dynamically-typed language

http://github.com/KarolS/fimpp

kiama

a Scala library for language processing

http://code.google.com/p/kiama

phantm

a tool that uses static analysis to detect type errors in PHP code

http://github.com/colder/phantm

scalaxb

an XML data-binding tool for Scala

http://github.com/eed3si9n/scalaxb

scalisp

a LISP interpreter written in Scala

http://github.com/Mononofu/Scalisp

see

a simple engine for evaluating arithmetic expressions

http://scee.sourceforge.net

squeryl

a Scala object-relational mapping library for SQL databases

http://github.com/max-l/Squeryl

tictactoe

an implementation of the classic "tic-tac-toe" game

http://github.com/nickknw/arbitrarily-sized-tic-tac-toe