Waterloo Programming Languages Seminars

Previous terms: F12 S12 F11 S11 W11 F10 S10 F08 S08 W08 F07 W07 F06 S06 W06

Current term:

Date Speaker Title (click for abstract) Location
Mon. Jan. 14, 2013 at 1:30 pm Omer Tripp Detecting and Preventing Web Application Security Vulnerabilities via Program Analysis DC 1304
Thu. Jan. 24, 2013 at 2:30 pm Nomair A. Naeem Typestate-like Analysis of Multiple Interacting Objects DC 2314
Mon. Mar. 11, 2013 at 2:00 pm Nomair A. Naeem Practical Extensions to the IFDS Algorithm DC 2314
Thu. Mar. 14, 2013 at 1:30 pm Mizra Omer Beg Constraint Programming in Compiler Optimization DC 2310
Tue. Apr. 2, 2013 at 10:00 am Nomair A. Naeem Faster Alias Set Analysis Using Summaries DC 2314
Wed. Apr. 3, 2013 at 10:30 am Omer Tripp Specializing Concurrency Control via Analysis of Dynamic Data Dependencies DC 1304
Mon. Apr. 8, 2013 at 10:30 am Greg Steffan Programming FPGAs with Software via Overlay Processing Engines DC 2310
Fri. Apr. 12, 2013 at 10:00 am Karim Ali Averroes: Whole-Program Analysis without the Whole Program DC 2314

All are welcome.

Talk announcements are also posted to a public mailing list.

To volunteer to give a talk, contact Ondřej Lhoták. Talks can range from informal to formal, and they can be about any of the following:

The purpose is for us all to see what people are interested in, and motivate discussions.

Abstracts

Detecting and Preventing Web Application Security Vulnerabilities via Program Analysis

Omer Tripp
Jan. 14, 2013
Web applications are the subject of an increasing number of attacks. Most attack types involve injection of malicious content into the data processed by the Web application (e.g. HTTP parameter values, session attributes, cookies, etc). The large number of severe attacks reported in recent years on banking applications, as well as corporate and government Web sites, has created a special motivation to develop analysis tools for detecting vulnerabilities in Web applications.

In the last five years, I have been a leading member of an IBM project with this goal, collaborating with Marco Pistoia, Takaaki Tateishi, Salvatore Guarnieri, Stephen Fink, Manu Sridharan, and Julian Dolby in IBM, and with Patrick and Radhia Cousot in the academia. In this talk, I will present several research challenges that were addressed as part of this ambitious project, as well as solutions developed in response to these challenges.

Specifically, my talk will survey three main areas of research I led under the broad umbrella of web security:

1. Server-side security verification, where we have developed novel slicing and bounded-analysis techniques (PLDI'09), as well as algorithms for demand-driven analysis with incremental scanning capabilities (to appear in FASE'13; joint work with Prof. Patrick Cousot and Dr. Radhia Cousot);

2. Client-side security verification, where we introduced novel and effective alias-analysis techniques for data-flow tracking in client-side JavaScript (ISSTA'11), as well as hybrid solutions enabling partial evaluation of the JavaScript program for increased precision (FSE'11), and in addition, powerful, lightweight string-analysis algorithms to eliminate false reports (under submission); and finally,

3. Dynamic testing of Web applications, where we have built a rich and unique database of security attacks, and designed learning algorithms atop this database, to enable rapid convergence on a payload that penetrates through the Web site's defenses during testing, if one exists, with high coverage and performance guarantees (under submission).

Beyond these four main topics, I will quickly go through several other related research works that I took part in, including

1. Mining and verification of server-side security defenses (ACM SIGSOFT Distinguished Paper at ISSTA'11; journal version to appear in TOSEM'13);

2. Framework modeling in static security analysis (OOPSLA'11); and

3. Agent-based testing of mobile applications (work in progress).

I will also provide a brief sketch of my main research areas and results in my PhD work, which concerns code parallelization and synchronization protocols.

Typestate-like Analysis of Multiple Interacting Objects

Nomair A. Naeem
Jan. 24, 2013
This talk will present a static analysis of typestate-like temporal specifications of groups of interacting objects, which are expressed using tracematches. Whereas typesate expresses a temporal specification of one object, a tracematch state may change due to operations on any of a set of related objects bound by the tracematch. We propose a lattice-based operational semantics equivalent to the original tracematch semantics but better suited to static analysis. We define a static analysis that computes precise local points-to sets and tracks the flow of individual objects, thereby enabling strong updates of the tracematch state. The analysis has been proved sound with respect to the semantics. A context-sensitive version of the analysis has been implemented as instances of the IFDS and IDE algorithms. The analysis was evaluated on tracematches used in earlier work and found to be very precise. Remaining imprecisions could be eliminated with more precise modeling of references from the heap and of exceptional control flow.

Practical Extensions to the IFDS Algorithm

Nomair A. Naeem
Mar. 11, 2013
In this talk, I will present four extensions to the Interprocedural Finite Distributive Subset (IFDS) algorithm that make it applicable to a wider class of analysis problems. IFDS is a dynamic programming algorithm that implements context-sensitive flow-sensitive interprocedural dataflow analysis. The first extension constructs the nodes of the supergraph on demand as the analysis requires them, eliminating the need to build a full supergraph before the analysis. The second extension provides the procedure-return flow function with additional information about the program state before the procedure was called. The third extension improves the precision with which phi instructions are modelled when analyzing a program in SSA form. The fourth extension speeds up the algorithm on domains in which some of the dataflow facts subsume each other. These extensions are often necessary when applying the IFDS algorithm to non-separable (i.e. non-bit-vector) problems. We have found them necessary for alias set analysis and multi-object typestate analysis. In this talk, we illustrate and evaluate the extensions on a simpler problem, a variation of variable type analysis.

Constraint Programming in Compiler Optimization

Mizra Omer Beg
Mar. 14, 2013
In this talk I will demonstrate that difficult combinatorial problems which arise in optimizing compilers can be practically solved using constraint programming techniques. Constraint programming is a methodology for solving hard combinatorial problems, where a problem is modeled in terms of variables, values and constraints. In this talk I will present constraint programming solutions for two important compiler optimization problems, namely, spatial and temporal scheduling and instruction selection.I will show that by using constraint programming techniques it is possible to solve these difficult optimization problems efficiently and more precisely than existing heuristic solutions.

Faster Alias Set Analysis Using Summaries

Nomair A. Naeem
Apr. 2, 2013
Alias sets are an increasingly used abstraction in situations which require flow-sensitive tracking of objects through different points in time and the ability to perform strong updates on individual objects. The interprocedural and flow-sensitive nature of these analyses often make them difficult to scale. In this talk, we present two types of method summaries (callee and caller) to improve the performance of an interprocedural flow- and context-sensitive alias set analysis. We present callee method summaries and algorithms to compute them. The computed summaries contain sufficient escape and return value information to selectively replace flow-sensitive analysis of methods without affecting analysis precision. When efficiency is a bigger concern, we also use caller method summaries which  provide conservative initial assumptions for pointer and aliasing relations at the start of a method. Using caller summaries in conjunction with callee summaries enables the alias set analysis to flow-sensitively analyze only methods containing points of interest thereby reducing running time. We present results from empirically evaluating the use of these summaries for the alias set analysis. Additionally, we also discuss precision results from a realistic client analysis for verifying temporal safety properties. The results show that although caller summaries theoretically reduce precision, empirically they do not. Furthermore, on average, using callee and caller summaries reduces the running time of the alias set analysis by 27% and 96%, respectively.

Specializing Concurrency Control via Analysis of Dynamic Data Dependencies

Omer Tripp
Apr. 3, 2013
Software applications are becoming increasingly harder to parallelize. Modern software systems written in imperative languages like Java and C# typically manipulate complex heap data structures, consist of multiple layers of abstraction, and have input- and deployment-specific behaviors that affect their available parallelism.

This creates a significant challenge for parallelizing compilers, whose target transformations are normally conditioned on the assumption that the instructions designated for parallel execution access disjoint memory regions. Establishing this precondition over real-world applications is hard: Often there are dependencies between the candidate code blocks at the concrete level, at least over some of the inputs, and even if disjointness holds universally, it is hard for an automated verifier to formulate a proof to this effect.

These limitations of static dependence analysis in dealing with the complexity of modern programs, as well as their dynamic behavior, have shifted attention to runtime optimistic parallelization techniques, and most notably, software transactional memory (STM). The challenge with this approach is that STM typically incurs high overhead -- due mainly to runtime tracking of memory accesses and conservative conflict detection -- which often obviates the performance gain from parallelization.

In this talk, I shall describe techniques for overcoming the limitations of these general parallelization techniques by specializing the parallelization system per the specific application and workload at hand. Specialization addresses fluctuating parallelism, as evidenced in many applications, and enables optimizing the runtime system with respect to prevalent program behaviors and common input profiles.

Specifically, I will present methods for specializing concurrency control by analyzing dynamic data dependencies, obtained during offline profiling over representative executions of the subject program. I will demonstrate that these methods enable effective specialization over realistic programs and popular parallelization techniques, e.g. by dramatically reducing STM overhead thanks to specialized conflict detection.

Omer Tripp is a final-year PhD student of Prof. Mooly Sagiv at Tel-Aviv University. In his PhD research, Omer applied techniques from the area of program analysis to optimize and specialize concurrency control. Omer is also a researcher at IBM, where he holds a nomination as Master Inventor. Omer's research at IBM focuses on program analysis for web and mobile security.

Programming FPGAs with Software via Overlay Processing Engines

Greg Steffan
Apr. 8, 2013
Field-Programmable Gate Arrays (FPGAs) are becoming increasingly common in computing products because they provide very low-risk access to VLSI technology and faster time-to-market compared to fully-fabricated chips. However, the use of FPGAs has faced two increasingly daunting barriers: (i) the relative difficulty of designing hardware, which is typically done at the Register-Transfer-Level (RTL) in hardware description languages by specialized hardware engineers; and (ii) the large computational effort needed by the CAD tools to translate and optimize the RTL design for an FPGA, which can take up to a day for a large design (ie., a very slow turn-around time for design iteration).

In this talk I will describe my group's recent research to address these problems via "overlay processing engines": pre-designed soft structures for FPGAs that are themselves software programmable. The simplest example of an overlay engine is a "soft processor"---i.e., an instruction set processor built using the programmable logic of an FPGA. More advanced engines we have developed include soft vector processors for embedded computing and multithreaded processors for network packet processing. In this talk I will summarize some of our more recent work on a super-fast multithreaded processor, a multithreaded VLIW engine, and work towards building multicores of these engines.

Greg Steffan is an Associate Professor and the Jeffrey Skoll Chair in Software Engineering, Department of Electrical and Computer Engineering at the University of Toronto. His research focus is on making multicore processors and FPGAs easier to program. He completed his doctorate at Carnegie Mellon University in 2003, after undergraduate and Masters degrees at the University of Toronto. He is a Siebel Scholar (2002), an IBM CAS Faculty Fellow and Visiting Scientist, and a senior member of both the IEEE and ACM.

Averroes: Whole-Program Analysis without the Whole Program

Karim Ali
Apr. 12, 2013
Call graph construction is costly and impractical because it analyzes the whole program. The separate compilation assumption (SCA) makes it possible to generate sound and reasonably precise call graphs without analyzing libraries. We investigate whether the SCA can be encoded in Java bytecode. We evaluate Averroes, a tool that generates a placeholder library that overapproximates the possible behaviour of an original library. The placeholder library can be constructed quickly and is typically smaller in size. Any existing whole-program framework can use the placeholder library to efficiently construct sound and precise application call graphs. Typical efficiency improvements are a factor of 4x to 12x in time, and 8x to 13x in memory. In addition, Averroes makes it easier to handle reflection soundly. The call graphs built with Averroes are as precise and sound as those built with Cgc, which implements the SCA explicitly in Doop.