Professor of Computer Science at McGill University. Leader of the Soot, Sable, and McCAT projects for compiler research. Co-lead of abc, the AspectBench Compiler project, a compiler for AspectJ. Research interests are analyzing and optimizing Java bytecode, and pointer analysis. My newest research is McLAB, a toolkit for languages and compilers for scientific computing. My research started at Cornell University, where my Ph.D. thesis focused on a new static analysis that could approximate the relationships between pointers to heap-allocated objects. The main objective was to find data structures that were tree-like, and to automatically use this information to find and exploit parallelism. This analysis was interesting because it worked even for destructive programs such as in-place list reversal, and it also handled recursion and mutual recursion, so that programs such as bitonic sort could be successfully analyzed and parallelized. The best references for this work are my thesis and a paper summarizing the work that appears in the first issue of IEEE Transactions on Parallel and Distributed Systems (Volume 1, Number 1, January 1990).

Since starting at McGill in 1990, I have broadened my research to include a variety of compiler analyses, but still with an emphasis on pointer analysis. One of my main goals is to demonstrate that our compilation approach works for: (1) real programming languages such as C, Java and AspectJ; (2) diverse applications including computational fluid dynamics, symbolic computation and graphics applications; and (3) commonly-used architectures.

In order to provide experimental frameworks that meet these requirements, I have led the development of two compiler frameworks: the McCAT optimizing/parallelizing compiler, and Soot, a framework for analysis and optimization of Java bytecode. Our active work with McCAT is finished, and currently our main focus is the Soot framework, which is one of the projects of the Sable Research Group.

The Soot Project - A framework for analyzing and optimizing Java bytecode

Since 1998 my research group has focused on analyzing and optimizing Java bytecode. I have a very active group of students who share the goal of developing a software system that is freely available for others to use and build upon.

This work is being done by the Sable Research Group at McGill. The current status of our work, group members, and available software are all available via our group's webpage, www.sable.mcgill.ca. A brief summary of the work to date is:

We hope that other groups will use our software, and will contribute new pieces back to us so that all compiler groups can benefit from the work.

McCAT - An optimizing/parallelizing C compiler

Our first big project, which spanned 1992-1999, was the McCAT optimizing/parallelizing C compiler. This project provided my research group with a concrete framework with which we developed and demonstrated a wide variety of new compilation strategies on significant benchmark programs. Our primary focus was on pointer analyses including our points-to analysis and heap analyses, although we found many other interesting problems to solve along the way.

The McCAT Framework

The McCAT compiler was very loosely built on top of gcc. However, all that remains of gcc is part of the front-end; most of the intermediate representations and the back-end are specific to the McCAT compiler. From the beginning we decided that we should support structured AST-like representations, including SIMPLE, our medium-level representation on which we did most of our analyses, and LAST, our low-level representation used for register allocation, instruction scheduling, and code generation. Bhama Sridharan was the main force behind SIMPLE, and Chris Donawa was the main developer of LAST. We first introduced this framework at LCPC 1992, and also have a longer version of this paper as ACAPS Technical Memo 46.

In order to represent C programs as a compositional AST we found that we had to restructure some programs, including eliminating gotos. This was the topic of Ana Erosa's M.Sc. thesis, and a paper describing our goto-elimination transformations appeared in ICCL 1994.

Points-to Analysis

Our points-to analysis was designed to be both context- and flow-sensitive so that it could achieve a very accurate approximation for stack-based pointers. The best references for the points-to analysis are Maryam Emami's M.Sc. thesis and our 1994 PLDI paper.

Connection and Shape Analysis

The heap analysis work presented two practical approaches to estimating heap-allocated data structures: connection analysis and shape analysis. Connection analysis is a very coarse-grain analysis that can be used to determine if two heap-directed pointers point to the same data structure. Shape analysis is a more ambitious analysis that approximates whether a data structure is tree-like, DAG-like, or a general graph with cycles. The best reference for connection analysis is our paper in IJPP 1996, and for shape analysis the best reference is our POPL 1996 paper. Both of these analyses are given in detail in Rakesh Ghiya's M.Sc. thesis.

Putting Pointer Analysis to Work

After designing both the stack and heap analyses, we were ready for our ultimate goal, which was to examine what we could do with the results of these analyses. This was the main thrust of Rakesh Ghiya's Ph.D. thesis, and the best references to this work are in CC 1998, POPL 1998 and Rakesh's thesis itself.

Generalized Constant Propagation

Since our McCAT compiler could deal with both pointers and interprocedural analyses, we decided to try another analysis that could build on top of this. Generalized constant propagation is a technique for estimating the ranges of variables, and we presented this work in CC 1996.

Array Dependence Testing and SSA-Numbering

Every parallelizing compiler needs some sort of array dependence tester. The first version of our array dependence tester was designed and implemented by Justiani (CC 1994), and then improvements to handle symbolic tests were added by Christopher Lapkowski, as reported in his M.Sc. thesis. However, Christopher also found out that all students who work with me run the risk of working with pointers, and his symbolic analysis led to the need for SSA numbering, a technique for getting many of the nice properties of SSA form while also handling pointers nicely (CC 1998).

Compiling for the EARTH Multi-threaded Architecture

As well as using our compiler technology for sequential languages/architectures, we also applied it to the problem of compiling for a multi-threaded architecture. For these purposes we defined a new extension of C called EARTH-C, and we modified the McCAT compiler to accept and analyse this new language. The EARTH language and compiler techniques are described in PACT 1996 and an extended version of this paper is given in IJPP 1997.

Compiling for a distributed memory architecture provides new challenges for the compiler writer, and many of those challenges are made easier if you have good pointer analysis. We pursued three main areas: (1) automatic parallelization of C programs with pointer data structures (CC 1998); (2) locality analysis to expose opportunities for local memory references (PACT 1997 and IEEE TPDS 1999); and (3) communication analysis to improve/reduce remote communication (PLDI 1998 and JPDC 1999). Locality and communication optimization was the topic of Yingchun Zhu's Ph.D. thesis.

Other past research

I have also contributed to a variety of other research areas that are not directly related to the McCAT work. First, I worked with Joe Hummel on the development of techniques for describing and checking properties of dynamic data structures. The most relevant papers are PLDI 1992 and PLDI 1994. Second, I worked with Anne Rogers on her project for Dynamic SPMD and the Olden framework (TOPLAS 1995). Finally, I worked on some projects at McGill that were unrelated to McCAT, including a very fun project on register allocation using cyclic interval graphs (CC 1992 and Journal of Programming Languages 1993).