Professor of Computer Science at McGill University.
Leader of the Soot, Sable, and McCAT projects for compiler research.
Co-leader of the abc (AspectBench Compiler) project, a compiler for AspectJ.
Research interests include analyzing and optimizing Java bytecode, and
pointer analysis. The newest project is McLAB, a toolkit for languages and
compilers for scientific computing.
My research started out at Cornell University where my Ph.D. thesis
focused on a new static analysis that could approximate the relationships
between pointers to heap-allocated objects. The main objective
was to find data structures that were tree-like, and to
automatically use this information to find and exploit parallelism.
This analysis was interesting because it worked even for programs with
destructive updates, such as in-place list reversal, and it also handled
recursion and mutual recursion, so that programs such as bitonic sort could be
successfully analyzed and parallelized.
The best references for this work are my
thesis, and a
paper summarizing
the work that appears in the first issue of IEEE Transactions on Parallel
and Distributed Systems (Volume 1, Number 1, January 1990).
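To illustrate what a destructive update means here, the following toy Python sketch (invented for illustration; the thesis work was a purely static analysis, not runtime code) reverses a singly linked list in place: every next pointer is overwritten, yet at each step every node still has at most one incoming pointer, so the heap stays tree-like throughout, which is exactly the property the analysis tracked.

```python
class Node:
    """A heap-allocated cell with one 'next' pointer."""
    def __init__(self, value, nxt=None):
        self.value = value
        self.next = nxt

def reverse(head):
    """Destructively reverse a singly linked list.

    Every 'next' field is overwritten, yet the heap remains
    tree-shaped at every step (each node has at most one incoming
    pointer), so a shape analysis can keep the 'tree-like'
    classification despite the destructive updates.
    """
    prev = None
    while head is not None:
        head.next, prev, head = prev, head, head.next
    return prev

def to_list(head):
    """Read a linked list out into a Python list (for inspection)."""
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out
```

For example, reversing the list 1 -> 2 -> 3 yields 3 -> 2 -> 1 without allocating any new nodes.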
Since starting at McGill in 1990, I have broadened my research to include
a variety of compiler analyses, but still with an emphasis on pointer
analysis. One of my main goals is to demonstrate that
our compilation approach works for: (1) real programming languages such as C, Java and AspectJ;
(2) diverse applications including computational fluid dynamics,
symbolic computation and graphics applications;
and (3) commonly-used architectures.
In order to provide
experimental frameworks that meet these requirements, I
have led the development of two compiler frameworks, the
McCAT optimizing/parallelizing compiler and Soot, a
framework for analysis and optimization of Java bytecode.
Our active work on McCAT is finished; our main focus now is the Soot
framework, one of the projects of the Sable Research Group.
The Soot Project - A framework for analyzing and optimizing Java bytecode
Since 1998 my research group has focused on analyzing and optimizing
Java bytecode. I have a very active group of students who share the goal
of developing a software system that can be freely available for others
to use and build upon.
This work is being done by the Sable Research Group at McGill.
The current status of our work, group members and
available software are all available via our group's webpage,
www.sable.mcgill.ca.
A brief summary of the work to date is:
- The development of the
Soot framework, which provides a Java API for analyzing and
transforming Java bytecode. Soot supports three intermediate representations:
Baf, a lean internal representation of bytecode;
Jimple, a three-address, stack-less representation of bytecode that has
been designed to facilitate analysis and optimization; and Grimp, which
is like Jimple but has aggregated expressions and is useful for
decompilation.
The best overall reference for the framework is the
CC 2000 paper
and
Raja Vallee-Rai's M.Sc. thesis. If you use our framework,
please cite these two sources.
More recently we have added the ability to use classfile attributes to
convey the results of compiler analyses to virtual machines, and we have
shown how to use this framework to convey attributes for eliminating
array bounds checks to the Kaffe VM and the IBM High Performance Compiler
for Java (HPCJ). This is available as a
technical report,
and will also appear in CC 2001.
We have also used our framework to develop a new practical virtual method
resolution technique which was presented at
OOPSLA 00.
- The organization of Ashes,
a set of interesting Java benchmarks.
- The development of SableCC
, an object-oriented compiler compiler, written in Java and producing
Java code.
- The development of the SableVM
Java Virtual Machine, which is based on a portable threaded interpreter.
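To give a flavour of Jimple's "three-address, stack-less" design described above, here is a toy Python sketch (not the Soot API; the mini instruction set and naming are invented for illustration) that removes the operand stack by naming each stack slot as a fresh local variable:

```python
def to_three_address(code):
    """Convert toy stack instructions into three-address statements.

    Each operand-stack slot becomes a named local ($s0, $s1, ...),
    so the output has no operand stack at all -- the core idea
    behind a Jimple-style IR (the instruction set here is invented
    for illustration, not real Java bytecode).
    """
    stack, out, fresh = [], [], 0
    for op, *args in code:
        if op == "push":              # push a constant
            name = f"$s{fresh}"; fresh += 1
            out.append(f"{name} = {args[0]}")
            stack.append(name)
        elif op == "load":            # load a local variable
            name = f"$s{fresh}"; fresh += 1
            out.append(f"{name} = {args[0]}")
            stack.append(name)
        elif op == "add":             # pop two operands, push their sum
            b, a = stack.pop(), stack.pop()
            name = f"$s{fresh}"; fresh += 1
            out.append(f"{name} = {a} + {b}")
            stack.append(name)
        elif op == "store":           # pop into a local variable
            out.append(f"{args[0]} = {stack.pop()}")
    return out
```

For instance, the stack code for `x = a + 1` (load a; push 1; add; store x) becomes the stack-free statements `$s0 = a`, `$s1 = 1`, `$s2 = $s0 + $s1`, `x = $s2`.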
We hope that other groups will use our software, and will contribute new
pieces back to us so that all compiler groups can benefit from the work.
McCAT - An optimizing/parallelizing C compiler
Our first big project, which spanned 1992-1999,
was the McCAT
optimizing/parallelizing C compiler.
This project provided my research group with a concrete framework with
which we developed and demonstrated a wide variety of new compilation
strategies on significant benchmark programs. Our primary focus was
on pointer analyses including our points-to analysis and heap analyses,
although we found many other interesting problems to solve along the way.
The McCAT Framework
The McCAT compiler was very loosely built on top of gcc.
However, all that remains of gcc is part of the front-end;
most of the intermediate representations and the back-end are
specific to the McCAT compiler.
From the
beginning we decided that we should support structured AST-like
representations, including SIMPLE, our medium-level representation on which
we did most of our analyses, and LAST, our low-level representation
used for register allocation, instruction scheduling, and code generation.
Bhama Sridharan was the main force behind SIMPLE, and Chris Donawa was
the main developer of LAST.
We first introduced this framework at
LCPC 1992, and also have a longer
version of this paper as
ACAPS Technical Memo 46.
In order to represent C programs as a compositional AST, we found that we
had to restructure some programs, including eliminating gotos. This was
the topic of Ana Erosa's M.Sc. thesis,
and a paper describing our
goto-elimination transformations appeared in
ICCL 1994.
Points-to Analysis
Our points-to analysis was designed to be both context- and flow-sensitive,
so that it could achieve a very accurate approximation for stack-based
pointers.
The best references for the points-to analysis are Maryam Emami's
M.Sc. thesis and our
1994 PLDI paper.
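As a rough sketch of what a flow-sensitive points-to analysis computes (a toy Python model over an invented mini-language, far simpler than the McCAT analysis), each statement updates a map from each pointer to the set of stack locations it may point to, with assignments applied as strong updates:

```python
def points_to(stmts):
    """Toy flow-sensitive points-to analysis for straight-line code.

    Statements (an invented mini-language, not McCAT's SIMPLE):
      ('addr', p, x)  means  p = &x
      ('copy', p, q)  means  p = q
    Flow sensitivity shows up as strong updates: an assignment
    *replaces* p's points-to set rather than unioning into it.
    """
    pts = {}
    for op, lhs, rhs in stmts:
        if op == "addr":
            pts[lhs] = {rhs}                     # p now points only to x
        elif op == "copy":
            pts[lhs] = set(pts.get(rhs, set()))  # p inherits q's targets
    return pts
```

For the sequence `p = &x; q = p; p = &y`, a flow-sensitive analysis reports that q still points to x while p points only to y; a flow-insensitive one would merge both targets into p's set.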
Connection and Shape Analysis
The heap analysis work presented two practical approaches to estimating
heap-allocated data structures, connection analysis and shape analysis.
Connection analysis is a very coarse-grain analysis that can be used
to determine if two heap-directed pointers point to the same data structure.
Shape analysis is a more ambitious analysis that approximates if a
data structure is tree-like, DAG-like, or a general graph with cycles.
The best reference for connection analysis is our paper in
IJPP 1996,
and for shape analysis, the best reference is
POPL 1996.
Both of these analyses are given in detail in
Rakesh Ghiya's M.Sc. thesis.
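The tree/DAG/cycle distinction can be made concrete with a toy Python check on an explicit heap graph (this helper is invented for illustration; the real shape analysis computes the same classification statically, as a conservative approximation): a structure is a tree if it is acyclic and no node has two incoming pointers, a DAG if it is acyclic but shares nodes, and cyclic otherwise.

```python
def shape(edges, nodes):
    """Classify a heap graph as 'tree', 'dag', or 'cycle'.

    edges: list of (src, dst) pointer edges; nodes: all node names.
    This inspects a concrete heap; a static shape analysis must
    approximate the same answer without running the program.
    """
    succs = {n: [] for n in nodes}
    indeg = {n: 0 for n in nodes}
    for a, b in edges:
        succs[a].append(b)
        indeg[b] += 1

    # Kahn's topological sort: if some node is never processed,
    # the graph contains a cycle.
    deg = dict(indeg)
    work = [n for n in nodes if deg[n] == 0]
    seen = 0
    while work:
        n = work.pop()
        seen += 1
        for m in succs[n]:
            deg[m] -= 1
            if deg[m] == 0:
                work.append(m)
    if seen != len(nodes):
        return "cycle"
    # Acyclic: a tree iff no node has two incoming pointers (no sharing).
    return "tree" if all(d <= 1 for d in indeg.values()) else "dag"
```

For example, two children hanging off one root form a tree; two paths converging on a shared node form a DAG; a two-node loop is a cycle.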
Putting Pointer Analysis to Work
After designing both the stack and heap analyses, we were ready for our
ultimate goal, which was to examine what we could do with the results
of these analyses. This was the main thrust of
Rakesh Ghiya's Ph.D. thesis,
and the best references to this work are in
CC 1998,
POPL 1998 and
Rakesh's thesis itself.
Generalized Constant Propagation
Since our McCAT compiler could deal with both pointers and interprocedural
analyses, we decided to try another analysis that could build on top
of this. Generalized constant propagation is a technique for estimating
the ranges of variables, and we presented this work in
CC 1996.
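The core idea can be sketched with interval arithmetic (a toy Python model, invented for illustration; the actual analysis worked interprocedurally over the McCAT representations): each variable is approximated by a [lo, hi] range, ranges flow through arithmetic operations, and ranges from different control-flow paths are merged at join points.

```python
def add(a, b):
    """Add two ranges (lo, hi): endpoints add pointwise."""
    return (a[0] + b[0], a[1] + b[1])

def mul(a, b):
    """Multiply two ranges: take min/max over the corner products,
    which is needed because signs can flip the ordering."""
    products = [x * y for x in a for y in b]
    return (min(products), max(products))

def join(a, b):
    """Merge ranges arriving from two control-flow paths."""
    return (min(a[0], b[0]), max(a[1], b[1]))
```

For x in [1, 3] and y in [-2, 4], the sum is bounded by [-1, 7] and the product by [-6, 12]; such bounds are what make optimizations like bounds-check removal possible.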
Array Dependence Testing and SSA-Numbering
Every parallelizing compiler needs some sort of array dependence tester.
The first version of our array dependence tester was designed and
implemented by Justiani
(CC 1994),
and then improvements to handle
symbolic tests were added by Christopher Lapkowski, as reported in his
M.Sc. thesis. However, Christopher
also found out that all students who work with me run the risk of working
with pointers, and his symbolic analysis led to the need for SSA numbering,
a technique for getting many of the nice properties of SSA form, but
also handling pointers nicely
(CC 1998).
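A minimal example of the kind of question a dependence tester answers is the classical GCD test (a textbook test, sketched here in Python for illustration rather than taken from the McCAT implementation): the accesses A[a*i + b] and A[c*j + d] can touch the same element only if gcd(a, c) divides d - b.

```python
from math import gcd

def gcd_test(a, b, c, d):
    """Classical GCD dependence test.

    May A[a*i + b] and A[c*j + d] refer to the same element for
    some integers i, j?  The equation a*i - c*j = d - b has an
    integer solution iff gcd(a, c) divides d - b; if it does not,
    this pair of accesses can never conflict, so the loop can be
    parallelized with respect to them.
    """
    return (d - b) % gcd(a, c) == 0
```

For example, A[2*i] and A[2*j + 1] can never conflict (gcd 2 does not divide 1, i.e. evens vs. odds), while A[i] and A[j + 5] may (gcd 1 divides everything, so the test cannot rule out a dependence).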
Compiling for the EARTH Multi-threaded Architecture
As well as using our compiler technology for sequential
languages/architectures, we also applied it to the problem of
compiling for a multi-threaded architecture. For these purposes we
defined a new extension of C called EARTH-C, and we modified the McCAT
compiler to accept and analyze this new language. The EARTH language
and compiler techniques are described in
PACT 1996 and an extended version
of this paper is given in
IJPP 1997.
Compiling for a distributed memory architecture provides new challenges
for the compiler writer, and many of those challenges are made easier
if you have good pointer analysis. We pursued three main areas:
(1) automatic parallelization of C programs with pointer data structures
(CC 1998),
(2) locality analysis to expose opportunities for local memory references
(PACT 1997 and
IEEE TPDS 1999),
and (3) communication analysis to improve/reduce remote communication
(PLDI 1998 and
JPDC 1999).
Locality and communication optimization was
the topic of Yingchun Zhu's Ph.D. thesis.
Other past research
I have also contributed to a variety of other research areas that are
not directly related to the McCAT work. First, I worked with Joe Hummel
on the development of techniques for describing and checking properties
of dynamic data structures. The most relevant papers are
PLDI 1992 and
PLDI 1994.
Second, I worked with Anne Rogers on her project for
Dynamic SPMD and the Olden framework
(TOPLAS 1995). Finally, I worked
on some projects at McGill that were unrelated to McCAT, including a
very fun project on register allocation using cyclic interval graphs
(CC 1992, and
Journal of Programming Languages 1993).