 | Sable Publications (Papers) 
 
 
 
 
  
| Improving Database Query Performance with Automatic Fusion | back |  
Best paper finalist.
Authors: Hanfeng Chen and Alexander Krolik and Bettina Kemme and Clark Verbrugge and Laurie Hendren
 Date: 22-26 February 2020
 CC '20, San Diego, CA, USA
 
 View the paper (.pdf)
BibTeX entry
 Abstract Array-based programming languages have shown significant promise for improving
performance of column-based in-memory database systems, allowing elegant
representation of query execution plans that are also amenable to standard
compiler optimization techniques. Use of loop fusion, however, is not
straightforward, due to the complexity of built-in functions for implementing
complex database operators. In this work, we apply a compiler approach to
optimize SQL query execution plans that are expressed in an array-based
intermediate representation. We analyze this code to determine shape properties
of the data being processed, and use a subsequent optimization phase to fuse
multiple database operators into single, compound operations, reducing the need
for separate computation and storage of intermediate values. Experimental
results on a range of TPC-H queries show that our fusion technique is effective
in generating efficient code, improving query time over a baseline system.
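
As a rough illustration of the fusion idea in the abstract above, the following Python sketch evaluates a filter, a multiply, and a sum first as three separate operators that each store an intermediate list, and then as one fused loop. It is not HorseIR code and not the paper's algorithm; the column names, the data, and the query are assumptions made up for the example.

```python
# Hypothetical columns for a query such as:
#   SELECT SUM(price * discount) FROM items WHERE quantity < 24
price = [10.0, 20.0, 30.0, 40.0]
discount = [0.5, 0.1, 0.2, 0.3]
quantity = [5, 30, 10, 50]


def unfused(price, discount, quantity):
    """One pass per operator; each step stores an intermediate list."""
    mask = [q < 24 for q in quantity]                    # selection
    revenue = [p * d for p, d in zip(price, discount)]   # element-wise multiply
    kept = [r for r, m in zip(revenue, mask) if m]       # compress by mask
    return sum(kept)                                     # reduction


def fused(price, discount, quantity):
    """Single pass over the columns; no intermediate lists are built."""
    total = 0.0
    for p, d, q in zip(price, discount, quantity):
        if q < 24:
            total += p * d
    return total
```

Both functions compute the same value; the fused version simply avoids allocating `mask`, `revenue`, and `kept`.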
 
  
| Numerical Computing on the Web: Benchmarking for the Future | back |  
Authors: David Herrera and Hanfeng Chen and Erick Lavoie and Laurie Hendren Date:  4-9 November 2018
 DLS '18, Boston, MA, USA
 
 View the paper (.pdf)
BibTeX entry
 Abstract Recent advances in execution environments for JavaScript and WebAssembly that
run on a broad range of devices, from workstations and mobile phones to IoT
devices, provide new opportunities for portable and web-based numerical
computing. Indeed, numerous numerical libraries and applications are emerging
on the web, including TensorFlow.js, JSMapReduce, and the NGL Protein Viewer.
This paper evaluates the current performance of numerical computing on the web,
including both JavaScript and WebAssembly, over a wide range of devices from
workstations to IoT devices. We developed a new benchmarking approach, which
allowed us to perform centralized benchmarking, including benchmarking on
mobile and IoT devices. Using this approach we performed four performance
studies using the Ostrich benchmark suite, a collection of numerical programs
representing the numerical dwarf categories identified by Colella. We studied
the performance evolution of JavaScript, the relative performance of
WebAssembly, the performance of server-side Node.js, and a comprehensive
performance showdown for a wide range of devices.
 
  
| HorseIR: Bringing Array Programming Languages Together with Database Query Processing | back |  
Authors: Hanfeng Chen and Joseph Vinish D'silva and Hongji Chen and Bettina Kemme and Laurie Hendren Date:  4-9 November 2018
 DLS '18, Boston, MA, USA
 
 View the paper (.pdf)
BibTeX entry
 Abstract Relational database management systems (RDBMS) are operationally similar to a
dynamic language processor. They take SQL queries as input, dynamically
generate an optimized execution plan, and then execute it. In recent decades,
the emergence of in-memory databases with columnar storage, which use
array-like storage structures, has shifted the focus on optimizations from the
traditional I/O bottleneck to CPU and memory. However, database research so far
has primarily focused on CPU cache optimizations. The similarity in the
computational characteristics of such database workloads and array programming
language optimizations is largely unexplored. We believe that these database
implementations can benefit from merging database optimizations with dynamic
array-based programming language approaches. Therefore, in this paper, we
propose a novel approach to optimize database query execution using a new
array-based intermediate representation, HorseIR, that resides between database
queries and compiled code. Furthermore, we provide a translator to generate
HorseIR from database execution plans and a compiler that optimizes HorseIR and
generates efficient code. We compare HorseIR with the MonetDB RDBMS, by testing
standard SQL queries, and show how our approach and compiler optimizations
improve the runtime of complex queries.
 
  
| Efficiently implementing the copy semantics of MATLAB's arrays in JavaScript | back |  
Authors: Vincent Foley-Bourgon and Laurie J. Hendren Date:  1 November 2016
 DLS '16, Amsterdam, Netherlands
 
 Abstract Compiling MATLAB --- a dynamic, array-based language --- to JavaScript is an
attractive proposal: the output code can be deployed on a platform used by
billions and can leverage the countless hours that have gone into making
JavaScript JIT engines fast. But before that can happen, the original MATLAB
code must be properly translated, making sure to bridge the semantic gaps of
the two languages.
 
An important area where MATLAB and JavaScript differ is in their handling of
arrays: for example, in MATLAB, arrays are one-indexed and writing at an index
beyond the end of an array extends it; in JavaScript, typed arrays are
zero-indexed and writing out of bounds is a no-op. A MATLAB-to-JavaScript
compiler must address these mismatches. Another salient and pervasive
difference between the two languages is the assignment of arrays to variables:
in MATLAB, this operation has value semantics, while in JavaScript it has
reference semantics.
 
In this paper, we present MatJuice --- a source-to-source, ahead-of-time
compiler back-end for MATLAB --- and how it deals efficiently with this last
issue. We present an intra-procedural data-flow analysis to track where each
array variable may point to and which variables are possibly aliased. We also
present the associated copy insertion transformation that uses the points-to
information to insert explicit copies when necessary. The resulting JavaScript
program respects the MATLAB value semantics and we show that it performs fewer
run-time copies than some alternative approaches.
View the paper (.pdf)
BibTeX entry 
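
To make the value-versus-reference distinction in the abstract above concrete, here is a small sketch in Python, whose lists alias on assignment much as JavaScript arrays do. It only illustrates why a translator has to insert copies; it is not MatJuice's analysis or its generated code, and `reference_assign` and `matlab_style_assign` are hypothetical helper names.

```python
import copy


def reference_assign(a):
    """Plain assignment: both names refer to the same list (reference semantics)."""
    b = a
    b[0] = 99
    return a, b


def matlab_style_assign(a):
    """Explicit copy before the write, mimicking MATLAB's value semantics."""
    b = copy.copy(a)  # the inserted copy; a static analysis could omit it
                      # if `a` were provably never used again
    b[0] = 99
    return a, b


a1, b1 = reference_assign([1, 2, 3])
# a1 and b1 are the same object, so the write is visible through both names.

a2, b2 = matlab_style_assign([1, 2, 3])
# a2 keeps its original contents; only b2 sees the write.
```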
  
| Exhaustive Analysis of Thread-level Speculation | back |  
Authors: Clark Verbrugge and Christopher J.F. Pickett and Alexander Krolik and Allan Kielstra Date:  1 November 2016
 SEPS '16, Amsterdam, Netherlands
 
 View the paper (.pdf)
BibTeX entry
 Abstract Thread-level Speculation (TLS) is a technique for automatic parallelization. The complexity of
even prototype implementations, however, limits the ability to explore and compare the wide 
variety of possible design choices, and also makes understanding performance characteristics 
difficult. In this work we build a general analytical model of the method-level variant of TLS 
which we can use for determining program speedup under a wide range of TLS designs. Our 
approach is exhaustive, and using either simple brute force or more efficient dynamic 
programming implementations we are able to show how performance is strongly limited by program 
structure, as well as core choices in speculation design, irrespective of and complementary to 
the impact of data-dependencies. These results provide new, high-level insight into where and 
how thread-level speculation can and should be applied in order to produce practical speedup.
 
  
| Automatic Vectorization for MATLAB | back |  
Authors: Hanfeng Chen and Alexander Krolik and Erick Lavoie and Laurie J. Hendren Date:  28-30 September 2016
 LCPC '16, Rochester, NY, USA
 
 View the paper (.pdf)
BibTeX entry
 Abstract Dynamic array-based languages such as MATLAB provide a wide range of built-in
operations which can be efficiently applied to all elements of an array.
Historically, MATLAB and Octave programmers have been advised to manually
transform loops to equivalent "vectorized" computations in order to maximize
performance. In this paper we present the techniques and tools to perform
automatic vectorization, including handling for loops with calls to
user-defined functions. We evaluate the technique on 9 benchmarks using two
interpreters and two JIT-based platforms and show that automatic vectorization
is extremely effective for the interpreters on most benchmarks, and moderately
effective on some benchmarks in the JIT context.
 
  
| Reducing Memory Buffering Overhead in Software Thread-level Speculation | back |  
Authors:  Zhen Cao and Clark Verbrugge Date: 17-18 March 2016
 CC '16, Barcelona, Spain
 
 View the paper (.pdf)
View the presentation slides (.pdf)
BibTeX entry
 Abstract Software-based, automatic parallelization through Thread-Level Speculation (TLS) has significant
practical potential, but also high overhead costs.  Traditional "lazy" buffering mechanisms enable strong
isolation of speculative threads, but imply large memory overheads, while more recent "eager" mechanisms improve
scalability, but are more sensitive to data dependencies and have higher rollback costs.  We here
describe an integrated system that incorporates the best of both designs, automatically selecting
the best buffering mechanism.  Our approach builds on well-optimized designs for both techniques,
and we describe specific optimizations that improve both lazy and eager buffer management as well.
We implement our design within MUTLS, a software-TLS system based on
the LLVM compiler framework.  Results show that we can get 75% geometric mean performance of OpenMP
versions on 9 memory intensive benchmarks. Application of these optimizations is thus a useful part of the
optimization stack needed for effective and practical software TLS.
 
  
| Velociraptor: a compiler toolkit for array-based languages targeting CPUs and GPUs | back |  
Authors: Rahul Garg and Laurie Hendren Date: June 15 - 17, 2015
 ARRAY@PLDI '15, Portland, OR, USA
 
 View the paper (.pdf)
BibTeX entry
 Abstract We present a toolkit called Velociraptor that can be used by compiler writers
to quickly build compilers and other tools for array-based languages.
Velociraptor operates on its own unique intermediate representation (IR)
designed to support a variety of array-based languages. The toolkit also
provides some novel analysis and transformations such as region detection and
specialization, as well as a dynamic backend with CPU and GPU code generation.
We discuss the components of the toolkit and also present case-studies
illustrating the use of the toolkit.
 
  
| AspectMatlab++: annotations, types, and aspects for scientists | back |  
Authors: Andrew Bodzay and Laurie Hendren Date: March 16 - 19, 2015
 MODULARITY '15, Fort Collins, CO, USA
 
 View the paper (.pdf)
BibTeX entry
 Abstract In this paper we present extensions to an aspect oriented compiler developed
for MATLAB. These extensions are intended to support important functionality
for scientists, and include pattern matching on annotations, and types of
variables, as well as new manners of exposing context. We provide use-cases of
these features in the form of several general-use aspects which focus on
solving issues that arise from use of dynamically-typed languages. We also
detail performance enhancements to the ASPECTMATLAB compiler which result in an
order-of-magnitude performance gain.
 
  
| Velociraptor: an embedded compiler toolkit for numerical programs targeting CPUs and GPUs | back |  
Authors:  Rahul Garg and Laurie Hendren Date: August 24-27, 2014
 PACT '14, Edmonton, AB, Canada
 
 Abstract Developing just-in-time (JIT) compilers that allow scientific programmers
to efficiently target both CPUs and GPUs is of increasing interest. However,
building such compilers requires considerable effort. We present a reusable and
embeddable compiler toolkit called Velociraptor that can be used to easily
build compilers for numerical programs targeting multicores and GPUs.
 
Velociraptor provides a new high-level IR called VRIR which has been
specifically designed for numeric computations, with rich support for arrays,
plus support for high-level parallel and GPU constructs. A compiler developer
uses Velociraptor by generating VRIR for key parts of an input program.
Velociraptor provides an optimizing compiler toolkit for generating CPU and GPU
code and also provides a smart runtime system to manage the GPU.
 
To demonstrate Velociraptor in action, we present two proof-of-concept case
studies: a GPU extension for a JIT implementation of the MATLAB language, and a JIT
compiler for Python targeting CPUs and GPUs.
View the paper (.pdf)
BibTeX entry 
  
| Mc2For: A Tool for Automatically Translating MATLAB to FORTRAN 95 | back |  
Authors:  Xu Li and Laurie Hendren Date: 3-6 Feb. 2014
 WCRE '14, Antwerp, Belgium
 
 Abstract MATLAB is a dynamic numerical scripting language widely used by scientists, 
engineers and students. While MATLAB's high-level syntax and dynamic types 
make it ideal for prototyping, programmers often prefer using high-performance 
static languages such as FORTRAN for their final distributable code. Rather 
than rewriting the code by hand, our solution is to provide a tool that 
automatically translates the original MATLAB program to an equivalent FORTRAN 
program. There are several important challenges for automatically translating 
MATLAB to FORTRAN, such as correctly estimating the static type characteristics 
of all the variables in a MATLAB program, mapping MATLAB built-in functions, 
and effectively mapping MATLAB constructs to equivalent FORTRAN constructs.
 
In this paper, we introduce Mc2For, a tool which automatically translates
MATLAB to FORTRAN. This tool consists of two major parts. The first part 
is an interprocedural analysis component to estimate the static type 
characteristics, such as the shape of arrays and the range of scalars, 
which are used to generate variable declarations and to remove 
unnecessary array bounds checking in the translated FORTRAN program. 
The second part is an extensible FORTRAN code generation framework 
automatically transforming MATLAB constructs to FORTRAN. This work 
has been implemented within the McLab framework, and we demonstrate the 
performance of the translated FORTRAN code on a collection of MATLAB 
benchmarks.
 
View the paper (.pdf)
BibTeX entry
 
  
| Optimizing MATLAB Feval with Dynamic Techniques | back |  
Authors:  Nurudeen Lameed and Laurie Hendren Date: October 2013
 DLS '13, Indianapolis, USA
 
 View the paper (.pdf)
BibTeX entry
 Abstract MATLAB is a popular dynamic array-based language used by engineers, scientists
and students worldwide. The built-in function feval is an important MATLAB
feature for certain classes of numerical programs and solvers which benefit
from having functions as parameters. Programmers may pass a function name or
function handle to the solver and then the solver uses feval to indirectly
call the function. In this paper, we show that there are significant
performance overheads for function calls via feval, in both MATLAB
interpreters and JITs. The paper then proposes, implements and compares two
on-the-fly mechanisms for specialization of feval calls. The first approach
uses on-stack replacement technology, as supported by McVM/McOSR. The second
approach specializes calls of functions with feval using a combination of
runtime input argument types and values. Experimental results on seven
numerical solvers show that the techniques provide good performance
improvements.
 
  
| Mixed Model Universal Software Thread-Level Speculation | back |  
Authors:  Zhen Cao and Clark Verbrugge Date: October 2013
 ICPP '13, Lyon, France
 
 Abstract Software approaches to Thread-Level Speculation (TLS) have been recently explored,
bypassing the need for specialized hardware designs.  These approaches, however, tend to
focus on source or VM-level implementations aimed at specific language and runtime
environments.  In addition, previous software approaches tend to make use of a simple
thread forking model, reducing their ability to extract substantial parallelism from
tree-form recursion programs such as depth-first search and divide-and-conquer. This
paper proposes a Mixed forking model Universal software-TLS (MUTLS) system to
overcome these limitations.  MUTLS is purely based on the LLVM intermediate
representation (IR), a language and architecture independent IR that supports more than
10 source languages and target architectures by many projects.  MUTLS maximizes parallel
coverage by applying a mixed forking model that allows all threads to speculate, forming
a tree of threads. We evaluate MUTLS using several C/C++ and Fortran benchmarks on a 64-core
machine. On 3 computation intensive applications we achieve speedups of 30 to 50 and
20 to 50 for the C and Fortran versions, respectively. We also observe speedups of 2 to 7
for memory intensive applications. Our experiments indicate
that a mixed model is preferable for parallelization of tree-form recursion applications
over the simple forking models used by previous software-TLS approaches. Our work also
demonstrates that actual speedup is achievable on existing, commodity multi-core
processors while maintaining the flexibility of a highly generic implementation context.
 
View the paper (.pdf)
BibTeX entry
 
  
| Adaptive Fork-Heuristics for Software Thread-Level Speculation | back |  
Authors:  Zhen Cao and Clark Verbrugge Date: September 2013
 PPAM '13, Warsaw, Poland
 
 Abstract Fork-heuristics play a key role in software Thread-Level Speculation (TLS). Current fork-heuristics either lack real parallel execution environment information to accurately evaluate fork points and/or focus on hardware-TLS implementation which cannot be directly applied to software TLS. This paper proposes adaptive fork-heuristics as well as a feedback-based selection technique to overcome the problems. Adaptive fork-heuristics insert and speculate on all potential fork/join points and purely rely on the runtime system to disable inappropriate ones. Feedback-based selection produces parallelized programs with ideal speedups using log files generated by adaptive heuristics. Experiments of three scientific computing benchmarks on a 64-core machine show that feedback-based selection and adaptive heuristics achieve more than 88% and 50% speedups of the manual-parallel version, respectively. For the Barnes-Hut benchmark, feedback-based selection is 49% faster than the manual-parallel version.
 
View the paper (.pdf)
BibTeX entry
 
 
Authors:  Soroush Radpour, Laurie Hendren and Max Schäfer Date: March 2013
 CC '13, Rome, Italy
 
 Abstract MATLAB is a very popular  dynamic "scripting" language for numerical
computations used by scientists, engineers and students world-wide.   MATLAB
programs are often developed incrementally using a mixture of MATLAB scripts
and functions, and frequently build upon existing code which may use outdated
features. This results in  programs that could benefit from refactoring,
especially if the code will be reused and/or distributed.   Despite the need
for refactoring, there appear to be no MATLAB refactoring tools available.
Furthermore, correct refactoring of MATLAB is quite challenging because of its
non-standard rules for binding identifiers.  Even simple refactorings are
non-trivial.
 
This paper presents the important challenges of refactoring MATLAB
along with automated techniques to handle a collection of refactorings
for MATLAB functions and scripts including: converting scripts to
functions, extracting functions, and converting dynamic function calls
to static ones.  The refactorings have been implemented using
the McLAB compiler framework, and an evaluation is given on a large
set of MATLAB benchmarks which demonstrates the effectiveness of our
approach.
 
View the paper (.pdf)
BibTeX entry
 
  
| A Modular Approach to On-Stack Replacement in LLVM | back |  
Authors:  Nurudeen Lameed and Laurie Hendren Date: March 2013
 VEE '13, Houston, Texas, USA
 
 Abstract On-stack replacement (OSR) is a technique that allows a virtual machine to 
interrupt running code during the execution of a function/method, to 
re-optimize the function on-the-fly using an optimizing JIT compiler, and 
then to resume the interrupted function at the point and state at which it 
was interrupted. OSR is particularly useful for programs with potentially 
long-running loops, as it allows dynamic optimization of those loops as 
soon as they become hot.
 
This paper presents a modular approach to implementing OSR for the LLVM 
compiler infrastructure. This is an important step forward because LLVM is 
gaining popular support, and adding the OSR capability allows compiler 
developers to develop new dynamic techniques. In particular, it will enable 
more sophisticated LLVM-based JIT compiler approaches.  Indeed, other 
compiler/VM developers can use our approach because it is a clean modular 
addition to the standard LLVM distribution. Further, our approach is defined 
completely at the LLVM-IR level and thus does not require any modifications 
to the target code generation.
 
The OSR implementation can be used by different compilers to support a 
variety of dynamic optimizations. As a demonstration of our OSR approach, 
we have used it to support dynamic inlining in McVM.  McVM is a virtual
machine for MATLAB which uses an LLVM-based JIT compiler. MATLAB is a popular
dynamic language for scientific and engineering applications that typically 
manipulate large matrices and often contain long-running loops, and is thus 
an ideal target for dynamic JIT compilation and OSRs. Using our McVM example, 
we demonstrate reasonable overheads for our benchmark set, and performance 
improvements when using it to perform dynamic inlining. 
 
View the paper (.pdf)
BibTeX entry
 
 
Authors:  Anton Dubrau and Laurie Hendren Date: October 2012
 OOPSLA '12, Tucson, Arizona, USA
 
 Abstract MATLAB is a dynamic scientific language used by scientists, engineers 
and students worldwide.  Although MATLAB is very suitable for rapid 
prototyping and development,  MATLAB users often want to convert their 
final MATLAB programs to a static language such as FORTRAN.  This paper 
presents an extensible object-oriented toolkit for supporting the generation 
of static programs from dynamic MATLAB programs.  Our open source toolkit, 
called the MATLAB Tamer, identifies a large tame subset of MATLAB, supports 
the generation of a specialized Tame IR for that subset, provides a principled 
approach to handling the large number of builtin MATLAB functions, and supports 
an extensible interprocedural value analysis for estimating MATLAB types and call graphs.
 
View the paper (.pdf)
BibTeX entry
 
  
| Kind Analysis for MATLAB | back |  
Authors:  Jesse Doherty, Laurie Hendren and Soroush Radpour Date: October 2011
 OOPSLA '11, Portland, Oregon, USA
 
 Abstract 
 
MATLAB is a popular dynamic programming language used for scientific and
numerical programming. As a language, it has evolved from a small scripting
language intended as an interactive interface to numerical libraries, to a very
popular language supporting many language features and libraries. The 
overloaded syntax and dynamic nature of the language, plus the somewhat organic
addition of language features over the years, makes static analysis of modern
MATLAB quite challenging.
 
A fundamental problem in MATLAB is determining the kind of an 
identifier. Does an identifier refer to a variable, a named function or a 
prefix?  Although this is a trivial problem for most programming languages, 
it was not clear how to do this properly in MATLAB. Furthermore, there was 
no simple explanation of kind analysis suitable for MATLAB programmers,  
nor a publicly-available implementation suitable for compiler researchers.
 
This paper explains the required background of MATLAB, clarifies the kind
assignment problem, and proposes some general guidelines for developing
good kind analyses.  Based on these foundations we present our design
and implementation of a variety of kind analyses, including an
approach that matches the intended behaviour of modern MATLAB 7 and two
potentially better alternatives.
 
We have implemented all the variations of the kind analysis in McLAB, our
extensible compiler framework, and we present an empirical evaluation of the
various analyses on a large set of benchmark programs.
 
View the paper (.pdf)
View the slides (.pptx)
BibTeX entry
 
  
| The Soot framework for Java program analysis: a retrospective | back |  
Authors: Patrick Lam, Eric Bodden, Ondrej Lhotak and Laurie Hendren Date: October 2011
 CETUS '11, Galveston, Texas, USA
 
 Abstract 
 
Soot is a successful framework for experimenting with compiler and
software engineering techniques for Java programs.  Researchers from
around the world have implemented a wide range of research tools which
build on Soot, and Soot has been widely used by students for both
courses and thesis research.  In this paper, we describe relevant
features of Soot, summarize its development process, and discuss
useful features for future program analysis frameworks.
 
View the paper (.pdf)
View the slides (.pdf)
BibTeX entry
 
  
| There is Nothing Wrong with Out-of-Thin-Air: Compiler Optimization and Memory Models | back |  
Authors:  Clark Verbrugge, Allan Kielstra, and Yi Zhang Date: June 2011
 MSPC 2011, San Jose, California, USA
 
 Abstract 
 
Memory models are used in concurrent systems to specify visibility
properties of shared data. A practical memory model, however, must
permit code optimization as well as provide a useful semantics for
programmers.  Here we extend recent observations that the current Java
memory model imposes significant restrictions on the ability to
optimize code.  Beyond the known and potentially correctable proof
concerns illustrated by others we show that major constraints on code
generation and optimization can in fact be derived from fundamental
properties and guarantees provided by the memory model.  To address
this and accommodate a better balance between programmability and
optimization we present ideas for a simple concurrency
semantics for Java that avoids basic problems at a cost of backward
compatibility.
 
View the paper (.pdf)
View the slides (.pdf)
BibTeX entry
 
  
| MetaLexer: A Modular Lexical Specification Language | back |  
Authors:  Andrew Casey and Laurie Hendren Date: March 2011
 AOSD '11, Pernambuco, Brazil
 
 Abstract 
 
Compiler toolkits make it possible to rapidly develop compilers and 
translators for new programming languages.  Although there exist elegant 
toolkits for modular and extensible parsers, compiler developers must 
often resort to ad-hoc solutions when extending or composing lexers. 
This paper presents MetaLexer, a new modular lexical specification 
language and associated tool.
 
MetaLexer allows programmers to define lexers in a modular fashion.  
MetaLexer modules can be used to break the lexical specification of 
a language into a collection of smaller modular lexical specifications.
Control is passed between the modules using the concept of meta-tokens 
and meta-lexing. MetaLexer modules are also extensible.
 
MetaLexer has three key features: it abstracts lexical state transitions 
out of semantic actions, it makes modules extensible by introducing multiple 
inheritance, and it provides platform agnostic support for a variety of 
programming languages and compiler front-end toolchains.
 
We have constructed a  MetaLexer tool which converts MetaLexer specifications
to the popular JFlex lexical specification language and we have used our tool
to create lexers for three real programming languages and their extensions:
AspectJ (and two AspectJ extensions), MATLAB, and MetaLexer itself. The new 
specifications are easier to read, are extensible, and require much less 
action code than the originals.
 
View the paper (.pdf)
View the slides (.pptx)
BibTeX entry
 
  
| Typing Aspects for MATLAB | back |  
Authors: Laurie Hendren Date: March 2011
 DSAL '11, Pernambuco, Brazil
 
 Abstract 
 
The MATLAB programming language is heavily used in many scientific 
and engineering domains. Part of the appeal of the language is that 
one can quickly prototype numerical algorithms without requiring 
any static type declarations. However, this lack of type information 
is detrimental to both the programmer in terms of software reliability 
and understanding,  and to the compiler in terms of generating efficient code.
 
This paper introduces the idea of adding typing aspects to MATLAB programs.  
A typing aspect can be used to: (1) capture the run-time types of variables, 
and (2) to check run-time types against either a declared type or against 
a previously captured run-time type. Typing aspects can be deployed at
three different levels. They can be used: (1) solely as documentation,
(2) to log type errors or (3) to catch type errors at run-time.
 
View the paper (.pdf)
View the slides (.pptx)
BibTeX entry
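
The three deployment levels described in the abstract above can be pictured with a small Python decorator. This is only a sketch of the general idea, not AspectMatlab syntax or the paper's implementation: the names `typed`, `DOCUMENT`, `LOG`, and `ENFORCE` are hypothetical, and the sketch covers only checking against a declared type, not capturing run-time types.

```python
import logging

# Hypothetical deployment levels mirroring the three uses listed in the abstract.
DOCUMENT, LOG, ENFORCE = "document", "log", "enforce"


def typed(expected, level=ENFORCE):
    """Check the run-time type of a function's first argument against `expected`."""
    def wrap(fn):
        def checked(x, *args, **kwargs):
            if level != DOCUMENT and not isinstance(x, expected):
                message = (
                    f"{fn.__name__}: expected {expected.__name__}, "
                    f"got {type(x).__name__}"
                )
                if level == LOG:
                    logging.warning(message)   # record the mismatch and continue
                else:
                    raise TypeError(message)   # stop at the mismatch
            return fn(x, *args, **kwargs)
        return checked
    return wrap


@typed(float, level=ENFORCE)
def half(x):
    return x / 2


@typed(float, level=LOG)
def half_logged(x):
    return x / 2
```

With `ENFORCE`, calling `half(3)` raises a `TypeError` because `3` is an `int`; with `LOG`, `half_logged(3)` logs a warning and still returns a result.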
 
  
| Staged Static Techniques to Efficiently Implement Array Copy Semantics in a MATLAB JIT Compiler | back |  
Authors:  Nurudeen Lameed and Laurie Hendren Date: March 2011
 CC 2011, Saarbrücken, Germany
 
 Abstract 
 
MATLAB has gained widespread acceptance among scientists. Several dynamic
aspects of the language contribute to its appeal, but also provide many 
challenges. One such problem is caused by the copy semantics of MATLAB. 
Existing MATLAB systems rely on reference-counting schemes to create 
copies only when a shared array representation is updated. This reduces 
array copies, but requires runtime checks.
 
We present a staged static analysis approach to determine when copies
are not required. The first stage uses two simple, intraprocedural analyses, 
while the second stage combines a forward necessary copy analysis with a 
backward copy placement analysis. Our approach eliminates unneeded array 
copies without requiring reference counting or frequent runtime checks.
 
We have implemented our approach in the McVM JIT. Our results demonstrate 
that, for our benchmark set, there are significant overheads for both 
existing reference-counted and naive copy-insertion approaches, and that 
our staged approach is effective in avoiding unnecessary copies.
 
View the paper (.pdf)
BibTeX entry
 
  
| McFLAT: A Profile-based Framework for MATLAB Loop Analysis and Transformations | back |  
Authors:  Amina Aslam and Laurie Hendren Date: October 2010
 LCPC 2010, Houston, Texas, USA
 
 Abstract 
 
Parallelization and optimization of the MATLAB programming language
presents several challenges due to the dynamic nature of MATLAB. Since
MATLAB does not have static type declarations, neither the shape and size of
arrays, nor the loop bounds are known at compile-time. This means that many
standard array dependence tests and associated transformations cannot be applied
straight-forwardly. On the other hand, many MATLAB programs operate on arrays
using loops and thus are ideal candidates for loop transformations and possibly
loop vectorization/parallelization.
 
This paper presents a new framework, McFLAT, which uses profile-based training
runs to determine likely loop-bounds ranges for which specialized versions of the
loops may be generated. The main idea is to collect information about observed
loop bounds and hot loops using training data which is then used to heuristically
decide upon which loops and which ranges are worth specializing using a variety
of loop transformations.
 
Our McFLAT framework has been implemented as part of the McLAB extensible
compiler toolkit. Currently, McFLAT is used to automatically transform ordinary
MATLAB code into specialized MATLAB code with transformations applied
to it. This specialized code can be executed on any MATLAB system, and we report
results for four execution engines: MathWorks' proprietary MATLAB system,
the GNU Octave open-source interpreter, McLAB’s McVM interpreter and the
McVM JIT. For several benchmarks, we observed significant speedups for the
specialized versions, and noted that loop transformations had different impacts
depending on the loop range and execution engine.
 
View the paper (.pdf)
BibTeX entry
 
  
| Optimizing Matlab through Just-In-Time Specialization | back |  
Authors: Maxime Chevalier-Boisvert, Laurie Hendren, and Clark Verbrugge
 Date: March 2010
 CC 2010, Paphos, Cyprus
 
 Abstract 
 
Scientists are increasingly using dynamic programming languages
like Matlab for prototyping and implementation. Effectively
compiling Matlab raises many challenges due to the dynamic and complex
nature of Matlab types. This paper presents a new JIT-based approach
which specializes and optimizes functions on-the-fly based on the
current types of function arguments.
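
The general idea of specializing a function on the run-time types of its arguments can be sketched as a cache of versions keyed by argument types. This is only a minimal Python illustration, not McVM's implementation: the `SpecializingFunction` class and the `specialize` hook are hypothetical names, and the stand-in specializer below merely records which type signatures were requested.

```python
class SpecializingFunction:
    """Keeps one specialized version of a function per tuple of argument types."""

    def __init__(self, generic, specialize):
        self.generic = generic
        # Hypothetical hook: (generic_function, arg_types) -> callable.
        self.specialize = specialize
        self.versions = {}

    def __call__(self, *args):
        key = tuple(type(a) for a in args)
        version = self.versions.get(key)
        if version is None:
            # First call with these argument types: build and cache a version.
            version = self.specialize(self.generic, key)
            self.versions[key] = version
        return version(*args)


compiled = []  # records which type signatures triggered specialization


def recording_specializer(fn, arg_types):
    """Stand-in for a real optimizer: returns fn unchanged and logs the request."""
    compiled.append(arg_types)
    return fn


add = SpecializingFunction(lambda x, y: x + y, recording_specializer)
add(1, 2)
add(3, 4)      # reuses the (int, int) version
add(1.0, 2.0)  # triggers a second specialization
print(compiled)
```

In a real JIT the hook would run type inference seeded with the argument types and generate optimized code; here it only shows when a new version would be requested.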
 
A key component of our approach is a new type inference algorithm which
uses the run-time argument types to infer further type and shape information,
which in turn provides new optimization opportunities. These
techniques are implemented in McVM, our open implementation of a
Matlab virtual machine. As this is the first paper reporting on McVM,
a brief introduction to McVM is also given.
 
We have experimented with our implementation and compared it to several
other Matlab implementations, including the Mathworks proprietary
system, McVM without specialization, the Octave open-source interpreter
and the McFor static compiler. The results are quite encouraging
and indicate that specialization is an effective optimization—McVM
with specialization outperforms Octave by a large margin and also sometimes
outperforms the Mathworks implementation.
 
View the paper (.pdf)
Download the paper (.ps.gz)
BibTeX entry
 
  
| AspectMatlab: An Aspect-Oriented Scientific Programming Language | back |  
Authors: Toheed Aslam, Jesse Doherty, Anton Dubrau and Laurie Hendren
 Date: March 2010
 AOSD 2010, Rennes and Saint-Malo, France
 
 Abstract 
 
This paper introduces a new aspect-oriented programming
language, AspectMatlab. Matlab is a dynamic scientific
programming language that is commonly used by scientists
because of its convenient and high-level syntax for arrays,
the fact that type declarations are not required, and the
availability of a rich set of application libraries.
 
AspectMatlab introduces key aspect-oriented features in
a way that is accessible to scientists, and these
features concentrate on array accesses and
loops, the core computation elements in scientific programs.
 
Introducing aspects into a dynamic language such as Matlab
also provides some new challenges. In particular, it
is difficult to statically determine precisely where patterns
match, resulting in many dynamic checks in the woven code.
Our compiler includes flow analyses which are used to eliminate
many of those dynamic checks.
 
This paper reports on the language design of AspectMatlab,
the amc compiler implementation and related optimizations,
and also provides an overview of use cases that are
specific to scientific programming.
 
View the paper (.pdf)
Download the paper (.ps.gz)
BibTeX entry
 
  
| Dependent Advice: A General Approach to Optimizing History-based Aspects | back |  
Authors: Eric Bodden, Feng Chen and Grigore Rosu
 Date: March 2009
 AOSD 2009, Charlottesville, VA
 
 Abstract 
 
Many aspects for runtime monitoring are history-based: they contain pieces
of advice that execute conditionally, based on the observed execution history.
History-based aspects are notorious for causing high runtime overhead. Compilers
can apply powerful optimizations to history-based aspects using domain knowledge.
Unfortunately, current aspect languages like AspectJ impede optimizations, as
they provide no means to express this domain knowledge.
 
In this paper we present dependent advice, a novel AspectJ language
extension. A dependent advice contains dependency annotations that preserve
crucial domain knowledge: a dependent advice needs to execute only when its
dependencies are fulfilled. Optimizations can exploit this knowledge: we present
a whole-program analysis that removes advice-dispatch code from program locations
at which an advice's dependencies cannot be fulfilled.
 
Programmers often opt to have history-based aspects generated automatically,
from formal specifications from model-driven development or runtime monitoring.
As we show using code-generation tools for two runtime-monitoring approaches,
tracematches and JavaMOP, such tools can use knowledge contained in the
specification to automatically generate dependency annotations as well.
 
Our extensive evaluation using the DaCapo benchmark suite shows that the use of
dependent advice can significantly lower, sometimes even completely eliminate,
the runtime overhead caused by history-based aspects, independently of the
specification formalism.
 
View the paper (.pdf)
Download the paper (.ps.gz)
BibTeX entry
 
  
| Finding Programming Errors Earlier by Evaluating Runtime Monitors Ahead-of-Time | back |  
Authors: Eric Bodden, Patrick Lam and Laurie Hendren
 Date: November 2008
 FSE 2008
 
 Abstract
 
Runtime monitoring allows programmers to validate, for instance, 
the proper use of application interfaces. Given a property specification,
a runtime monitor tracks appropriate runtime events to detect violations and
possibly execute recovery code.   Although powerful, runtime monitoring
inspects only one program run at a time and so may require
many program runs to find errors.  Therefore, in this paper, we present
ahead-of-time techniques that can (1) prove the absence of property violations on all
program runs, or (2) flag locations where violations are likely to occur.
 
Our work focuses on tracematches, an expressive runtime monitoring
notation for reasoning about groups of correlated objects.
We describe a novel flow-sensitive static analysis for
analyzing monitor states. Our abstraction captures both positive
information (a set of objects could be in a particular monitor
state) and negative information (the set is known not to be in a
state).  The analysis resolves heap references by combining the
results of three points-to and alias analyses.  We also propose a
machine learning phase to filter out likely false
positives.
 
We applied a set of 13 tracematches to the DaCapo benchmark suite and
SciMark2.  Our static analysis rules out all potential points of failure
in 50% of the cases, and 75% of false positives on average. Our
machine learning algorithm correctly classifies the remaining potential
points of failure in all but three of 461 cases.  The approach revealed
defects and suspicious code in three benchmark programs.
 
View the paper (.pdf)
BibTeX entry
 
  
| Object representatives: a uniform abstraction for pointer information | back |  
Authors: Eric Bodden, Patrick Lam and Laurie Hendren
 Date: October 2008
 1st International Academic Conference of the British Computer Society (BCS)
 
 Abstract Pointer analyses enable many subsequent program analyses and
transformations by statically disambiguating references to the
heap. However, different client analyses may have different sets
of pointer analysis needs, and each must pick some pointer analysis
along the cost/precision spectrum to meet those needs.  Some analysis
clients employ combinations of pointer analyses to obtain better
precision with reduced analysis times. Our goal is to ease the task of
developing client analyses by enabling composition and
substitutability for pointer analyses.  We therefore
propose object representatives, which statically represent runtime objects.
A representative encapsulates the notion of object identity, as observed
through the representative's aliasing relations with other representatives.
Object representatives enable pointer analysis clients to disambiguate
references to the heap in a uniform yet flexible way. 
Representatives can be generated from many
combinations of pointer analyses, and pointer analyses can be freely exchanged
and combined without changing client code. 
We believe that the use of object representatives brings many software
engineering benefits to compiler implementations because, at compile time, 
object representatives are Java objects. We discuss our motivating case for 
object representatives, namely, the development of an abstract interpreter for 
tracematches, a language feature for runtime monitoring. We explain one 
particular algorithm for computing object representatives which combines 
flow-sensitive intraprocedural must-alias and must-not-alias analyses with a 
flow-insensitive, context-sensitive whole-program points-to analysis.  In our 
experience, client analysis implementations can almost directly substitute 
object representatives for runtime objects, simplifying the design and 
implementation of such analyses.
 
View the paper (.pdf)
BibTeX entry
 
  
| Racer: Effective Race Detection Using AspectJ | back |  
Winner of an "ACM SIGSOFT Distinguished Paper Award".
Authors: Eric Bodden and Klaus Havelund
 Date: July 2008
 ISSTA 08, July 2008, Seattle, WA
 
 Abstract 
 Programming errors occur frequently in large software systems, and even more so
if these systems are concurrent. In the past, researchers have developed
specialized programs to aid programmers in detecting concurrent programming errors
such as deadlocks, livelocks, starvation and data races.
In this work we propose a language extension to the aspect-oriented programming
language AspectJ, in the form of three new pointcuts, lock(),
unlock() and maybeShared(). These pointcuts allow programmers
to monitor program events where locks are granted or handed back, and where
values are accessed that may be shared amongst multiple Java threads. We decide
thread-locality using a static thread-local objects analysis developed by others.
Using the three new primitive pointcuts, researchers can directly implement efficient
monitoring algorithms to detect concurrent programming errors online.
As an example, we present a new algorithm which we call Racer, an
adaptation of the well-known Eraser algorithm to the memory model of Java.
We implemented the new pointcuts as an extension to the AspectBench Compiler,
implemented the Racer algorithm using this language extension and then applied
the algorithm to the NASA K9 Rover Executive.
Our experiments proved our implementation very effective. In the Rover
Executive, Racer finds 70 data races. Only one of these races was previously known.
We further applied the algorithm to two other multi-threaded programs written by
Computer Science researchers, in which we found races as well. 
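
For readers unfamiliar with Eraser, its core lockset refinement can be sketched in a few lines of Python. This is the basic Eraser idea only, not the Racer algorithm: it omits Eraser's per-variable state machine and everything Racer changes for the Java memory model. The `LocksetMonitor` class is a hypothetical name; its three methods loosely correspond to the events that the lock(), unlock() and maybeShared() pointcuts would expose.

```python
class LocksetMonitor:
    """Basic lockset refinement: a variable is suspicious once no single
    lock has been held on every access to it."""

    def __init__(self):
        self.held = {}        # thread id -> set of locks currently held
        self.candidates = {}  # variable -> locks held on every access so far
        self.races = []       # variables whose candidate set became empty

    def lock(self, thread, lock):
        self.held.setdefault(thread, set()).add(lock)

    def unlock(self, thread, lock):
        self.held.get(thread, set()).discard(lock)

    def access(self, thread, var):
        held = self.held.get(thread, set())
        if var not in self.candidates:
            # First access: every currently held lock is a candidate.
            self.candidates[var] = set(held)
        else:
            # Later accesses: keep only locks held this time as well.
            self.candidates[var] &= held
        if not self.candidates[var] and var not in self.races:
            self.races.append(var)


monitor = LocksetMonitor()

# Both threads protect y with lock m: no report.
for t in ("t1", "t2"):
    monitor.lock(t, "m")
    monitor.access(t, "y")
    monitor.unlock(t, "m")

# x is accessed under m by t1 but under n by t2: reported.
monitor.lock("t1", "m")
monitor.access("t1", "x")
monitor.unlock("t1", "m")
monitor.lock("t2", "n")
monitor.access("t2", "x")
monitor.unlock("t2", "n")

print(monitor.races)
```

Because this version has no initialization states, it also reports a variable on its very first unprotected access, which the full Eraser algorithm avoids.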
View the paper (.pdf)
BibTeX entry
 
  
| Relational Aspects as Tracematches | back |  
Authors: Eric Bodden, Reehan Shaikh and Laurie Hendren
 Date: March 2008
 AOSD 2008, March 2008, Brussels, Belgium
 
 Abstract 
 
The relationships between objects in an object-oriented program are an
essential property of the program's design and implementation.  Two
previous approaches to implement relationships with aspects were
association aspects, an AspectJ-based language extension, and the
relationship aspects library.  While those approaches greatly ease
software development, we believe that they are not general enough. For
instance, the library approach only works for binary relationships, while
the language extension does not allow for the association of primitive
values or values from non-weavable classes.
Hence, in this work we propose a generalized alternative implementation
via a direct reduction to tracematches, a language feature for executing
an advice after having matched a sequence of events.
This new implementation scheme yields multiple benefits.  Firstly, our
implementation is more general than existing ones, avoiding most
previous limitations. It also yields a new language construct,
relational tracematches.
We provide an efficient implementation based on the AspectBench
Compiler, along with test cases and microbenchmarks.  Our empirical
studies showed that our implementation, when compared to previous
approaches, uses a similar memory footprint with no leaking, but the
generality of our approach does lead to some runtime overhead.  We
believe that our implementation can provide a solid foundation for
future research.
 
View the paper (.pdf)
BibTeX entry
 
  
| Compiler-guaranteed Safety in Code-copying Virtual Machines | back |  
Authors: Gregory B. Prokopski and Clark Verbrugge
 Date: March 2008
 CC 2008, March 29 - April 6, 2008, Budapest, Hungary
 
 Abstract Virtual Machine authors face a difficult choice between low performance, cheap interpreters, or specialized and costly compilers. A method able to bridge this wide gap is the existing code-copying technique that reuses chunks of the VM's binary code to create a simple JIT. This technique is not reliable without a compiler guaranteeing that copied chunks are still functionally equivalent despite aggressive optimizations. We present a proof-of-concept, minimal-impact modification of a highly optimizing compiler, GCC. A VM programmer marks chunks of VM source code as copyable.  The chunks of native code resulting from compilation of the marked source become addressable and self-contained. Chunks can be safely copied at VM runtime, concatenated and executed together. This allows code-copying VMs to safely achieve speedup up to 3 times, 1.67 on average, over the direct interpretation. This maintainable enhancement makes the code-copying technique reliable and thus practically usable.
 
View the paper (.pdf)
BibTeX entry
View the slides (.pdf)
Springer version
 
  
| Phase-Based Adaptive Recompilation in a JVM | back |  
Authors: Dayong Gu and Clark Verbrugge
 Date: April 2008
 CGO 2008, April 6 - 9, 2008, Boston, Massachusetts
 
 Abstract Modern JIT compilers often employ multi-level recompilation strategies as a means of ensuring the most used code is also the most highly optimized, balancing optimization costs and expected future performance.  Accurate selection of code to compile and level of optimization to apply is thus important to performance.  In this paper we investigate the effect of an improved recompilation strategy for a Java virtual machine.  Our design makes use of a lightweight, low-level profiling mechanism to detect high-level, variable length phases in program execution.  Phases are then used to guide adaptive recompilation choices, improving performance.  We develop both an offline implementation based on trace data and a self-contained online version. Our offline study shows an average speedup of 8.7% and up to 21%, and our online system achieves an average speedup of 4.4%, up to 18%. We subject our results to extensive analysis and show that our design achieves good overall performance with high consistency despite the existence of many complex and interacting factors in such an environment.
 
View the paper (.pdf)
BibTeX entry
View the slides (.pdf)
ACM version
 
  
| A staged static program analysis to improve the performance of runtime monitoring | back |  
Authors: Eric Bodden, Laurie Hendren and Ondřej Lhoták
 Date: July 2007
 21st European Conference on Object-Oriented Programming, July 30th - August 3rd 2007, Berlin, Germany
 
 
There exists an extended Technical Report version of this paper: abc-2007-2.
 Abstract  
In runtime monitoring, a programmer specifies a piece of code to execute when
a trace of events occurs during program execution.
Our work is based on tracematches, an extension to AspectJ,
which allows programmers to specify
traces via regular expressions with free variables.
In this paper we present a
staged static analysis which speeds up trace matching by
reducing the required runtime instrumentation.
The first stage is a simple analysis that
rules out entire tracematches, just based on
the names of symbols.  In the second stage,
a points-to analysis is used, along with a flow-insensitive
analysis that eliminates instrumentation points with
inconsistent variable bindings.   In the third stage the
points-to analysis is combined with a flow-sensitive
analysis that also takes into consideration the order in
which the symbols may execute.
To examine the effectiveness of each stage, we experimented
with a set of nine tracematches applied to the DaCapo benchmark suite.
We found that about 25% of the tracematch/benchmark combinations
had instrumentation overheads greater than 10%.
In these cases the first two stages work well for certain
classes of tracematches, often leading to significant performance
improvements.   Somewhat surprisingly, we found the
third, flow-sensitive, stage did not add any improvements.
 
View the paper (.pdf)
BibTeX entry
 
  
| Component-Based Lock Allocation | back |  
Authors: Richard L. Halpert and Christopher J. F. Pickett and Clark Verbrugge
 Date: July 2007
 PACT 2007, September 2007, Brasov, Romania
 
 Abstract The allocation of lock objects to critical sections in concurrent
programs affects both performance and correctness.  Recent work
explores automatic lock allocation, aiming primarily to minimize
conflicts and maximize parallelism by allocating locks to individual
critical section interferences.  We investigate component-based lock
allocation, which allocates locks to entire groups of interfering
critical sections.  Our allocator depends on a thread-based side-effect
analysis, and benefits from precise points-to and may-happen-in-parallel
information.  Thread-local object information has a small
impact, and dynamic locks do not improve significantly on static
locks.  We experiment with a range of small and large Java benchmarks
on 2-way, 4-way, and 8-way machines, and find that a single static
lock is sufficient for mtrt, that performance degrades by 10% for
hsqldb, that jbb2000 becomes mostly serialized, and that for lusearch,
xalan, and jbb2005, component-based lock allocation recovers the
performance of the original program.
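
One plausible reading of "allocating locks to entire groups of interfering critical sections" is to give one lock to each connected component of an interference graph. The Python sketch below illustrates that reading with a small union-find; the function name, the input format, and the assumption that interference pairs are already known are all hypothetical, and the paper's actual allocator (which relies on side-effect, points-to and may-happen-in-parallel analyses) is not reproduced here.

```python
def allocate_locks(sections, interferences):
    """Assign one lock id per connected component of the interference graph.

    sections: list of critical-section names.
    interferences: pairs (a, b) of sections assumed to interfere.
    Returns {section: lock_id}.
    """
    parent = {s: s for s in sections}

    def find(s):
        # Find the component representative, compressing the path as we go.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in interferences:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    lock_ids = {}
    allocation = {}
    for s in sections:
        root = find(s)
        lock_ids.setdefault(root, len(lock_ids))
        allocation[s] = lock_ids[root]
    return allocation


# A, B and C interfere transitively and share a lock; D gets its own.
print(allocate_locks(["A", "B", "C", "D"], [("A", "B"), ("B", "C")]))
```

Sections in different components never share a lock, so they can still run in parallel, while each component needs only a single lock acquisition per critical section.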
 
View the paper (.pdf)
BibTeX entry
 
  
| Dynamic Purity Analysis for Java Programs | back |  
Authors: Haiying Xu and Christopher J. F. Pickett and Clark Verbrugge
 Date: April 2007
 PASTE 2007, June 2007, San Diego, California, USA
 
 Abstract The pure methods in a program are those that exhibit functional
or side-effect-free behaviour, a useful property in many contexts.
However, existing purity investigations present primarily static
results.   We perform a detailed examination of dynamic method purity
in Java programs using a JVM-based analysis.   We evaluate multiple
purity definitions that range from strong to weak, consider purity
forms specific to dynamic execution, and accommodate constraints
imposed by an example consumer application, memoization.   We show that
while dynamic method purity is actually fairly consistent between
programs, examining pure invocation counts and the percentage of the
bytecode instruction stream contained within some pure method reveals
great variation.   We also show that while weakening purity definitions
exposes considerable dynamic purity, consumer requirements can limit
the actual utility of this information.
 
View the paper (.pdf)
BibTeX entry
 
  
| Obfuscating Java: the most pain for the least gain | back |  
Authors: Michael Batchelder and Laurie Hendren
 Date: March 2007
 International Conference on Compiler Construction (CC 2007), Braga, Portugal.
 
 Abstract Bytecode, Java's binary form, is relatively high-level and therefore susceptible to decompilation attacks. An obfuscator transforms code such that it becomes more complex and therefore harder to reverse engineer. We develop bytecode obfuscations that are complex to reverse engineer but also do not significantly degrade performance. We present three kinds of techniques that: (1) obscure intent at the operational level; (2) complicate control flow and object-oriented design (i.e. program structure); and (3) exploit the semantic gap between what is legal in source code and what is legal in bytecode. Obfuscations are applied to a benchmark suite to examine their effect on runtime performance, control flow graph complexity, and decompilation. These results show that most of the obfuscations have only minor negative performance impacts and many increase complexity. In almost all cases, tested decompilers fail to produce legal source code or crash completely. Those obfuscations that are decompilable greatly reduce the readability of the output source code.
 
View the paper (.pdf)
Download the paper (.ps.gz)
BibTeX entry
 
  
| Avoiding Infinite Recursion with Stratified Aspects | back |  
Authors: Eric Bodden, Florian Forster and Friedrich Steimann
 Date: March 2006
 Net.ObjectDays 2006 - published in: GI-Edition Lecture Notes in Informatics 'NODe 2006 GSEM 2006'
 
 Abstract Infinite recursion is a known problem of aspect-oriented programming with AspectJ: if no special precautions are taken, aspects which advise other aspects can easily and unintentionally advise themselves. We present a compiler for an extension of the AspectJ programming language that avoids self reference by associating aspects with levels, and by automatically restricting the scope of pointcuts used by an aspect to join points of lower levels. We report on a case study using our language extension and quantify the changes necessary for migrating existing applications to it. Our results suggest that we can make programming with AspectJ simpler and safer, without restricting its expressive power unduly.
 
View the paper (.pdf)
BibTeX entry
 
  
| Programmer-Friendly Decompiled Java | back |  
Authors: Nomair A. Naeem and Laurie Hendren
 Date: March 2006
 International Conference on Program Comprehension (ICPC 2006), Athens, Greece.
 
 Abstract Java decompilers convert Java class files to Java source. Java class files may be created by a 
number of different tools including standard Java compilers, compilers for other languages 
such as AspectJ, or other tools such as optimizers or obfuscators. There are two kinds of Java 
decompilers, javac-specific decompilers that assume that the class file was created by a 
standard javac compiler and tool-independent decompilers that can decompile arbitrary class 
files, independent of the tool that created the class files. Typically javac-specific 
decompilers produce more readable code, but they fail to decompile many class files produced 
by other tools.
 
This paper tackles the problem of how to make a tool-independent decompiler, Dava, produce Java 
source code that is programmer-friendly. In past work it has been shown that Dava can 
decompile arbitrary class files, but often the output, although correct, is very different 
from what a programmer would write and is hard to understand. Furthermore, tools like 
obfuscators intentionally confuse the class files and this also leads to confusing decompiled 
source files.
 
Given that Dava already produces correct Java abstract syntax trees (ASTs) for arbitrary class 
files, we provide a new back-end for Dava. The back-end rewrites the ASTs to semantically 
equivalent ASTs that correspond to code that is easier for programmers to understand. Our new 
backend includes a new AST traversal framework, a set of simple pattern-based transformations, 
a structure-based data flow analysis framework and a collection of more advanced AST 
transformations that use flow analysis information. We include several illustrative examples 
including the use of advanced transformations to clean up obfuscated code.
 
View the paper (.pdf)
BibTeX entry
 
  
| Context-sensitive points-to analysis: is it worth it? | back |  
Authors: Ondřej Lhoták and Laurie Hendren
 Date: March 2006
 15th International Conference on Compiler Construction (CC 2006)
 
 Abstract We present the results of an empirical study evaluating the precision
of subset-based points-to analysis with several variations of context sensitivity on
Java benchmarks of significant size. We compare the use of call site strings as the
context abstraction, object sensitivity, and the BDD-based context-sensitive
algorithm proposed by Zhu and Calman, and by Whaley and Lam. Our study includes
analyses that context-sensitively specialize only pointer variables, as well as ones
that also specialize the heap abstraction. We measure both characteristics of the
points-to sets themselves, as well as effects on the precision of client analyses. To
guide development of efficient analysis implementations, we measure the number
of contexts, the number of distinct contexts, and the number of distinct points-to
sets that arise with each context sensitivity variation. To evaluate precision, we
measure the size of the call graph in terms of methods and edges, the number of
devirtualizable call sites, and the number of casts statically provable to be safe.
The results of our study indicate that object-sensitive analysis implementations are
likely to scale better and more predictably than the other approaches; that
object-sensitive analyses are more precise than comparable variations of the other
approaches; that specializing the heap abstraction improves precision more than
extending the length of context strings; and that the profusion of cycles in Java call
graphs severely reduces precision of analyses that forsake context sensitivity in
cyclic regions.
 
View the paper (.pdf)
Download the paper (.ps.gz)
BibTeX entry
 
  
| Dynamic Data Structure Analysis for Java Programs | back |  
Authors: Sokhom Pheng and Clark Verbrugge
 Date: June 2006
 ICPC 2006, Athens, Greece
 
 Abstract Analysis of dynamic data structure usage is useful for both program
understanding and for improving the accuracy of other program analyses.
Static analysis techniques, however, suffer from reduced accuracy in
complex situations, and do not necessarily give a clear picture of
runtime heap activity. We have designed and implemented a dynamic heap
analysis system that allows one to examine and analyze how Java programs
build and modify data structures. Using a complete execution trace from
a profiled run of the program, we build an internal representation that
mirrors the evolving runtime data structures. The resulting series of
representations can then be analyzed and visualized, and we show how to
use our approach to help understand how programs use data structures,
the precise effect of garbage collection, and to establish limits on
static data structure analysis. A deep understanding of dynamic data
structures is particularly important for modern, object-oriented
languages that make extensive use of heap-based data structures.
 
View the paper (.pdf)
BibTeX entry
  
| Relative Factors in Performance Analysis of Java Virtual Machines | back |  
Authors: Dayong Gu and Clark Verbrugge and Etienne M. Gagnon
 Date: June 2006
 VEE 2006, Ottawa, Canada
 
 Abstract Many new Java runtime optimizations report relatively small,
single-digit performance improvements. On modern virtual and actual
hardware, however, the performance impact of an optimization can be
influenced by a variety of factors in the underlying systems. Using a
case study of a new garbage collection optimization in two different
Java virtual machines, we show the relative effects of issues that must
be taken into consideration when claiming an improvement. We examine the
specific and overall performance changes due to our optimization and
show how unintended side-effects can contribute to, and distort, the
final assessment. Our experience shows that VM and hardware concerns can
generate variances of up to 9.5% in whole program execution time.
Consideration of these confounding effects is critical to a good,
objective understanding of Java performance and optimization.
 
View the paper (.pdf)
View the presentation slides (.pdf)
BibTeX entry
  
| Software Thread Level Speculation for the Java Language and Virtual Machine Environment | back |  
Authors: Christopher J.F. Pickett and Clark Verbrugge
 Date: October 2005
 LCPC 2005, October 2005, Hawthorne, NY, USA
 
 Abstract Thread level speculation (TLS) has shown great promise as a strategy
for fine to medium grain automatic parallelisation, and in a hardware
context techniques to ensure correct TLS behaviour are now well
established.  Software and virtual machine TLS designs, however,
require adherence to high level language semantics, and this can
impose many additional constraints on TLS behaviour, as well as open
up new opportunities to exploit language-specific information. 
We present a detailed design for a Java-specific, software TLS system
that operates at the bytecode level, and fully addresses the problems
and requirements imposed by the Java language and VM
environment.  Using SableSpMT, our research TLS framework, we
provide experimental data on the corresponding costs and benefits; we
find that exceptions, GC, and dynamic class loading have only a small
impact, but that concurrency, native methods, and memory model
concerns do play an important role, as does an appropriate,
language-specific runtime TLS support system.  Full consideration
of language and execution semantics is critical to correct and
efficient execution of high level TLS designs, and our work here
provides a baseline for future Java or Java virtual machine
implementations.
 
View the paper (.pdf)
View the presentation slides (.pdf)
BibTeX entry
   
| SableSpMT: A Software Framework for Analysing Speculative Multithreading in Java | back |  
Authors: Christopher J.F. Pickett and Clark Verbrugge
 Date: August 2005
 PASTE 2005, September 2005, Lisbon, Portugal
 
 Abstract Speculative multithreading (SpMT) is a promising optimisation
technique for achieving faster execution of sequential programs on
multiprocessor hardware.  Analysis of and data acquisition from such
systems is however difficult and complex, and is typically limited to
a specific hardware design and simulation environment.  We have
implemented a flexible, software-based speculative multithreading
architecture within the context of a full-featured Java virtual
machine.  We consider the entire Java language and provide a complete
set of support features for speculative execution, including return
value prediction.  Using our system we are able to generate extensive
dynamic analysis information, analyse the effects of runtime feedback,
and determine the impact of incorporating static, offline information. 
Our approach allows for accurate analysis of Java SpMT on existing,
commodity multiprocessor hardware, and provides a vehicle for further
experimentation with speculative approaches and optimisations.
 
View the paper (.pdf)
View the presentation slides (.pdf)
BibTeX entry
   
| (P)NFG: A Language and Runtime System for Structured Computer Narratives | back |  
Authors: Christopher J.F. Pickett and Clark Verbrugge and Félix Martineau
 Date: August 2005
 GameOn'NA 2005, August 2005, Montréal, Québec, Canada
 
 Abstract Complex computer game narratives can suffer from logical consistency
and playability problems if not carefully constructed, and current,
state-of-the-art design tools do little to help analysis or ensure
good narrative properties.  A formally-grounded system that
allows for relatively easy design and analysis is therefore
desirable.  We present a language and an environment for
expressing game narratives based on a structured form of Petri Net,
the Narrative Flow Graph.  Our "(P)NFG" system provides
a simple, high level view of narrative programming that maps onto a
low level representation suitable for expressing and analysing game
properties.  The (P)NFG framework is demonstrated experimentally
by modelling narratives based on non-trivial interactive fiction
games, and integrates with the NuSMV model checker.  Our system
provides a necessary component for systematic analysis of computer
game narratives, and lays the foundation for all-around improvements
to game quality.
 
View the paper (.pdf)
BibTeX entry
   
| A Study of Type Analysis for Speculative Method Inlining in a JIT Environment | back |  
Authors: Feng Qian and Laurie Hendren
 Date: April 2005
 CC 2005
 
 Abstract Method inlining is one of the most important optimizations to achieve a
high performance JIT compiler in Java virtual machines. A type
analysis allows the compiler to directly inline monomorphic calls. At
runtime, the compiler and type analysis have to handle dynamic class
loading properly because the analysis result is only correct at
compile time. Loading of new classes could invalidate previous
analysis results and optimizations.  Class hierarchy analysis (CHA)
has been used successfully in JIT compilers for speculative inlining
with various invalidation techniques as backup.
 
In this paper, we present the results of a limit study of method
inlining using dynamic type analysis on a set of standard Java
benchmarks. We developed a general type analysis framework for measuring
the effectiveness of several well-known type analyses, including CHA,
RTA, XTA and VTA.  Surprisingly, the simple dynamic CHA is nearly as
good as an ideal type analysis for inlining virtual method calls. It
leaves no room for other type analyses to improve. On the other hand,
only reachability-based interprocedural type analysis (VTA) is able to
capture the majority of monomorphic interface calls.  We measured the
runtime overhead of interprocedural type analysis in the JIT
environment. To overcome the memory overhead of dynamic whole-program
analysis, we outlined the design of a demand-driven inter-procedural
type analysis for inlining hot interface calls.
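
To make the role of CHA concrete, here is a minimal sketch of how a class hierarchy analysis might decide that a virtual call is monomorphic, and how loading a new class can invalidate that decision. The `Hierarchy` class and the `Shape`/`Circle`/`Square` names are hypothetical; the sketch ignores interfaces and abstract methods and assumes superclasses are loaded before their subclasses.

```python
class Hierarchy:
    def __init__(self):
        self.parent = {}    # class name -> superclass name (None for a root)
        self.methods = {}   # class name -> method names declared in that class
        self.children = {}  # class name -> directly loaded subclasses

    def load_class(self, name, parent=None, methods=()):
        self.parent[name] = parent
        self.methods[name] = set(methods)
        self.children.setdefault(name, set())
        if parent is not None:
            self.children.setdefault(parent, set()).add(name)

    def resolve(self, cls, method):
        """Walk up from cls to the class whose declaration cls inherits."""
        while cls is not None:
            if method in self.methods[cls]:
                return cls
            cls = self.parent[cls]
        return None

    def subtypes(self, cls):
        """cls together with every loaded class that extends it."""
        result, stack = set(), [cls]
        while stack:
            c = stack.pop()
            if c not in result:
                result.add(c)
                stack.extend(self.children[c])
        return result

    def targets(self, declared_type, method):
        found = {self.resolve(c, method) for c in self.subtypes(declared_type)}
        return found - {None}

    def is_monomorphic(self, declared_type, method):
        return len(self.targets(declared_type, method)) == 1

h = Hierarchy()
h.load_class("Shape", methods=["area"])
h.load_class("Circle", parent="Shape")
inlinable_before = h.is_monomorphic("Shape", "area")   # only Shape.area exists
h.load_class("Square", parent="Shape", methods=["area"])
inlinable_after = h.is_monomorphic("Shape", "area")    # Square.area now competes
```

The change from `inlinable_before` to `inlinable_after` is the situation in which a JIT that inlined speculatively would need one of the invalidation techniques mentioned in the abstract.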
 
View the paper (.ps)
  
| Using inter-procedural side-effect information in JIT optimizations | back |  
Authors: Anatole Le, Ondřej Lhoták and Laurie Hendren
 Date: April 2005
 CC 2005
 
 Abstract Inter-procedural analyses such as side-effect analysis can provide
information useful for performing aggressive optimizations. We present
a study of whether side-effect information improves performance in
just-in-time (JIT) compilers, and if so, what level of analysis
precision is needed.
 
We used Spark, the inter-procedural analysis component of the Soot Java
analysis and optimization framework, to compute side-effect information
and encode it in class files. We modified Jikes RVM, a research JIT,
to make use of side-effect analysis in local common sub-expression
elimination, heap SSA, redundant load elimination and loop-invariant 
code motion. On the SpecJVM98 benchmarks, we measured the static number 
of memory operations removed, the dynamic counts of memory reads eliminated, 
and the execution time.
 
Our results show that the use of side-effect analysis increases the
number of static opportunities for load elimination by up to 98%,
and reduces dynamic field read instructions by up to 27%. Side-effect
information enabled speedups in the range of 1.08x to 1.20x for some
benchmarks. Finally, among the different levels of precision of 
side-effect information, a simple side-effect analysis is usually
sufficient to obtain most of these speedups.
 
View the paper (.ps)
BibTeX entry
  
| abc: An extensible AspectJ compiler | back |  
Authors:
Pavel Avgustinov, 
Aske Simon Christensen,
Laurie Hendren, 
Sascha Kuzins,
Jennifer Lhoták,
Ondřej Lhoták,
Oege de Moor, 
Damien Sereni,
Ganesh Sittampalam, and
Julian Tibble
 Date: March 2005
 AOSD 2005
 
 Abstract Research in the design of aspect-oriented programming languages
requires a workbench that facilitates easy experimentation with
new language features and implementation techniques. In particular,
new features for AspectJ have been proposed that require extensions
in many dimensions: syntax, type checking and code generation, as well as
data flow and control flow analyses.
 
The AspectBench Compiler (abc) is an implementation of such a workbench.
The base version of abc implements the full AspectJ language.
Its frontend is built, using the Polyglot framework, as a modular 
extension of the Java language. The use of Polyglot
gives flexibility of syntax and type checking.
The backend is built using the Soot framework, to give modular code
generation and analyses.
 
In this paper, we outline the design of abc,  focusing mostly on how
the design supports extensibility.   We then provide a general overview of how
to use abc to implement an extension.  Finally,
we illustrate the extension mechanisms of abc through a number of
small, but non-trivial, examples. abc is freely available under
the GNU LGPL.
 
View the paper (.ps)
BibTeX entry
  
| Code Layout as a Source of Noise in JVM Performance | back |  
Authors: Dayong Gu and Clark Verbrugge and Etienne Gagnon
 Date: October 2004
 CAMP04, October 2004, Vancouver, BC, Canada
 
 Abstract We describe the effect of a particular form of
"noise" in benchmarking.  We investigate the source of anomalous
measurement data in a series of optimization strategies that attempt
to improve runtime performance in the garbage collector of a Java
virtual machine.  The results of our experiments can be explained in
terms of the difference in code layout, and hence instruction and data
cache behaviour.  We show that unintended changes in code layout due to
code modifications as trivial as symbol renaming can contribute up to
2.7% of measured machine cycle cost, 20% in data cache misses, and 37%
in instruction cache misses.
 
View the paper (.pdf)
View the presentation slides (.ppt)
BibTeX entry
   
| Return Value Prediction in a Java Virtual Machine | back |  
Authors: Christopher J.F. Pickett and Clark Verbrugge
 Date: September 2004
 VPW2, October 2004, Boston, MA, USA
 
 Abstract We present the design and implementation of return value prediction in
SableVM, a Java Virtual Machine.   
We give detailed results for the full
SPEC JVM98 benchmark suite, and compare our results with previous,
more limited data. 
At the performance limit of existing last value, stride, 2-delta
stride, parameter stride, and context (FCM) sub-predictors in a
hybrid, we achieve an average accuracy of 72%. 
We describe and characterize a new table-based memoization predictor
that complements these predictors nicely, yielding an
increased average hybrid accuracy of 
81%. 
VM level information about data widths provides a 35%
reduction in space, and
dynamic allocation and expansion of per-callsite hashtables allows for
highly accurate prediction with an average per-benchmark requirement
of 119 MB for the context predictor and 
43 MB for the memoization
predictor. 
As far as we know, this is the first implementation of non-trace-based
return value prediction within a JVM.
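
The predictor families named above can be sketched in a few lines. The code below is only an illustration of the general last value, stride, and memoization ideas plus a naive hybrid that trusts whichever sub-predictor has been right most often; the class names, the default prediction of 0, and the selection policy are assumptions, not the SableVM design.

```python
class LastValue:
    """Predict that a call returns the same value as last time."""
    def __init__(self):
        self.last = 0
    def predict(self, args):
        return self.last
    def update(self, args, actual):
        self.last = actual

class Stride:
    """Predict the last value plus the difference between the last two."""
    def __init__(self):
        self.last = 0
        self.stride = 0
    def predict(self, args):
        return self.last + self.stride
    def update(self, args, actual):
        self.stride = actual - self.last
        self.last = actual

class Memoization:
    """Predict the value previously returned for the same arguments."""
    def __init__(self):
        self.table = {}
    def predict(self, args):
        return self.table.get(args, 0)
    def update(self, args, actual):
        self.table[args] = actual

class Hybrid:
    """Use the sub-predictor with the most correct predictions so far."""
    def __init__(self):
        self.subs = [LastValue(), Stride(), Memoization()]
        self.hits = [0] * len(self.subs)
    def predict(self, args):
        best = max(range(len(self.subs)), key=lambda i: self.hits[i])
        return self.subs[best].predict(args)
    def update(self, args, actual):
        for i, sub in enumerate(self.subs):
            if sub.predict(args) == actual:
                self.hits[i] += 1
            sub.update(args, actual)

def accuracy(predictor, calls):
    """Fraction of (args, return value) pairs predicted correctly, in order."""
    correct = 0
    for args, actual in calls:
        if predictor.predict(args) == actual:
            correct += 1
        predictor.update(args, actual)
    return correct / len(calls)

counter_calls = [((), n) for n in range(1, 6)]       # returns 1, 2, 3, 4, 5
square_calls = [((2,), 4), ((3,), 9)] * 2            # pure function, repeated args
```

On `counter_calls` the stride predictor is right after its first miss, while on `square_calls` only the memoization table helps, which hints at why the two kinds of predictor complement each other.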
 
View the paper (.pdf)
View the presentation slides (.pdf)
BibTeX entry
   
| A Practical MHP Information Analysis for Concurrent Java Programs | back |  
Authors: Lin Li and Clark Verbrugge
 Date: September 2004
 LCPC 2004, September 2004, West Lafayette, IN, USA
 
 Abstract In this paper we present an implementation of May Happen in Parallel
analysis for Java that attempts to address some of the practical
implementation concerns of the original work.  We describe a design
that incorporates techniques for aiding a feasible implementation and
expanding the range of acceptable inputs.  We provide experimental
results showing the utility and impact of our approach and
optimizations using a variety of concurrent benchmarks.
 
View the paper (.pdf)
BibTeX entry
   
| Jedd:  A BDD-based Relational Extension of Java | back |  
Authors: Ondřej Lhoták and Laurie Hendren
 Date: April 2004
 PLDI 2004, June 2004, Washington, D.C., USA
 
 Abstract In this paper we present Jedd, a language extension to Java that supports
a convenient way of programming with Binary Decision Diagrams (BDDs).
The Jedd language abstracts BDDs as database-style relations and operations
on relations, and provides static type rules to ensure that relational
operations are used correctly.
 
The paper provides a description of the Jedd language and reports on the
design and implementation of the Jedd translator and associated runtime
system.  Of particular interest is the approach to assigning attributes
from the high-level relations to physical domains in the underlying BDDs, which
is done by expressing the constraints as a SAT problem and using a modern
SAT solver to compute the solution.   Further, a runtime system is
defined that handles memory management issues and supports a browsable
profiling tool for tuning the key BDD operations. 
 
The motivation for designing Jedd was to support the development of whole
program analyses based on BDDs, and we have used Jedd to express five
key interrelated whole program analyses in our Soot compiler framework.
We provide some examples of this application and discuss our experiences
using Jedd.
 
View the paper (.pdf)
Download the paper (.ps.gz)
BibTeX entry
  
| Towards Dynamic Interprocedural Analysis in JVMs | back |  
Authors: Feng Qian and Laurie Hendren
 Date: May 2004
 VM 2004, May 2004, San Jose, USA
 
 Abstract This paper presents a new, inexpensive, mechanism for constructing a
complete call graph for Java programs at runtime, and provides an
example of using the mechanism for implementing a dynamic
reachability-based interprocedural analysis (IPA), namely dynamic XTA.
 
                                                                                
Reachability-based IPAs, such as points-to analysis and escape
analysis, require a context-insensitive call graph of the analyzed
program.  Computing a call graph at runtime presents several
challenges.  First, the overhead must be low.  Second, when
implementing the mechanism for languages such as Java, both
polymorphism and lazy class loading must be dealt with correctly and
efficiently.  We propose a new, low-cost, mechanism for constructing
runtime call graphs in a JIT environment. The mechanism uses a
profiling code stub to capture the first execution of a call edge, and
adds at most one more instruction to repeated call edge invocations.
Polymorphism and lazy class loading are handled transparently.  The
call graph is constructed incrementally, and it supports optimistic
analysis and speculative optimizations with invalidations. 
We also developed a dynamic, reachability-based type analysis, dynamic
XTA, as an application of runtime call graphs. It also serves as an
example of handling lazy class loading in dynamic IPAs. 
                                                                                
The dynamic call graph construction algorithm and dynamic version of
XTA have been implemented in Jikes RVM. We present empirical
measurements of the overhead of call graph profiling and compare the
characteristics of call graphs built using our profiling code stubs
with conservative ones constructed by using dynamic class hierarchy
analysis (CHA).
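
The stub idea can be mimicked at a high level as follows. This is a loose analogy, assuming a hypothetical `CallGraphProfiler` with a table of call slots: a slot starts out pointing at a stub that records the call edge and then patches the slot, so repeated calls skip the recording step. The actual mechanism operates on compiled code inside Jikes RVM and is not reproduced here.

```python
class CallGraphProfiler:
    def __init__(self):
        self.edges = set()   # (caller name, callee name) pairs seen so far
        self.slots = {}      # call edge -> callable currently installed
        self.stub_runs = 0   # how often any stub executed (for illustration)

    def _make_stub(self, key, callee):
        def stub(*args):
            self.stub_runs += 1
            self.edges.add(key)        # record the edge on first execution
            self.slots[key] = callee   # patch the slot to bypass the stub
            return callee(*args)
        return stub

    def call(self, caller, callee, *args):
        key = (caller, callee.__name__)
        target = self.slots.get(key)
        if target is None:
            target = self.slots[key] = self._make_stub(key, callee)
        return target(*args)

def square(x):
    return x * x

profiler = CallGraphProfiler()
results = [profiler.call("main", square, n) for n in (2, 3, 4)]
```

After the three calls, `profiler.edges` holds the single edge from `main` to `square` and the stub has run exactly once; the remaining cost per call here is a table lookup, standing in for the "at most one more instruction" described in the abstract.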
 
View the paper (.pdf)
Download the paper (.ps.gz)
Slides 
  
| Integrating the Soot compiler infrastructure into an IDE | back |  
Authors: Jennifer Lhoták, Ondřej Lhoták, and Laurie Hendren
 Date: April 2004
 CC 2004, April 2004, Barcelona, Spain
 
 Abstract This paper presents the integration of Soot, a byte-code analysis and 
transformation framework, with an integrated development environment (IDE), 
Eclipse.  Such an integrated toolkit is useful for both the compiler 
developer, to aid in understanding and debugging new analyses,  
and also for the end-user of the IDE, to aid in program
understanding by exposing semantic information gathered by the advanced
compiler analyses.  The paper discusses these advantages and provides 
concrete examples of its usefulness.
 
There are several major challenges to overcome in developing the integrated
toolkit,  and the paper discusses three major challenges and the solutions
to those challenges.   An overview of Soot and the integrated toolkit is
given, followed by a more detailed discussion of the fundamental components.
The paper concludes with several illustrative examples of using the
integrated toolkit along with a discussion of future plans and research.
 
View the paper (.pdf)
Download the paper (.ps.gz)
BibTeX entry
  
| Visualizing Program Analysis with the Soot-Eclipse Plugin | back |  
Authors: Jennifer Lhoták and Ondřej Lhoták
 Date: April 2004
 eTX (at ETAPS) 2004, March 2004, Barcelona, Spain
 
 Abstract Our integration of the Soot bytecode manipulation framework into the    
Eclipse IDE forms a powerful tool for graphically visualizing both      
the progress and output of program analyses. We demonstrate several     
examples of the visualizations that we have developed, and explain how  
they are useful for both compiler research and teaching.
 
View the paper (.pdf)
BibTeX entry
  
| Dynamic Metrics for Java | back |  
Authors: Bruno Dufour, Karel Driesen, Laurie Hendren and Clark Verbrugge
 Date: November 2003
 OOPSLA 2003
 
 Abstract In order to perform meaningful experiments in optimizing compilation
and run-time system design, researchers usually rely on a suite of
benchmark programs of interest to the optimization
technique under consideration. Programs are described
as numeric, memory-intensive,  concurrent,
or object-oriented, based on a qualitative appraisal,
in some cases with little justification. We believe it is beneficial
to quantify the behaviour of programs with a concise and precisely
defined set of metrics, in order to make these intuitive notions of program
behaviour more concrete and subject to experimental validation.
We therefore define and measure a set of unambiguous, dynamic, robust
and architecture-independent metrics that can be used to categorize
programs according to their dynamic behaviour in five areas:
size, data structure, memory use, concurrency, and polymorphism.
A framework computing some of these metrics for Java programs is
presented along with specific results demonstrating how to use metric
data to understand a program's behaviour, and both guide and evaluate
compiler optimizations.
 
View the paper (.pdf)
View the presentation slides
 
BibTeX entry
  
| EVolve, an Open Extensible Software Visualization Framework | back |  
Authors: Qin Wang, Wei Wang, Rhodes Brown, Karel Driesen, Bruno Dufour, Laurie Hendren and Clark Verbrugge
 Date: June 2003
 ACM Symposium on Software Visualization 2003
 
 Abstract Existing visualization tools typically do not allow easy extension by new 
visualization techniques, and are often coupled with inflexible data input 
mechanisms. This paper presents EVolve, a flexible and extensible framework
for visualizing program characteristics and behaviour. The framework is 
flexible in the sense that it can visualize many kinds of data, and it is
extensible in the sense that it is quite straightforward to add new kinds of 
visualizations.
 
The overall architecture of the framework consists of the core EVolve platform,
which communicates with data sources via a well-defined data protocol
and with visualization methods via a visualization protocol.
 
Given a data source, an end-user can use EVolve as a stand-alone tool by interactively 
creating, configuring and modifying visualizations. A variety of visualizations are 
provided in the current EVolve library, with features that facilitate the 
comparison of multiple views on the same execution data. We demonstrate
EVolve in the context of visualizing execution behaviour of Java programs.
 
View the paper (.pdf)
  
| Points-to Analysis using BDDs | back |  
Authors: Marc Berndl, Ondřej Lhoták, Feng Qian, Laurie Hendren and Navindra Umanee
 Date: April 2003
 PLDI 2003, June 2003, San Diego, USA
 
 Abstract This paper reports on a new approach to solving a subset-based
points-to analysis for Java using Binary Decision Diagrams (BDDs). 
In the model checking community, BDDs have been shown to be very effective for
representing large sets and solving very large verification problems. 
Our work shows that BDDs can also be very effective for developing a
points-to analysis that is simple to implement and that 
scales well, in both space and time, to large programs.
 
The paper first introduces BDDs and operations on BDDs using some 
simple points-to examples.  Then, a complete subset-based points-to
algorithm is presented, expressed completely using BDDs and BDD 
operations.   This algorithm is then refined by finding appropriate 
variable orderings and by making the algorithm propagate sets incrementally, in order to 
arrive at a very efficient algorithm.
Experimental results are given to justify the
choice of variable ordering, to demonstrate the improvement due to 
incrementalization, and to compare the performance of the BDD-based 
solver to an efficient hand-coded graph-based solver.     Finally,
based on the results of the BDD-based solver, a variety of BDD-based queries
are presented, including the points-to query. 
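
For readers unfamiliar with subset-based points-to analysis, the sketch below shows incremental propagation over a tiny program. It deliberately uses plain Python sets in place of BDDs, handles only allocations and simple variable assignments (no field loads or stores), and the function name and input format are assumptions made for this illustration rather than the algorithm from the paper.

```python
from collections import defaultdict, deque

def points_to(allocs, assigns):
    """Compute points-to sets for a program of allocations and copies.

    allocs:  (var, site) pairs, one per statement `var = new site`
    assigns: (dst, src) pairs, one per statement `dst = src`
    """
    succ = defaultdict(set)          # src -> variables that copy from it
    for dst, src in assigns:
        succ[src].add(dst)

    pts = defaultdict(set)
    worklist = deque()
    for var, site in allocs:
        if site not in pts[var]:
            pts[var].add(site)
            worklist.append((var, {site}))

    while worklist:
        var, delta = worklist.popleft()
        for dst in succ[var]:
            new = delta - pts[dst]   # propagate only what dst has not seen
            if new:
                pts[dst] |= new
                worklist.append((dst, new))
    return dict(pts)

result = points_to(
    allocs=[("a", "A1"), ("b", "B1")],
    assigns=[("c", "a"), ("c", "b"), ("d", "c")],
)
```

Here `result["d"]` ends up as `{"A1", "B1"}`, because `d` copies from `c`, which copies from both `a` and `b`. Passing only the newly discovered `delta` along each edge is the set-based counterpart of the incrementalization the abstract refers to.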
 
View the paper (.pdf)
Download the paper (.ps.gz)
Presentation slides (.pdf)
Presentation slides (.ps)
BibTeX entry
  
| Dynamic Profiling and Trace Cache Generation | back |  
Authors: Marc Berndl and Laurie Hendren
 Date: March 2003
 CGO'03, March 2003, San Francisco, USA
 
 Abstract Dynamic program optimization is increasingly important for achieving
good runtime performance. A key issue is how to select which code to
optimize.  One approach is to dynamically detect traces, long
sequences of instructions spanning multiple methods, which are likely
to execute to completion. Traces are easy to optimize and have been
shown to be a good unit for optimization.
 
This paper reports on a new approach for dynamically detecting,
creating and storing traces in a Java virtual machine. We first
describe four important criteria for a successful trace strategy: good
instruction stream coverage, low dispatch rate, cache stability, and
optimizability of traces.  We then present our approach based on
branch correlation graphs.  A branch correlation graph stores
information about the correlation between pairs of branches, as well
as additional state information.
 
We present the complete design for an efficient implementation of the
system, including a detailed discussion of the trace cache and
profiling mechanisms.  We have implemented an experimental framework
to measure the traces generated by our approach in a direct-threaded
Java VM (SableVM) and we present experimental results to show that the
traces we generate meet the design criteria.
 
View the technical report (pdf)
  
| Design, Implementation and Evaluation of Adaptive Recompilation with On-Stack Replacement | back |  
Authors: Stephen J. Fink (IBM T.J. Watson) and Feng Qian
 Date: March 2003
 CGO'03, March 23-26, San Francisco, USA
 
 Abstract Modern virtual machines often maintain multiple compiled versions of a
method. An on-stack replacement (OSR) mechanism enables a virtual
machine to transfer execution between compiled versions, even while a
method runs. Relying on this mechanism, the system can exploit
powerful techniques to reduce compile time and code space, dynamically
de-optimize code, and invalidate speculative optimizations.
 
This paper presents a new, simple, mostly compiler-independent
mechanism to transfer execution into compiled code.  Additionally, we
present enhancements to an analytic model for recompilation to exploit
OSR for more aggressive optimization.  We have implemented these
techniques in Jikes RVM and present a comprehensive evaluation,
including a study of fully automatic, online, profile-driven deferred
compilation.
 
Paper available upon request.
  
| CC2003: Effective Inline-Threaded Interpretation of Java Bytecode Using Preparation Sequences | back |  
Authors: Etienne Gagnon and Laurie Hendren
 Date: January 2003
 CC 2003, April 2003, Warsaw, Poland
 
 Abstract Inline-threaded interpretation is a recent technique that improves
performance by eliminating dispatch overhead within basic blocks for
interpreters written in C.  The dynamic class loading,
lazy class initialization, and multi-threading features of Java reduce
the effectiveness of a straightforward implementation of this
technique within Java interpreters.  In this paper, we introduce
preparation sequences, a new technique that solves the particular
challenge of effectively inline-threading Java.  We have implemented
our technique in the SableVM Java virtual machine, and our
experimental results show that using our technique, inline-threaded
interpretation of Java, on a set of benchmarks, achieves a speedup
ranging from 1.20 to 2.41 over switch-based interpretation, and a
speedup ranging from 1.15 to 2.14 over direct-threaded interpretation.
 
Download the paper (.ps.gz)
View the paper (.pdf)
  
| CC2003: Scaling Java Points-To Analysis using Spark | back |  
Authors: Ondřej Lhoták and Laurie Hendren
 Date: January 2003
 CC 2003, April 2003, Warsaw, Poland
 
 Abstract Most points-to analysis research has been done on different systems by
different groups, making it difficult to compare results, and to understand
interactions between individual factors each group studied.
Furthermore, points-to analysis for Java has been studied much less
thoroughly than for C, and the tradeoffs appear very different.
We introduce Spark, a flexible framework for experimenting with
points-to analyses for Java. Spark supports equality- and subset-based
analyses, variations in field sensitivity, respect for declared types,
variations in call graph construction, off-line simplification, and
several solving algorithms. Spark is composed of building blocks on
which new analyses can be based.
We demonstrate Spark in a substantial study of factors affecting
precision and efficiency of subset-based points-to analyses, including
interactions between these factors. Our results show that Spark is
not only flexible and modular, but also offers superior time/space
performance when compared to other points-to analysis implementations.
 
  
| PASTE02-2: STEP: A Framework for the Efficient Encoding of General Trace Data | back |  
Authors: Rhodes Brown, Karel Driesen, David Eng, Laurie Hendren, John Jorgensen, Clark Verbrugge and Qin Wang
 Date: November 2002
 PASTE 2002, Charleston, SC, USA
 
 Abstract Traditional tracing systems are often limited to recording a fixed set
of basic program events. This limitation can frustrate an application
or compiler developer who is trying to understand and characterize the
complex behavior of software systems such as a Java program running on
a Java Virtual Machine. In the past, many developers have resorted to
specialized tracing systems that target a particular type of program
event. This approach often results in an obscure and poorly documented
encoding format which can limit the reuse and sharing of potentially
valuable information. To address this problem, we present STEP, a
system designed to provide profiler developers with a standard method
for encoding general program trace data in a flexible and compact
format. The system consists of a trace data definition language along
with a compiler and an architecture that simplifies the client
interface by encapsulating the details of encoding and interpretation.
 
  
| PASTE02-1: Combining Static and Dynamic Data in Code Visualization | back |  
Authors: David Eng
 Date: November 2002
 PASTE 2002, Charleston, SC, USA
 
 Abstract The task of developing, tuning, and debugging compiler optimizations is a
difficult one which can be facilitated by software visualization. There
are many characteristics of the code which must be considered when
studying the kinds of optimizations which can be performed. Both static
data collected at compile-time and dynamic runtime data can reveal
opportunities for optimization and affect code transformations. In order
to expose the behavior of such complex systems, visualizations should
include as much information as possible and accommodate the different
sources from which this information is acquired.
 
This paper presents a visualization framework designed to address these
issues. The framework is based on a new, extensible language called JIL
which provides a common format for encapsulating intermediate
representations and associating them with compile-time and runtime data.
We present new contributions which extend existing compiler and profiling
frameworks, allowing them to export the intermediate languages, analysis
results, and code metadata they collect as JIL documents. Visualization
interfaces can then combine the JIL data from separate tools, exposing
both static and dynamic characteristics of the underlying code. We
present such an interface in the form of a new web-based visualizer,
allowing JIL documents to be visualized online in a portable,
customizable interface.
 
  
| JGI02: Run-time Evaluation of Opportunities for Object Inlining in Java | back |  
Authors: Ondřej Lhoták and Laurie Hendren
 Date: September, 2002
 JGI'02, November 2002, Seattle, WA, USA
 
 Abstract Object-oriented languages, such as Java, encourage the use of many small
objects linked together by field references, instead of a few monolithic
structures. While this practice is beneficial from a program design
perspective, it can slow down program execution by incurring many
pointer indirections. One solution to this problem is object inlining:
when the compiler can safely do so, it fuses small objects together,
thus removing the reads/writes to the removed field, saving the memory
needed to store the field and object header, and reducing the number of
object allocations.
 
The objective of this paper is to measure the potential for object inlining
by studying the run-time behaviour of a comprehensive set of Java programs.
We study the traces of program executions in order to determine which
fields behave like inlinable fields.   Since we are using dynamic information
instead of a static analysis,  our results give an upper bound on what
could be achieved via a static compiler-based approach.
Our experimental results measure the potential improvements
attainable with object inlining, including reductions in the numbers of
field reads and writes, and reduced memory usage. 
Our study shows that some Java programs can benefit significantly
from object inlining, with close to a 10% speedup.   Somewhat to our
surprise, our study found one case, the db benchmark, 
where the most important inlinable field was the result of unusual 
program design, and fixing this small flaw led to both better performance
and clearer program design.  However, the opportunities for 
object inlining are highly dependent on the individual program being
considered, and are in many
cases very limited.   Furthermore,  fields that are inlinable also have
properties that make them potential candidates for other optimizations such
as removing redundant memory accesses. 
The memory savings possible through object inlining are moderate. 
 
  
| ISMM2002: An Adaptive, Region-based Allocator for Java | back |  
Authors: Feng Qian and Laurie Hendren
 Date: April 22, 2002
 ISMM'02, June 2002, Berlin, Germany
 
 Abstract This paper introduces an adaptive, region-based allocator for Java.
The basic idea is to allocate non-escaping objects in local regions,
which are allocated and freed in conjunction with their associated
stack frames.  By releasing memory associated with these stack frames,
the burden on the garbage collector is reduced, possibly resulting in
fewer collections.
 
The novelty of our approach is that it does not require static escape
analysis, programmer annotations, or special type systems.  The
approach is transparent to the Java programmer and relatively simple
to add to an existing JVM.  The system starts by assuming that all
allocated objects are local to their stack region, and then catches
escaping objects via write barriers.  When an object is caught
escaping, its associated allocation site is marked as a non-local
site, so that subsequent allocations will be put directly in the
global region.  Thus, as execution proceeds, only those allocation
sites that are likely to produce non-escaping objects are allocated to
their local stack region. 
The paper presents the overall idea, and then provides details of a
specific design and implementation.  In particular, we present a
region-based allocator and the necessary modifications of the Jikes RVM
baseline JIT and a copying collector.  Our experimental study
evaluates the idea using the SPEC JVM98 benchmarks, plus one other large
benchmark.  We show that a region-based allocator is a reasonable
choice, that overheads can be kept low, and that the adaptive system
is successful at finding local regions that contain no escaping
objects. 
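
The adaptive scheme described above can be modelled in miniature. In the sketch below, every allocation site starts out local, a write barrier notices when an object is stored into a longer-lived region, and the offending site is then marked non-local so later allocations go to the global region. The `AdaptiveAllocator` class, the region numbering by stack depth, and the retagging of an escaped object are assumptions of this sketch; the abstract does not say how the real system treats an object that has already escaped.

```python
GLOBAL = 0  # region id of the global heap; frame regions use the stack depth

class Obj:
    def __init__(self, site, region):
        self.site = site        # allocation site that created the object
        self.region = region    # region the object currently lives in
        self.fields = {}

class AdaptiveAllocator:
    def __init__(self):
        self.depth = 0                 # current stack depth
        self.nonlocal_sites = set()    # sites known to produce escaping objects

    def push_frame(self):
        self.depth += 1

    def pop_frame(self):
        # A real VM would release the frame's local region at this point.
        self.depth -= 1

    def allocate(self, site):
        region = GLOBAL if site in self.nonlocal_sites else self.depth
        return Obj(site, region)

    def store(self, target, field, value):
        # Write barrier: the value escapes if a longer-lived region refers to it.
        if value.region > target.region:
            self.nonlocal_sites.add(value.site)
            value.region = target.region   # simplification, see lead-in
        target.fields[field] = value

alloc = AdaptiveAllocator()
shared = alloc.allocate("shared_site")     # allocated at depth 0, so global
alloc.push_frame()
first = alloc.allocate("site_a")           # optimistically local to the frame
alloc.store(shared, "f", first)            # caught escaping by the barrier
second = alloc.allocate("site_a")          # site_a now allocates globally
temp = alloc.allocate("site_b")            # site_b is still treated as local
alloc.pop_frame()
```

Only `site_a` ends up in `alloc.nonlocal_sites`, so objects from `site_b` keep being placed in frame-local regions that can be released when the frame is popped.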
 
  
| CC2002: Decompiling Java Bytecode: Problems, Traps and Pitfalls | back |  
Authors: Jerome Miecznikowski and Laurie Hendren
 Date: February 2002
 CC'02, April 2002, Grenoble, France
 
 Abstract Java virtual machines execute Java bytecode instructions. Since this
bytecode is a higher level representation than traditional object code, it
is possible to decompile it back to Java source.  Many such decompilers
have been developed and the conventional wisdom is that decompiling Java
bytecode is relatively simple.  This may be true when decompiling bytecode
produced directly from a specific compiler, most often Sun's javac
compiler.  In this case it is really a matter of inverting a known
compilation strategy.  However, there are many problems, traps and
pitfalls when decompiling arbitrary verifiable Java bytecode.  Such
bytecode could be produced by other Java compilers, Java bytecode
optimizers or Java bytecode obfuscators.  Java bytecode can also be
produced by compilers for other languages, including Haskell, Eiffel, ML,
Ada and Fortran.  These compilers often use very different code generation
strategies from javac.
 
 
This paper outlines the problems and solutions we have found in our
development of Dava, a decompiler for arbitrary Java bytecode.  We first
outline the problems in assigning types to variables and literals, and the
problems due to expression evaluation on the Java stack.  Then, we look at
finding structured control flow with a particular emphasis on how to deal
with Java exceptions and synchronized blocks.  Throughout the paper we
provide small examples which are not properly decompiled by commonly used
decompilers.
 
  
Authors: Feng Qian, Laurie Hendren and Clark Verbrugge
 Date: February 2002
 CC'02, April 2002, Grenoble, France
 
 Abstract This paper reports on a comprehensive approach to eliminating array
bounds checks in Java.  Our approach is based upon three analyses.  The
first analysis is a flow-sensitive
intraprocedural analysis called variable constraint analysis
(VCA).  This analysis builds a small constraint graph for each
important point in a method, and then uses the information encoded in
the graph to infer the relationship between array index expressions
and the bounds of the array.  Using VCA as the base analysis, we also
show how two further analyses can improve the results of VCA.  
Array field analysis is applied on each class and provides
information about some arrays stored in fields, while rectangular
array analysis is an interprocedural analysis to approximate the
shape of arrays, and is useful for finding rectangular (non-ragged)
arrays.
 
We have implemented all three analyses using the Soot bytecode
optimization/annotation framework and we transmit the results of the
analysis to virtual machines using class file attributes.  We have
modified the Kaffe JIT, and IBM's High Performance Compiler for Java
(HPCJ) to make use of these attributes, and we demonstrate significant
speedups. 
  
Authors: Jerome Miecznikowski and Laurie Hendren
 Date: October 2001
 
 Abstract This paper presents an approach to program structuring for use in
decompiling Java bytecode to Java source.  The structuring approach uses
three intermediate representations:  (1) a list of typed, aggregated
statements with an associated exception table, (2) a control flow graph,
and (3) a structure encapsulation tree.
 
The approach works in six distinct stages, with each stage focusing on a
specific family of Java constructs, and each stage contributing more
detail to the structure encapsulation tree.  After completion of all
stages the structure encapsulation tree contains enough information to
allow a simple extraction of a structured Java program.  
The approach targets general Java bytecode including bytecode that may be
the result of front-ends for languages other than Java, and also bytecode
that has been produced by a bytecode optimizer.  Thus, the techniques have
been designed to work for bytecode that may not exhibit the typical
structured patterns of bytecode produced by a standard Java compiler. 
The structuring techniques have been implemented as part of the Dava
decompiler which has been built using the Soot framework.
 
  
Authors: Patrice Pominville, Feng Qian, Raja Vallée-Rai, Laurie Hendren and Clark Verbrugge Date: November 2000
 
 Abstract This paper presents a framework for supporting the optimization of Java programs using attributes in Java class files. We show how
class file attributes may be used to convey both optimization opportunities and profile information to a variety of Java virtual machines
including ahead-of-time compilers and just-in-time compilers.
 
We present our work in the context of Soot, a framework that supports the analysis and transformation of Java bytecode (class files).
We demonstrate the framework with attributes for elimination of array bounds and null pointer checks, and we provide experimental
results for the Kaffe just-in-time compiler, and IBM's High Performance Compiler for Java ahead-of-time compiler. 
 
 
Winner of the "best paper that is primarily the work of a student" award. Authors: Etienne Gagnon and Laurie Hendren
 Date: April 2001
 Conference: Java Virtual Machine Research and Technology Symposium (JVM '01)
 
 Abstract SableVM is an open-source virtual machine for Java
intended as a research framework for efficient 
execution of Java bytecode.
The framework is essentially composed
of an extensible bytecode interpreter using state-of-the-art
and innovative techniques.  
Written in the C programming language, and assuming
minimal system dependencies, the interpreter emphasizes high-level
techniques to support efficient execution.
 
In particular, we introduce a  bidirectional layout for object
instances that groups reference fields sequentially to allow 
efficient garbage collection. We also introduce
a  sparse interface virtual table layout that reduces the cost 
of interface method calls to that of normal virtual calls.
Finally, we present a technique to improve thin locks
by eliminating busy-waiting in the presence of contention.
 
 
Authors: Vijay Sundaresan, Laurie Hendren, Chrislain Razafimahefa, Raja Vallée-Rai, Patrick Lam, Etienne Gagnon, and Charles Godin Date: October 2000
 
 Abstract This paper addresses the problem of resolving virtual method and 
interface calls in Java bytecode.  
The main focus is on a new practical technique that can
be used to analyze large applications.
Our fundamental design goal was to develop a technique that can be solved
with only one iteration, and thus scales linearly with the size of the
program, 
while at the same time providing 
more accurate results than two popular existing linear techniques,
 class hierarchy analysis and  rapid type analysis.
 
We present two variations of our new technique, variable-type analysis
and a coarser-grain version called declared-type analysis.   
Both of these analyses are inexpensive, easy to implement, 
and our experimental results show that they scale linearly in 
the size of the program.
 
We have implemented our new analyses
using the Soot framework, and we report on
empirical results for seven benchmarks.
We have used our techniques to build
accurate call graphs for complete applications (including libraries)
and we show that compared to a conservative call graph built
using class hierarchy analysis, our new variable-type analysis 
can remove a significant number of nodes (methods) and call edges.
Further, our results show that we can improve upon the compression obtained
using rapid type analysis.
 
We also provide dynamic measurements of monomorphic call sites, focusing
on the benchmark code excluding libraries.   We demonstrate that when
considering only the benchmark code,
neither rapid type analysis nor our new declared-type analysis adds much
precision over class hierarchy analysis.  However, our finer-grained
variable-type analysis does resolve significantly more
call sites, particularly for programs with more complex uses of objects.
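
The class hierarchy analysis baseline that this abstract compares against can be sketched on a toy model as follows. This is a minimal illustration under stated assumptions: classes are plain strings with single inheritance and no interfaces, and every identifier (`ChaSketch`, `addClass`, `possibleTargets`) is hypothetical. It is not the Soot implementation and does not model the variable-type or declared-type analyses.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch of class hierarchy analysis (CHA) on a toy model.
// The model and all identifiers are assumptions made for this example.
final class ChaSketch {
    private final Map<String, String> superOf = new HashMap<>();       // class -> superclass (null for a root)
    private final Map<String, Set<String>> declared = new HashMap<>(); // class -> methods it defines

    void addClass(String name, String superName, String... methods) {
        superOf.put(name, superName);
        declared.put(name, new HashSet<>(Arrays.asList(methods)));
    }

    // True if cls equals ancestor or inherits from it.
    private boolean isSubclassOf(String cls, String ancestor) {
        for (String c = cls; c != null; c = superOf.get(c)) {
            if (c.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }

    // Walks up from cls to the nearest class that defines the method.
    private String resolve(String cls, String method) {
        for (String c = cls; c != null; c = superOf.get(c)) {
            Set<String> methods = declared.get(c);
            if (methods != null && methods.contains(method)) {
                return c;
            }
        }
        return null;
    }

    // CHA: a call on a receiver of declared type T may dispatch to the
    // implementation seen by any subclass of T, including T itself.
    Set<String> possibleTargets(String declaredType, String method) {
        Set<String> targets = new TreeSet<>();
        for (String cls : superOf.keySet()) {
            if (isSubclassOf(cls, declaredType)) {
                String impl = resolve(cls, method);
                if (impl != null) {
                    targets.add(impl + "." + method);
                }
            }
        }
        return targets;
    }
}
```

A call site is monomorphic under this sketch when `possibleTargets` returns a single element. Rapid type analysis additionally restricts the candidate classes to those instantiated somewhere in the program.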
 
  
 
Authors: Etienne Gagnon, Laurie Hendren and Guillaume Marceau Date: June-July 2000
 
 Abstract Even though Java bytecode has a significant amount of type
information embedded in it,
there are no explicit types for local variables.  
However, knowing types for local variables is very useful
for both program optimization and decompilation.   
In this paper, we present an efficient and practical 
algorithm for inferring static types for local variables 
in a 3-address, stackless representation of Java bytecode.
 
By decoupling the type inference problem from the
low level bytecode representation, and abstracting it into a
 constraint system, we show that there exists verifiable
bytecode that cannot be statically typed.  Further, we show that,
without transforming the program,  the static typing problem
is NP-hard.   In order to develop a practical approach we
have developed an algorithm that works efficiently for the
usual cases and then applies efficient program transformations to
simplify the hard cases. 
Our solution is a multi-stage algorithm.  
In the first stage, we
propose an efficient algorithm that infers static types for most
bytecode found in practice.  In  case this 
stage fails, the second stage is applied.  It consists of a simple
and efficient variable splitting operation that renders 
most bytecode typeable using the algorithm of stage
one.  Finally, for completeness of the algorithm, we present a
final stage that efficiently transforms and infers types for all 
remaining bytecode (such bytecode is likely to be a contrived example,
and not code produced from a compiler). 
We have implemented this algorithm in the Soot framework.  Our
experimental results show that all of the 17,000 methods used
in our tests were successfully typed: 99.8% of those required only
the first stage, 0.2% required the second stage, and no methods
required the third stage.  
 
 
Authors: Raja Vallée-Rai, Etienne Gagnon, Laurie Hendren, Patrick Lam, Patrice Pominville, and Vijay Sundaresan Date: March-April 2000
 
 Abstract 
 This paper presents Soot, a framework for optimizing Java(TM) bytecode.  The
framework is implemented in Java and supports three intermediate representations
for representing Java bytecode: Baf, a streamlined representation of Java's
stack-based bytecode;
Jimple, a typed three-address intermediate
representation suitable for optimization; and Grimp, an aggregated version of
Jimple. 
 Our approach to class file optimization is to first convert the stack-based
bytecode into Jimple, a three-address form more amenable to traditional program
optimization,  and then convert the optimized Jimple back to bytecode.
 In order to demonstrate that our approach is feasible,  we present
experimental results showing the effects of processing class files through
our framework.  In particular, we study the techniques necessary to effectively
translate Jimple back to bytecode, without losing performance.   Finally, we
demonstrate that class file optimization can be quite effective by
showing the results of some basic optimizations using our framework.   
Our experiments
were done on ten benchmarks, including seven SPECjvm98 benchmarks,  and were
executed on five different Java virtual machine implementations.
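
The idea of converting stack-based code into a three-address form can be sketched as below: each value that would sit on the operand stack is given a named temporary, so every emitted statement has at most one operator. This is a minimal illustration under stated assumptions. The toy instruction set (`load`, `push`, `add`, `mul`, `store`), the output syntax, and the class name `StackToThreeAddress` are invented for this example and are not Baf, Jimple, or any Soot API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative sketch: turning stack-based code into three-address
// statements by naming each stack slot. The instruction set and output
// syntax are invented for this example; they are not Baf or Jimple.
final class StackToThreeAddress {
    static List<String> convert(List<String> stackCode) {
        List<String> out = new ArrayList<>();
        Deque<String> stack = new ArrayDeque<>(); // operand names, not values
        int nextTemp = 0;
        for (String insn : stackCode) {
            String[] parts = insn.trim().split("\\s+");
            switch (parts[0]) {
                case "load": {
                    // Copy the variable into a fresh temporary so a later
                    // store to the same variable cannot change this operand.
                    String temp = "$t" + nextTemp++;
                    out.add(temp + " = " + parts[1]);
                    stack.push(temp);
                    break;
                }
                case "push":
                    // Constants never change, so they are tracked symbolically.
                    stack.push(parts[1]);
                    break;
                case "add":
                case "mul": {
                    String right = stack.pop();
                    String left = stack.pop();
                    String temp = "$t" + nextTemp++;
                    String op = parts[0].equals("add") ? "+" : "*";
                    out.add(temp + " = " + left + " " + op + " " + right);
                    stack.push(temp);
                    break;
                }
                case "store":
                    out.add(parts[1] + " = " + stack.pop());
                    break;
                default:
                    throw new IllegalArgumentException("unknown instruction: " + insn);
            }
        }
        return out;
    }
}
```

The output of such a naive conversion contains more temporaries than hand-written code would; a later copy-propagation pass could remove many of them.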
 
 
Authors: Raja Vallée-Rai, Laurie Hendren, Vijay Sundaresan, Patrick Lam, Etienne Gagnon and Phong Co Date: September 1999
 
 Abstract This paper presents Soot, a framework for optimizing Java(tm) bytecode.  The
framework is implemented in Java and supports three intermediate representations
for representing Java bytecode: Baf, a streamlined representation of bytecode
which is simple to manipulate; Jimple, a typed 3-address intermediate
representation suitable for optimization; and Grimp, an aggregated version of
Jimple suitable for decompilation.  We describe the motivation for each
representation, and the salient points in translating from one representation to
another.
 
In order to demonstrate the usefulness of the framework, we have implemented
intraprocedural and whole program optimizations.  To show that whole program
bytecode optimization can give performance improvements, we provide experimental
results for 12 large benchmarks, including 8 SPECjvm98 benchmarks running on JDK
1.2 for GNU/Linux(tm).  These results show up to 8% improvement when the
optimized bytecode is run using the interpreter and up to 21% when run using the
JIT compiler.
 
  
| TOOLS98: SableCC, an Object-Oriented Compiler Framework | back |  
Authors: Etienne Gagnon and Laurie J. Hendren Date: August 1998
 
 Abstract In this paper, we introduce SableCC, an object-oriented framework that generates
compilers (and interpreters) in the Java programming language.  This framework is based on
two fundamental design decisions.  Firstly, the framework uses object-oriented techniques
to automatically build a strictly-typed abstract syntax tree that matches the grammar of
the compiled language, which simplifies debugging.  Secondly, the framework generates tree-walker
classes using an extended version of the visitor design pattern, which enables the
implementation of actions on the nodes of the abstract syntax tree using inheritance.  These
two design decisions lead to a tool that supports a shorter development cycle for constructing
compilers.
 
To demonstrate the simplicity of the framework, we present all the steps of building an
interpreter for a mini-BASIC language.  This example could easily be modified to provide
an embedded scripting language in an application.  We also provide a brief description of
larger systems that have been implemented using the SableCC tool.
 
We conclude that the use of object-oriented techniques significantly reduces the length of
the programmer-written code, can shorten the development time, and finally makes the code
easier to read and maintain.
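
The second design decision, a tree walker based on the visitor pattern whose actions are supplied through inheritance, can be sketched as follows. This is a minimal hand-written illustration under stated assumptions: the node classes, the `Walker` base class, and its `case`/`out` hooks are hypothetical stand-ins, not the classes that SableCC generates.

```java
import java.util.ArrayDeque;

// Illustrative sketch of a tree walker built on the visitor pattern, where
// actions are added by subclassing the walker. All class and method names
// are hypothetical and are not the classes SableCC generates.
abstract class Node {
    abstract void apply(Walker w);
}

final class Num extends Node {
    final int value;
    Num(int value) { this.value = value; }
    @Override void apply(Walker w) { w.caseNum(this); }
}

final class Add extends Node {
    final Node left;
    final Node right;
    Add(Node left, Node right) { this.left = left; this.right = right; }
    @Override void apply(Walker w) { w.caseAdd(this); }
}

// Default depth-first traversal; the out* hooks do nothing until overridden.
class Walker {
    void caseNum(Num n) { outNum(n); }
    void caseAdd(Add n) {
        n.left.apply(this);
        n.right.apply(this);
        outAdd(n);
    }
    void outNum(Num n) { }
    void outAdd(Add n) { }
}

// An action implemented purely by inheritance: evaluate the expression
// with an explicit stack as the walker leaves each node.
final class Evaluator extends Walker {
    private final ArrayDeque<Integer> stack = new ArrayDeque<>();
    @Override void outNum(Num n) { stack.push(n.value); }
    @Override void outAdd(Add n) { stack.push(stack.pop() + stack.pop()); }
    int result() { return stack.peek(); }
}
```

The traversal logic lives once in `Walker`; `Evaluator` overrides only the hooks it needs, which is the sense in which actions are implemented using inheritance. A typical use is to build a tree such as `new Add(new Num(1), new Num(2))`, call `apply` with an `Evaluator`, and read `result()`.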
 
 
 
 Last updated Fri Apr 11 23:53:33 EDT 2003.
 |