Sable Publications (Technical Reports)


McTutorial: A Structured Approach to Teaching MATLAB (2014-3)
Authors: Lei Lopez
Date: August 2014

Abstract
Learning how to program has become an increasingly important skill for non-programmers in the tech industry and for researchers outside of computer science. Languages such as MATLAB have grown into industrial-strength languages, and many industries and academic fields outside of mathematics and computer science have found uses for them. Thus, it is essential for many more people to learn MATLAB. However, many people learn it without grounding in the fundamental programming concepts that are rooted in computer science. A new tutorial, McTutorial, addresses this issue.

View the report (.pdf)  BibTeX entry (.bib)

Using JavaScript and WebCL for Numerical Computations: A Comparative Study of Native and Web Technologies (2014-2)
Authors: Faiz Khan, Vincent Foley-Bourgon, Sujay Kathrotia, Erick Lavoie, Laurie Hendren
Date: June 2014

Abstract
From its modest beginnings as a tool to validate forms, JavaScript is now an industrial-strength language used to power online applications such as spreadsheets, IDEs, image editors and even 3D games. Since all modern web browsers support JavaScript, it provides a medium that is both easy to distribute for developers and easy to access for users. This paper provides empirical data to answer the question: Is JavaScript suitable for numerical computations? By measuring and comparing the runtime performance of benchmarks representative of a wide variety of scientific applications, we show that sequential JavaScript is within a factor of 2 of native code. Parallel code using WebCL shows speed improvements of up to 2.28x over sequential JavaScript for the majority of the benchmarks.

View the report (.pdf)  BibTeX entry (.bib)

Velociraptor: A compiler toolkit for numerical programs targeting CPUs and GPUs (2013-5)
Authors: Rahul Garg and Laurie Hendren
Date: November 2013

Abstract
Developing compilers that allow scientific programmers to use multicores and GPUs is of increasing interest; however, building such compilers requires considerable effort. We present Velociraptor: a portable compiler toolkit that can be used to easily build compilers for numerical programs targeting multicores and GPUs. Velociraptor provides a new high-level IR called VRIR which has been specifically designed for numeric computations, with rich support for arrays, plus support for high-level parallel and accelerator constructs. A compiler developer uses Velociraptor by generating VRIR for key parts of an input program. Velociraptor does the rest of the work by optimizing the VRIR code, and generating LLVM for CPUs and OpenCL for GPUs. Velociraptor also provides a smart runtime system to manage GPU resources and task dispatch. To demonstrate Velociraptor in action, we present two case studies: a proof-of-concept Python compiler targeting CPUs and GPUs, and a GPU extension for a MATLAB JIT.

View the report (.pdf)  BibTeX entry (.bib)

Mc2For: a tool for automatically transforming MATLAB to Fortran 95 (2013-4)
Authors: Xu Li and Laurie Hendren
Date: October 2013

Abstract
MATLAB is a dynamic numerical scripting language widely used by scientists, engineers and students. While MATLAB's high-level syntax and dynamic types make it ideal for prototyping, programmers often prefer using high-performance static programming languages such as Fortran for their final distributable code. Rather than requiring programmers to rewrite their code by hand, our solution is to provide a tool that automatically translates the original MATLAB program to produce an equivalent Fortran program. There are several important challenges in automatically translating MATLAB to Fortran, such as correctly estimating the static type characteristics of all the variables in a MATLAB program, mapping MATLAB built-in functions, and effectively mapping MATLAB constructs to Fortran constructs. In this paper, we introduce Mc2For, a tool which automatically translates MATLAB to Fortran. This tool consists of two major parts. The first part is an interprocedural analysis component that estimates static type characteristics, such as array shapes and value ranges, which are used to generate variable declarations in the translated Fortran program. The second part is an extensible Fortran code generation framework that automatically transforms MATLAB constructs to corresponding Fortran constructs. This work has been implemented within the McLab framework, and we demonstrate the performance of the translated Fortran code for a collection of MATLAB benchmark programs.
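As an illustration of the kind of static shape reasoning such a translator must perform before it can emit Fortran declarations, here is a minimal sketch (the function and the (rows, cols) shape encoding are invented for this example; this is not Mc2For's actual analysis):

```python
# Illustrative sketch: propagating static shape information through a few
# MATLAB-style operations. A translator needs result shapes like these to
# declare array bounds in the generated Fortran code.

def infer_shape(op, a, b):
    """Return the result shape of `a op b`, where shapes are (rows, cols)."""
    if op == "matmul":
        (m, k1), (k2, n) = a, b
        if k1 != k2:
            raise TypeError("inner matrix dimensions must agree")
        return (m, n)
    if op in ("add", "sub", "times"):      # elementwise operators
        if a != b:
            raise TypeError("elementwise operands must have equal shapes")
        return a
    raise ValueError(f"unknown op {op!r}")

# A (3x4) * (4x2) matrix product yields a 3x2 result, so the corresponding
# Fortran variable could be declared with those static bounds.
print(infer_shape("matmul", (3, 4), (4, 2)))
```

In the real analysis this propagation runs interprocedurally and also tracks value ranges, but the per-operation shape rules have this flavour.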

View the report (.pdf)  BibTeX entry (.bib)

First steps to compiling MATLAB to X10 (2013-2)
Authors: Vineet Kumar and Laurie Hendren
Date: May 2013

Abstract
MATLAB is a popular dynamic array-based language commonly used by students, scientists and engineers, who appreciate the interactive development style, the rich set of array operators, the extensive builtin library, and the fact that they do not have to declare static types. Even though these users like to program in MATLAB, their computations are often very compute-intensive and are potentially very good applications for high-performance languages such as X10. To provide a bridge between MATLAB and X10, we are developing MiX10, a source-to-source compiler that translates MATLAB to X10. This paper provides an overview of the initial design of the MiX10 compiler, presents a template-based specialization approach to compiling the builtin MATLAB operators, and provides translation rules for the key sequential MATLAB constructs with a focus on those which are challenging to convert to semantically-equivalent X10. An initial core compiler has been implemented, and preliminary results are provided.

View the report (.pdf)  BibTeX entry (.bib)

A portable and high-performance matrix operations library for CPUs, GPUs and beyond (2013-1)
Authors: Rahul Garg and Laurie Hendren
Date: April 2013

Abstract
High-performance computing systems today include a variety of compute devices such as multi-core CPUs, GPUs and many-core accelerators. OpenCL allows programming different types of compute devices using a single API and kernel language. However, there is no standard matrix operations library in OpenCL for operations such as matrix multiplication that works well on a variety of hardware from multiple vendors. We implemented an OpenCL auto-tuning library for real and complex variants of general matrix multiply (GEMM) and present detailed performance results and analysis on a variety of GPU and CPU architectures. The library provides good performance on all the architectures tested, and is competitive with vendor libraries on some architectures (such as AMD Radeons). We also present a brief analysis for related kernels such as matrix transpose and matrix-vector multiply.

View the report (.pdf)  BibTeX entry (.bib)

Optimizing MATLAB feval with Dynamic Techniques (2012-6-rev1)
Authors: Nurudeen Lameed and Laurie Hendren
Date: March 2013

Abstract

MATLAB is a popular dynamically-typed array-based language. The built-in function feval is an important MATLAB feature for certain classes of numerical programs and solvers which benefit from having functions as parameters. Programmers may pass a function name or function handle to the solver and then the solver uses feval to indirectly call the function. In this paper, we show that although feval provides an acceptable abstraction mechanism for these types of applications, there are significant performance overheads for function calls via feval, in both MATLAB interpreters and JITs. The paper then proposes, implements and compares two on-the-fly mechanisms for specialization of feval calls. The first approach specializes calls of functions with feval using a combination of runtime input argument types and values. The second approach uses on-stack replacement technology, as supported by McVM/McOSR. Experimental results on seven numerical solvers show that the techniques provide good performance improvements.
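The core idea of specializing an indirect feval-style call can be sketched as follows (a hedged illustration only, not McVM's implementation; the name table and solver are invented for the example):

```python
# Sketch: a solver that receives a function as a parameter pays for an
# indirect lookup-and-call on every iteration; "specializing" on the
# observed argument resolves the function once and makes direct calls.

import math

FUNCTIONS = {"sin": math.sin, "cos": math.cos}  # stand-in for name lookup

def feval(name, x):
    return FUNCTIONS[name](x)                   # indirect: lookup every call

def solver_generic(fname, xs):
    return [feval(fname, x) for x in xs]

def specialize(fname):
    f = FUNCTIONS[fname]                        # resolved once, up front
    return lambda xs: [f(x) for x in xs]        # direct calls in the loop

xs = [0.0, math.pi / 2]
assert solver_generic("sin", xs) == specialize("sin")(xs)
```

The paper's two mechanisms perform this kind of specialization on the fly, using runtime argument types and values in one case and on-stack replacement in the other.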

View the report (.pdf)  BibTeX entry (.bib)

A compiler toolkit for array-based languages targeting CPU/GPU hybrid systems (2012-3)
Authors: Rahul Garg and Laurie Hendren
Date: November 2012

Abstract
Superseded by a newer report (2013-5); see above.

Taming Matlab (2012-2)
Authors: Anton Dubrau and Laurie Hendren
Date: April 2012

Abstract
Matlab is a dynamic scientific language used by scientists, engineers and students worldwide. Although Matlab is very suitable for rapid prototyping and development, Matlab users often want to convert their final Matlab programs to a static language such as Fortran. This paper presents an extensible object-oriented toolkit for supporting the generation of static programs from dynamic Matlab programs. Our open source toolkit, called the Matlab Tamer, identifies a large tame subset of Matlab, supports the generation of a specialized Tame IR for that subset, provides a principled approach to handling the large number of builtin Matlab functions, and supports an extensible interprocedural value analysis for estimating Matlab types and call graphs.

View the report (.pdf)  BibTeX entry (.bib)

A Modular Approach to On-Stack Replacement in LLVM (2012-1-rev2)
Authors: Nurudeen Lameed and Laurie Hendren
Date: April 2012

Abstract
On-stack replacement (OSR) is a technique which allows a virtual machine to interrupt running code during the execution of a function/method, to reoptimize the function on-the-fly using an optimizing JIT compiler, and then to resume the interrupted function at the point and state at which it was interrupted. OSR is particularly useful for programs with potentially long-running loops, as it allows dynamic optimization of those loops as soon as they become hot.

In this paper, we present a modular approach to implementing on-stack replacement that can be used by any system that targets the LLVM SSA intermediate representation, and we demonstrate the approach by using it to support dynamic inlining in McVM. McVM is a virtual machine for MATLAB which uses an LLVM-based JIT compiler. MATLAB is a popular dynamic language for scientific and engineering applications which typically manipulate large matrices and often contain long-running loops, and is thus an ideal target for dynamic JIT compilation and OSR.

Using our McVM example, we examine the overheads of using OSR on a suite of MATLAB benchmarks, and we show the potential performance improvements when using it to perform dynamic inlining.
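Conceptually, OSR on a hot loop amounts to swapping in better code mid-loop while preserving the live state, as in this deliberately tiny sketch (the real LLVM-level mechanism is far more involved; the counter threshold and function names here are invented):

```python
# Sketch: a loop is instrumented with a hotness counter; once it becomes
# hot, execution switches to an "optimized" body while keeping the loop
# index and accumulator (the interrupted state) intact.

def interpreted_body(i, acc):
    return acc + i                      # stand-in for slow, unoptimized code

def optimized_body(i, acc):
    return acc + i                      # stand-in for the recompiled version

def run(n, threshold=100):
    body, acc = interpreted_body, 0
    for i in range(n):
        if i == threshold:              # loop became hot: swap code in place
            body = optimized_body       # ... but keep i and acc unchanged
        acc = body(i, acc)
    return acc

assert run(1000) == sum(range(1000))    # semantics preserved across the swap
```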

View the report (.pdf)  BibTeX entry (.bib)

Taming Matlab (2011-4)
Authors: Anton Dubrau and Laurie Hendren
Date: December 2011

Abstract
Matlab is a dynamic scientific language used by scientists, engineers and students worldwide. Although Matlab is very suitable for rapid prototyping and development, Matlab users often want to convert their final Matlab programs to a static language such as Fortran. This paper presents an extensible object-oriented toolkit for supporting the generation of static programs from dynamic Matlab programs. Our open source toolkit, called the Matlab Tamer, identifies a large tame subset of Matlab, supports the generation of a specialized Tame IR for that subset, provides a principled approach to handling the large number of builtin Matlab functions, and supports an extensible interprocedural value analysis for estimating Matlab types and call graphs.

View the report (.pdf)  BibTeX entry (.bib)

Abstract Analysis of Method-Level Speculation (2011-3)
Authors: Clark Verbrugge and Allan Kielstra and Christopher J.F. Pickett
Date: September 2011

Abstract
Thread-level Speculation (TLS) is a technique for automatic parallelization that has shown excellent results in hardware simulation studies. Existing studies, however, typically require a full stack of analyses, hardware components, and performance assumptions in order to demonstrate and measure speedup, limiting the ability to vary fundamental choices and making basic design comparisons difficult. Here we approach the problem analytically, abstracting several variations on a general form of TLS (method-level speculation) and using our abstraction to model the performance of TLS on common coding idioms. Our investigation is based on exhaustive exploration, and we are able to show how optimal performance is strongly limited by program structure and core choices in speculation design, irrespective of data dependencies. These results provide new, high-level insight into where and how thread-level speculation can and should be applied in order to produce practical speedup.

View the report (.pdf)  BibTeX entry (.bib)

Refactoring MATLAB (2011-2)
Authors: Soroush Radpour and Laurie Hendren
Date: October 2011

Abstract
MATLAB is a very popular dynamic "scripting" language for numerical computations used by scientists, engineers and students world-wide. MATLAB programs are often developed incrementally using a mixture of MATLAB scripts and functions and frequently build upon existing code which may use outdated features. This results in programs that could benefit from refactoring, especially if the code will be reused and/or distributed. Despite the need for refactoring there appear to be no MATLAB refactoring tools available. Furthermore, correct refactoring of MATLAB is quite challenging because of its non-standard rules for binding identifiers. Even simple refactorings are non-trivial.

This paper presents the important challenges of refactoring MATLAB along with automated techniques to handle a collection of refactorings for MATLAB functions and scripts including: function and script inlining, converting scripts to functions, and converting dynamic feval calls to static function calls. The refactorings have been implemented using the MATLAB compiler framework, and an evaluation is given on a large set of MATLAB benchmarks which demonstrates the effectiveness of our approach.

View the report (.pdf)  BibTeX entry (.bib)

McSAF: A Static Analysis Framework for MATLAB (2011-1)
Authors: Jesse Doherty and Laurie Hendren
Date: December 2011

Abstract
MATLAB is an extremely popular programming language used by scientists, engineers, researchers and students world-wide. Despite its popularity, it has received very little attention from compiler researchers. This paper introduces McSAF, an open-source static analysis framework which is intended to enable more compiler research for MATLAB and extensions of MATLAB. The framework is based on an intermediate representation (IR) called McLAST, which has been designed to capture all the key features of MATLAB, while at the same time being simple for program analysis. The paper describes both the IR and the procedure for creating the IR from the higher-level AST. The analysis framework itself provides visitor-based traversals including fixed-point-based traversals to support both forwards and backwards analyses. McSAF has been implemented as part of the McLAB project, and the framework has already been used for a variety of analyses, both for MATLAB and the AspectMATLAB extension.

View the report (.pdf)  BibTeX entry (.bib)

The Importance of Being Extendable, A Crucial Extension for Aspect-Matlab (2010-7)
Authors: Olivier Savary Belanger
Date: September 2010

Abstract
AspectMatlab is an aspect-oriented extension of the MATLAB programming language.  In this paper we present some important extensions to the original implementation of AspectMatlab.  These extensions enhance the expressiveness of the aspect pattern definition and widen support to certain MATLAB native functions.  Of particular importance are the new operator patterns, which permit matching on binary operators such as "*" and "+".  Correct support for the MATLAB "clear" command and the "end" expression was also added.  Finally, we documented previously hidden features, most notably a negation operator on patterns.  This new extended version of AspectMatlab thus supports important added functionality.

View the report (.pdf)  BibTeX entry (.bib)

McFLAT: A Profile-based Framework for Loop Analysis and Transformations (2010-6)
Authors: Amina Aslam and Laurie Hendren
Date: July 2010

Abstract
Parallelization and optimization of the MATLAB programming language presents several challenges due to the dynamic nature of MATLAB.  Since MATLAB does not have static type declarations, neither the shape and size of arrays, nor the loop bounds are known at compile-time.  This means that many standard array dependence tests and associated transformations cannot be applied straightforwardly.  On the other hand, many MATLAB programs operate on arrays using loops and thus are ideal candidates for loop transformations and possibly loop vectorization/parallelization.  This paper presents a new framework, McFLAT, which uses profile-based training runs to determine likely loop-bounds ranges for which specialized versions of the loops may be generated.  The main idea is to collect information about observed loop bounds and hot loops using training data, which is then used to heuristically decide which loops and which ranges are worth specializing using a variety of loop transformations.

Our McFLAT framework has been implemented as part of the McLAB extensible compiler toolkit.  Currently, McFLAT is used to automatically transform ordinary MATLAB code into specialized MATLAB code with transformations applied to it.  This specialized code can be executed on any MATLAB system, and we report results for four execution engines: MathWorks' proprietary MATLAB system, the GNU Octave open-source interpreter, McLAB's McVM interpreter and the McVM JIT.  For several benchmarks, we observed significant speedups for the specialized versions, and noted that loop transformations had different impacts depending on the loop range and execution engine.

View the report (.pdf)  BibTeX entry (.bib)

Staged Static Techniques to Efficiently Implement Array Copy Semantics in a MATLAB JIT compiler (2010-5)
Authors: Nurudeen Lameed and Laurie Hendren
Date: July 2010

Abstract
MATLAB has gained widespread acceptance among engineers and scientists.  Several aspects of the language such as dynamic loading and typing, safe updates, and copy semantics for arrays contribute to its appeal, but at the same time provide many challenges to the compiler and virtual machine.  One such problem, minimizing the number of copies and copy checks for MATLAB programs, has not received much attention.  Existing MATLAB systems rely on reference-counting schemes to create copies only when a shared array representation is updated.  This reduces array copies, but increases the number of runtime checks.  In addition, this sort of reference-counted approach does not work in a garbage-collected system.

In this paper we present a staged static analysis approach that does not require reference counts, thus enabling a garbage-collected virtual machine.  Our approach eliminates unneeded array copies and does not require runtime checks.  The first stage combines two simple, yet fast, intraprocedural analyses to eliminate unnecessary copies.  The second stage is comprised of two analyses that together determine whether a copy should be performed before an array is updated: the first, necessary copy analysis, is a forward flow analysis and determines the program points where array copies are required, while the second, copy placement analysis, is a backward analysis that finds the optimal points to place copies, which also guarantee safe array updates.

We have implemented our approach in the McVM JIT, which is part of a garbage-collected virtual machine for MATLAB.  Our results demonstrate that there are significant overheads for both existing reference-counted and naive copy-insertion approaches.  We also show that our staged approach is effective.  In some cases the first stage is sufficient, but in many cases the second stage is required.  Overall, our approach required no dynamic checks and successfully eliminated all unnecessary copies for our benchmark set.
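For readers unfamiliar with the baseline being improved upon, here is a minimal sketch of the reference-counting scheme whose per-update runtime check the static analysis eliminates (the class and its methods are invented for illustration):

```python
# Sketch: under MATLAB's copy semantics, assignment shares the underlying
# array lazily; a physical copy happens only when an update hits a
# representation whose reference count exceeds one. The `if self.refs > 1`
# test is the runtime check paid on every write.

class RCArray:
    def __init__(self, data):
        self.data, self.refs = data, 1

    def share(self):                   # models "b = a"
        self.refs += 1
        return self

    def update(self, i, v):
        if self.refs > 1:              # runtime check on every write
            self.refs -= 1
            copy = RCArray(list(self.data))
            copy.update(i, v)
            return copy                # caller now owns a private copy
        self.data[i] = v
        return self

a = RCArray([1, 2, 3])
b = a.share()                          # shared, no copy yet
b2 = b.update(0, 99)                   # write triggers the copy
assert a.data == [1, 2, 3] and b2.data == [99, 2, 3]
```

The staged analysis in the paper instead decides at compile time where a copy is genuinely needed, so neither the reference counts nor the per-write check are required at runtime.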

View the report (.pdf)  BibTeX entry (.bib)

Adaptive Software Return Value Prediction (2010-3)
Authors: Christopher J. F. Pickett and Clark Verbrugge and Allan Kielstra
Date: April 2010

Abstract
Return value prediction (RVP) is a technique for guessing the return value from a function before it actually completes, enabling a number of program optimizations and analyses.  However, despite the apparent usefulness, RVP and value prediction in general have seen limited uptake in practice.  Hardware proposals have been successful in terms of speed and prediction accuracy, but the cost of dedicated circuitry is high, the available memory for prediction is low, and the flexibility is negligible.  Software solutions are inherently much more flexible, but can only achieve high accuracies in exchange for reduced speed and increased memory consumption.  In this work we first express many different existing prediction strategies in a unification framework, using it as the basis for a software implementation.  We then explore an adaptive software RVP design that relies on simple object-orientation in a hybrid predictor.  It allocates predictors on a per-callsite basis instead of globally, and frees the resources associated with unused hybrid sub-predictors after an initial warmup period.  We find that these techniques dramatically improve speed and reduce memory consumption while maintaining high prediction accuracy.
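The per-callsite hybrid idea can be illustrated with a toy predictor (a simplified sketch; the class names and the two sub-predictors chosen here are illustrative, not the paper's design):

```python
# Sketch: per call site, a last-value and a stride sub-predictor each make
# a guess; the hybrid reports the guess of whichever sub-predictor has been
# more accurate so far. A real adaptive system would free the losing
# sub-predictor's resources after a warmup period.

class LastValue:
    def __init__(self): self.last = None
    def predict(self): return self.last
    def update(self, v): self.last = v

class Stride:
    def __init__(self): self.last, self.stride = None, 0
    def predict(self):
        return None if self.last is None else self.last + self.stride
    def update(self, v):
        if self.last is not None:
            self.stride = v - self.last
        self.last = v

class Hybrid:
    def __init__(self):
        self.subs = {"last": LastValue(), "stride": Stride()}
        self.hits = {"last": 0, "stride": 0}
    def predict(self):
        best = max(self.hits, key=self.hits.get)
        return self.subs[best].predict()
    def update(self, v):
        for name, p in self.subs.items():
            if p.predict() == v:       # score before training
                self.hits[name] += 1
            p.update(v)

h = Hybrid()
for v in [10, 20, 30, 40]:             # a strided return sequence
    h.update(v)
assert h.predict() == 50               # the stride sub-predictor dominates
```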

View the report (.pdf)  BibTeX entry (.bib)

Understanding Method Level Speculation (2010-2)
Authors: Christopher J. F. Pickett and Clark Verbrugge and Allan Kielstra
Date: April 2010

Abstract
Method level speculation (MLS) is an optimistic technique for parallelizing imperative programs, for which a variety of MLS systems and optimizations have been proposed.  However, runtime performance strongly depends on the interaction between program structure and MLS system design choices, making it difficult to compare approaches or understand in a general way how programs behave under MLS.  In this work we seek to establish a basic framework for understanding MLS designs.  We first present a stack-based abstraction of MLS that encompasses major design choices such as in-order and out-of-order nesting and is also suitable for implementations.  We then use this abstraction to develop the structural operational semantics for a series of progressively more flexible MLS models.  Assuming the most flexible such model, we provide transition-based visualizations that reveal the speculative execution behaviour for a number of simple imperative programs. These visualizations show how specific parallelization patterns can be induced by combining common programming idioms with precise decisions about where to speculate.  We find that the runtime parallelization structures are complex and non-intuitive, and that both in-order and out-of-order nesting are important for exposing parallelism.  The overwhelming conclusion is that either programmer or compiler knowledge of how implicit parallelism will develop at runtime is necessary to maximize performance. 

View the report (.pdf)  BibTeX entry (.bib)

Static Techniques to Efficiently Implement Array Copy Semantics in a MATLAB JIT compiler (2010-1)
Authors: Nurudeen Lameed and Laurie Hendren
Date: March 2010

Abstract
MATLAB has gained widespread acceptance among engineers and scientists as a platform for programming engineering and scientific applications.  Several aspects of the language such as dynamic loading and typing, safe updates, and copy semantics for arrays contribute to its appeal, but at the same time provide many challenges to the compiler and virtual machine.  One such problem, minimizing the number of copies and copy checks for MATLAB programs, has not received much attention.  Existing MATLAB systems use a reference counting scheme to create copies only when a shared array representation is updated.  This reduces array copies, but increases the number of runtime checks.  In this paper we present a static analysis approach that eliminates both the unneeded array copies and the runtime checks.  Our approach comprises two analyses that together determine whether a copy should be performed before an array is updated: the first, necessary copy analysis, is a forward flow analysis and determines the program points where array copies are required, while the second, copy placement analysis, is a backward analysis that finds the optimal points to place copies, which also guarantee safe array updates.  We have implemented our approach as part of the McVM JIT and compared our approach to MathWorks' commercial MATLAB implementation and the open-source Octave implementation.  The results show that, on our benchmark set, our approach is effective at minimizing the number of copies without requiring run-time checks.

View the report (.pdf)  BibTeX entry (.bib)

AspectMatlab: An Aspect-Oriented Scientific Programming Language (2009-3)
Authors: Toheed Aslam and Jesse Doherty and Anton Dubrau and Laurie Hendren
Date: October 2009

Abstract
This paper introduces a new aspect-oriented programming language, AspectMatlab.  Matlab is a dynamic scientific programming language that is commonly used by scientists because of its convenient and high-level syntax for arrays, the fact that type declarations are not required, and the availability of a rich set of application libraries.  AspectMatlab introduces key aspect-oriented features in a way that is both accessible to scientists and where the aspect-oriented features concentrate on array accesses and loops, the core computation elements in scientific programs.  Introducing aspects into a dynamic language such as Matlab also provides some new challenges.  In particular, it is difficult to statically determine precisely where patterns match, resulting in many dynamic checks in the woven code.  Our compiler includes flow analyses which are used to eliminate many of those dynamic checks.  This paper reports on the language design of AspectMatlab, the amc compiler implementation and related optimizations, and also provides an overview of use cases that are specific to scientific programming. 

View the report (.pdf)  BibTeX entry (.bib)

Understanding Method Level Speculation (2009-2)
Authors: Christopher J. F. Pickett and Clark Verbrugge and Allan Kielstra
Date: September 2009

Abstract
Method level speculation (MLS) is an optimistic technique for parallelizing imperative programs, for which a variety of MLS systems and optimizations have been proposed.  However, runtime performance strongly depends on the interaction between program structure and MLS system design choices, making it difficult to compare approaches or understand in a general way how programs behave under MLS.  Here we develop an abstract list-based model of speculative execution that encompasses several MLS designs, and a concrete stack-based model that is suitable for implementations.  Using our abstract model, we show equivalence and correctness for a variety of MLS designs, unifying in-order and out-of-order execution models.  Using our concrete model, we directly explore the execution behaviour of simple imperative programs, and show how specific parallelization patterns are induced by combining common programming idioms with precise speculation decisions.  This basic groundwork establishes a common basis for understanding MLS designs, and suggests more formal directions for optimizing MLS behaviour and application. 

View the report (.pdf)  BibTeX entry (.bib)

Adaptive Software Return Value Prediction (2009-1)
Authors: Christopher J. F. Pickett and Clark Verbrugge and Allan Kielstra
Date: June 2009

Abstract
Return value prediction (RVP) is a useful technique that enables a number of program optimizations and analyses.  Potentially high overhead, however, as well as a dependence on novel hardware support remain significant barriers to method level speculation and other applications that depend on low cost, high accuracy RVP.  Here we investigate the feasibility of software-based RVP through a comprehensive software study of RVP behaviour.  We develop a structured framework for RVP design and use it to experimentally analyze existing and novel predictors in the context of non-trivial Java programs.  As well as measuring accuracy, time, and memory overhead, we show that an object-oriented adaptive hybrid predictor model can significantly improve performance while maintaining high accuracy levels.  We consider the impact on speculative parallelism, and show further application of RVP to program understanding.  Our results suggest that software return value prediction can play a practical role in further program optimization and analysis.

View the report (.pdf)  BibTeX entry (.bib)

A Stochastic Approach to Instruction Cache Optimization (2008-4)
Authors: Maxime Chevalier-Boisvert and Clark Verbrugge
Date: December 2008

Abstract
The memory alignment of executable code can have significant yet hard to predict effects on the run-time performance of programs, even in the presence of existing aggressive optimization. We present an investigation of two different approaches to instruction cache optimization: the first is an implementation of a well-known and established compile-time heuristic, and the second is our own stochastic, genetic search algorithm based on active profiling. Both algorithms were tested on 7 real-world programs and speed gains of up to 10% were observed. Our results show that, although more time consuming, our approach yields higher average performance gains than established heuristic solutions. This suggests there is room for improvement, and moreover that stochastic and learning-based optimization approaches like our own may be worth investigating further as a means of better exploiting hardware features.
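The flavour of such a stochastic search can be sketched in a few lines (everything here is illustrative: the genome encoding, the toy fitness function standing in for a profiling run, and the mutation-only search are invented for the example):

```python
# Sketch: treat a code layout as a genome of block addresses, score each
# candidate with a profiled fitness function, and evolve by mutation,
# keeping any candidate at least as fit as the current best.

import random

def fitness(layout):
    # Toy stand-in for a profiling run: adjacent blocks that map to the
    # same (mod-4) cache set are counted as conflicts to be minimized.
    return -sum(a % 4 == b % 4 for a, b in zip(layout, layout[1:]))

def mutate(layout, rng):
    child = list(layout)
    i, j = rng.randrange(len(child)), rng.randrange(len(child))
    child[i], child[j] = child[j], child[i]   # swap two blocks
    return child

def evolve(layout, generations=200, seed=0):
    rng = random.Random(seed)
    best = layout
    for _ in range(generations):
        cand = mutate(best, rng)
        if fitness(cand) >= fitness(best):    # never regress
            best = cand
    return best

start = [0, 4, 8, 1, 5, 9]                    # every adjacent pair conflicts
best = evolve(start)
assert fitness(best) >= fitness(start)        # search never regresses
```

The actual system in the paper uses real profiling runs as the fitness function, which is what makes the approach more time consuming than the compile-time heuristic.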

View the report (.pdf)  BibTeX entry (.bib)

Memory Abstractions for Speculative Multithreading (2008-3)
Authors: Christopher J. F. Pickett and Clark Verbrugge and Allan Kielstra
Date: September 2008

Abstract
Speculative multithreading is a promising technique for automatic parallelization.  However, our experience with a software implementation indicates that there is significant overhead involved in managing the heap memory internal to the speculative system and significant complexity in correctly managing the call stack memory used by speculative tasks.  We first describe a specialized heap management system that allows the data structures necessary to support speculation to be quickly recycled.  We take advantage of constraints imposed by our speculation model to reduce the complexity of this otherwise general producer/consumer memory problem.  We next provide a call stack abstraction that allows the local variable requirements of in-order and out-of-order nested method level speculation to be expressed and hence implemented in a relatively simple fashion.  We believe that heap and stack memory management issues are important to efficient speculative system design.  We also believe that abstractions such as ours help provide a framework for understanding the requirements of different speculation models. 

View the report (.pdf)  BibTeX entry (.bib)

Enabling Static Analysis for Partial Java Programs (2008-2)
Authors: Barthélémy Dagenais and Laurie Hendren
Date: April 2008

Abstract

Software engineering tools often deal with the source code of programs retrieved from the web or source code repositories. Typically, these tools only have access to a subset of the programs' source code (one file or a subset of files) which makes it difficult to build a complete and typed intermediate representation (IR). Indeed, for incomplete object-oriented programs, it is not always possible to completely disambiguate the syntactic constructs and to recover the declared type of certain expressions because the declaration of many types and class members are not accessible.

We present a framework that performs partial type inference and uses heuristics to recover the declared type of expressions and resolve ambiguities in partial Java programs. Our framework produces a complete and typed IR suitable for further static analysis. We have implemented this framework and used it in an empirical study on four large open source systems which shows that our system recovers most declared types with a low error rate, even when only one class is accessible.

View the report (.pdf)  Download the report (.ps)  BibTeX entry (.bib)
Transforming Timeline specifications into automata for runtime monitoring (extended version) (2008-1) back
Authors: Eric Bodden and Hans Vangheluwe
Date: February 2008

Abstract

In runtime monitoring, a programmer specifies code to execute whenever a sequence of events occurs during program execution. Previous and related work has shown that runtime monitoring techniques can be useful in order to validate or guarantee the safety and security of running programs. Those techniques have however not been incorporated in everyday software development processes. One problem that hinders industry adoption is that the required specifications use a cumbersome, textual notation. As a consequence, only verification experts, not programmers, can understand what a given specification means and in particular, whether it is correct. In 2001, researchers at Bell Labs proposed the Timeline formalism. This formalism was designed with ease of use in mind, for the purpose of static verification (and not, as in our work, for runtime monitoring).

In this article, we describe how software safety specifications can be described visually in the Timeline formalism and subsequently transformed into finite automata suitable for runtime monitoring, using our meta-modelling and model transformation tool AToM3. The synthesized automata are then fed into an existing monitoring back-end that generates efficient runtime monitors for them. Those monitors can then automatically be applied to Java programs.

Our work shows that the transformation of Timeline models to automata is not only feasible in an efficient and sound way but also helps programmers identify correspondences between the original specification and the generated monitors. We argue that visual specification of safety criteria and subsequent automatic synthesis of runtime monitors will help users reason about the correctness of their specifications on the one hand and effectively deploy them in industrial settings on the other hand.
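The monitoring end-product can be sketched in a few lines (the property, state names, and transition table here are our own toy example, not output of the described tool chain): a safety specification compiled into a small finite automaton that a runtime monitor steps through.

```python
# Illustrative sketch: a "no use after close" safety automaton.
# Unlisted (state, event) pairs leave the state unchanged.
ERROR = "error"
TRANSITIONS = {
    ("open",   "close"): "closed",
    ("closed", "open"):  "open",
    ("closed", "use"):   ERROR,    # the violation the monitor reports
}

def monitor(trace, start="open"):
    """Step the automaton over the event trace; return the index of the
    first violating event, or None if the property holds."""
    state = start
    for i, event in enumerate(trace):
        state = TRANSITIONS.get((state, event), state)
        if state == ERROR:
            return i
    return None

print(monitor(["open", "use", "close", "use"]))  # -> 3
print(monitor(["open", "use", "close"]))         # -> None
```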

View the report (.pdf)  Download the report (.ps)  BibTeX entry (.bib)
Bytecode Testing Framework for SableVM Code-copying Engine (2007-9) back
Authors: Gregory B. Prokopski, Etienne M. Gagnon and Christian Arcand
Date: November 2007

Abstract
Code-copying is a very attractive, highly portable, simple-JIT technique for fast bytecode execution, with implementation costs close to those of a classical interpreter. The idea is to reuse, at Virtual Machine (VM) runtime, chunks of the VM's binary code, copying and concatenating them so that they execute together at greater speed. Unfortunately, code-copying makes unwarranted assumptions about the code generated by the compiler used to compile the VM. It is therefore necessary to find which VM code chunks can be safely used after being copied and which cannot. Previously used manual testing and assembly analysis were highly ineffective and error prone. In this work we present a test suite, the Bytecode Testing Framework (BTF), that contains a comprehensive set of fine-grained Java assembly tests. These tests are used in conjunction with SableVM's JVM testing mode to semi-automatically detect, one by one, which code chunks are safe to use. With this technique we ported SableVM's code-copying engine to several architectures, and external contributors with no VM knowledge have ported SableVM to new architectures in as little as one hour. The presented Bytecode Testing Framework greatly improved the reliability and ease of porting of SableVM's code-copying engine, and thus its practical usability.

View the report (.pdf)  Download the report (.ps)  BibTeX entry (.bib)

Instance keys: A technique for sharpening whole-program pointer analyses with intraprocedural information (2007-8) back
Authors: Eric Bodden, Patrick Lam and Laurie Hendren
Date: September 2007

Abstract
Pointer analyses enable many subsequent program analyses and transformations, since they enable compilers to statically disambiguate references to the heap. Extra precision enables pointer analysis clients to draw stronger conclusions about programs. Flow-sensitive pointer analyses are typically quite precise. Unfortunately, flow-sensitive pointer analyses are also often too expensive to run on whole programs. This paper therefore describes a technique which sharpens results from a whole-program flow-insensitive points-to analysis using two flow-sensitive intraprocedural analyses: a must-not-alias analysis and a must-alias analysis. The main technical idea is the notion of instance keys, which enable client analyses to disambiguate object references. We distinguish two kinds of keys: weak and strong. Strong instance keys guarantee that different keys represent different runtime objects, allowing instance keys to almost transparently substitute for runtime objects in static analyses. Weak instance keys may represent multiple runtime objects, but still enable disambiguation of compile-time values. We assign instance keys based on the results of our analyses. We found that the use of instance keys greatly simplified the design and implementation of subsequent analysis phases.
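A toy rendering of the instance-key idea may help (the class and function names are ours, not the paper's API): references proven must-alias share one strong key, references the analysis fully resolves get distinct strong keys, and anything undecided falls back to a weak key that may stand for several runtime objects.

```python
# Illustrative sketch of strong vs. weak instance keys.

class InstanceKey:
    def __init__(self, kid, strong):
        self.id, self.strong = kid, strong

class KeyAssigner:
    def __init__(self):
        self._next = 0
        self._keys = {}   # variable name -> InstanceKey

    def key_for(self, var, must_alias_with=None, decided=True):
        if must_alias_with is not None:        # must-alias: share the key
            self._keys[var] = self._keys[must_alias_with]
        elif var not in self._keys:
            self._next += 1
            self._keys[var] = InstanceKey(self._next, strong=decided)
        return self._keys[var]

def may_be_same_object(k1, k2):
    """Distinct strong keys are guaranteed to denote distinct objects."""
    if k1.id == k2.id:
        return True
    return not (k1.strong and k2.strong)

ka = KeyAssigner()
a = ka.key_for("a")                       # fresh allocation, strong key
b = ka.key_for("b", must_alias_with="a")  # b = a  -> same key
c = ka.key_for("c")                       # provably distinct allocation
d = ka.key_for("d", decided=False)        # analysis gave up -> weak key
print(may_be_same_object(a, b))  # True  (same key)
print(may_be_same_object(a, c))  # False (two distinct strong keys)
print(may_be_same_object(c, d))  # True  (weak keys may overlap)
```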

View the report (.pdf)  Download the report (.ps)  BibTeX entry (.bib)

Compiler-guaranteed safety in code-copying VMs (2007-7) back
Authors: Gregory B. Prokopski, Clark Verbrugge
Date: September 2007

Abstract
Virtual Machine authors face a difficult choice: settle for a cheap but low-performance interpreter, or write a specialized and costly compiler. One way to bridge the gap between these two distant solutions is the existing code-copying technique, which reuses chunks of the VM's binary code to create a simple JIT. While simple in principle, this technique is not reliable without a compiler that can guarantee that copied chunks are functionally equivalent, which is often not the case due to aggressive optimizations. We present a proof-of-concept, minimal-impact modification of a highly optimizing compiler, GCC. It allows a VM programmer to mark specific chunks of VM source code as copyable. The chunks of native code resulting from compilation of the marked source become addressable and self-contained. Chunks can be safely copied at VM runtime, concatenated, and executed together. With minimal impact on compiler maintenance we guarantee the necessary safety and correctness properties of chunks, ensured by chunk integrity verification. This allows code-copying VMs to safely achieve performance improvements of up to 200%, and 67% on average, over direct interpretation. This maintainable enhancement makes the code-copying technique reliable and thus practically usable.

View the report (.pdf)  Download the report (.ps)  BibTeX entry (.bib)

Partial Program Analysis (2007-6) back
Authors: Barthélémy Dagenais
Date: September 2007

Abstract
Software engineers often need to perform static analysis on a subset of a program's source code. Unfortunately, static analysis frameworks usually fail to complete their analyses because the definitions of some types used within a partial program are unavailable and the complete program type hierarchy cannot be reconstructed. We propose a technique, Partial Program Analysis, to generate and infer type facts in incomplete Java programs to allow static analysis frameworks to complete their analyses. We describe the various levels of inference soundness our technique provides, and then we cover the main algorithm and the type inference strategies we use. We conclude with a detailed case study showing how our technique can provide more precise type facts than standard parsers and abstract syntax tree generators.

View the report (.pdf)  Download the report (.ps)  BibTeX entry (.bib)

Arithmetic Coding revealed - A guided tour from theory to praxis (2007-5) back
Authors: Eric Bodden, Malte Clasen and Joachim Kneis
Date: May 2007

Abstract
This document is an updated and translated version of the German paper Arithmetische Kodierung from 2002. It aims to be a comprehensive guide to the art of arithmetic coding. First we give an introduction to the mathematical principles involved. These build the foundation for the following chapters, where we describe the encoding and decoding algorithms for different numeral systems. Here we also mention various problems one can come across, as well as solutions for them. This is followed by a proof of uniqueness and an estimation of the efficiency of the algorithm. Finally, we briefly mention different kinds of statistical models, which are used to actually gain compression through the encoding. Throughout this paper we occasionally make comparisons to the related Huffman encoding algorithm; some rudimentary knowledge of Huffman encoding should suffice for the reader to follow the line of reasoning. This paper is mainly based on the works of Sayood and Bell; on the latter we base our implementation, which is included in the appendix as full C++ source code. We also make use of parts of this code in some of our examples. The mathematical model we use, however, is strongly based on Sayood and Fano. We also employ the well-known Shannon theorem, which proves that the entropy is the bound of feasible lossless compression.
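The core interval-narrowing mechanism the paper explains can be shown in miniature (the report's appendix is a full C++ implementation; this toy fixed model over exact rationals is our own):

```python
# A minimal arithmetic coder sketch using exact rationals and a fixed
# three-symbol model with an explicit end-of-message symbol "$".
from fractions import Fraction

MODEL = {"a": (Fraction(0), Fraction(1, 2)),      # P(a) = 1/2
         "b": (Fraction(1, 2), Fraction(3, 4)),   # P(b) = 1/4
         "$": (Fraction(3, 4), Fraction(1))}      # end marker, P = 1/4

def encode(msg):
    low, high = Fraction(0), Fraction(1)
    for sym in msg + "$":
        width = high - low
        s_lo, s_hi = MODEL[sym]
        low, high = low + width * s_lo, low + width * s_hi
    return (low + high) / 2        # any number inside the final interval

def decode(code):
    out, low, high = "", Fraction(0), Fraction(1)
    while True:
        value = (code - low) / (high - low)   # rescale into [0, 1)
        for sym, (s_lo, s_hi) in MODEL.items():
            if s_lo <= value < s_hi:
                if sym == "$":
                    return out
                out += sym
                width = high - low
                low, high = low + width * s_lo, low + width * s_hi
                break

print(decode(encode("abba")))  # -> abba
```

Real coders use fixed-width integer arithmetic with renormalization instead of unbounded rationals; that is exactly the practical machinery the report walks through.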

View the report (.pdf)  Download the report (.ps)  BibTeX entry (.bib)

Phase-based adaptive recompilation in a JVM (2007-4) back
Authors: Dayong Gu and Clark Verbrugge
Date: May 2007

Abstract
Modern JIT compilers often employ multi-level recompilation strategies as a means of ensuring the most used code is also the most highly optimized, balancing optimization costs and expected future performance. Accurate selection of code to compile and level of optimization to apply is thus important to performance. In this paper we investigate the effect of an improved recompilation strategy for a Java virtual machine. Our design makes use of a lightweight, low-level profiling mechanism to detect high-level, variable length phases in program execution. Phases are then used to guide adaptive recompilation choices, improving performance. We develop both an offline implementation based on trace data and a self-contained online version. Our offline study shows an average speedup of 8.5% and up to 21%, and our online system achieves an average speedup of 4.5%, up to 18%. We subject our results to extensive analysis and show that our design achieves good overall performance with high consistency despite the existence of many complex and interacting factors in such an environment.

View the report (.pdf)  Download the report (.ps)  BibTeX entry (.bib)

Component-Based Lock Allocation (2007-3) back
Authors: Richard L. Halpert and Christopher J.F. Pickett and Clark Verbrugge
Date: May 2007

Abstract
The choice of lock objects in concurrent programs can affect both performance and correctness, a burden of complexity for programmers.  Recently, various automated lock allocation and assignment techniques have been proposed, each aiming primarily to minimize the number of conflicts between critical sections.  However, practical performance depends on a number of important factors, including the nature of concurrent interaction, the accuracy of the program analyses used to support the lock allocation, and the underlying machine hardware.  We introduce component-based lock allocation, which starts by analysing data dependences and automatically assigns lock objects with tunable granularity to groups of interfering critical sections.  Our experimental results show that while a single global lock is usually suboptimal, high accuracy in program analysis is not always necessary to achieve generally good performance.  Our work provides empirical evidence as to sufficient strategies for efficient lock allocation, and details the requirements of an adaptable implementation. 
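The grouping step can be sketched as follows (a simplification under our own assumptions — the paper's components come from interference detected by data-dependence analysis; here the interfering pairs are given directly): union-find merges interfering critical sections and each resulting component receives one lock.

```python
# Illustrative sketch: one lock per component of interfering critical sections.

def allocate_locks(sections, interferences):
    parent = {s: s for s in sections}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]   # path halving
            s = parent[s]
        return s

    for a, b in interferences:              # union interfering sections
        parent[find(a)] = find(b)

    locks = {}
    return {s: locks.setdefault(find(s), "lock%d" % len(locks))
            for s in sections}

# Four critical sections; cs1/cs2 share state, as do cs3/cs4.
assignment = allocate_locks(
    ["cs1", "cs2", "cs3", "cs4"],
    [("cs1", "cs2"), ("cs3", "cs4")])
print(assignment["cs1"] == assignment["cs2"])  # True: same component
print(assignment["cs1"] == assignment["cs3"])  # False: independent sections
```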

View the report (.pdf)  BibTeX entry (.bib)

Using hardware data to detect repetitive program behavior (2007-2) back
Authors: Dayong Gu and Clark Verbrugge
Date: March 2007

Abstract
Detecting repetitive "phases" in program execution is helpful for program understanding, runtime optimization, and for reducing simulation/profiling workload. The nature of the phases that may be found, however, depends on the kind of program, as well as on how programs interact with the underlying hardware. We present a technique to detect long-term and variable-length repetitive program behaviour by monitoring microarchitecture-level hardware events. Our approach results in high-quality repetitive phase detection; we propose quantitative methods of evaluation, and show that our design accurately calculates phases with a 92% "confidence" level. We further validate our design through an implementation and analysis of phase-driven, runtime profiling, showing a reduction of about 50% of the profiling workload while still preserving 94% accuracy in the profiling results. Our work confirms that it is practical to detect high-level phases from lightweight hardware monitoring, and to use such information to improve runtime performance.
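The basic detection mechanism can be illustrated with a toy sketch (the event kinds, distance measure, and threshold are our own simplifications of what the paper evaluates): successive hardware-event vectors are compared, and a large jump marks a phase boundary.

```python
# Illustrative sketch: phase boundaries from per-interval hardware-event
# vectors (e.g. cache-miss and branch-miss counts), via Manhattan distance.

def phase_boundaries(samples, threshold):
    """Return indices where behaviour shifts enough to start a new phase."""
    boundaries = []
    for i in range(1, len(samples)):
        distance = sum(abs(a - b) for a, b in zip(samples[i - 1], samples[i]))
        if distance > threshold:
            boundaries.append(i)
    return boundaries

# Two stable regimes with a behaviour shift at index 3.
trace = [(100, 5), (102, 6), (99, 5), (400, 40), (398, 41), (401, 39)]
print(phase_boundaries(trace, threshold=50))  # -> [3]
```

Detecting *repetition* additionally requires recognising that a later interval resembles an earlier one, which is where the paper's long-term pattern analysis goes beyond this sketch.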

View the report (.pdf)  Download the report (.ps)  BibTeX entry (.bib)

libspmt: A Library for Speculative Multithreading (2007-1) back
Authors: Christopher J.F. Pickett and Clark Verbrugge and Allan Kielstra
Date: March 2007

Abstract
Speculative multithreading (SpMT), or thread level speculation (TLS), is a dynamic parallelisation technique that uses out-of-order execution and memory buffering to achieve speedup. The complexity of implementing software SpMT results in long development lead times, and the lack of reuse between different software SpMT systems makes comparisons difficult. In order to address these problems we have created libspmt, a new language independent library for speculative multithreading. It provides a virtualization of speculative execution components that are unavailable in commercial multiprocessors, and enables method level speculation when fully integrated. We have isolated a clean and minimal library interface, and the corresponding implementation is highly modular. It can accommodate hosts that implement garbage collection, exceptions, and non-speculative multithreading. We created libspmt by refactoring a previous implementation of software SpMT for Java, and by aiming for modularity have exposed several new opportunities for optimization.

View the report (.pdf)  BibTeX entry (.bib)

Obfuscating Java: the most pain for the least gain (2006-5) back
Authors: Micheal Batchelder and Laurie Hendren
Date: June 2006

Abstract

Software obfuscators are used to transform code so that it becomes more difficult to understand and harder to reverse engineer. Obfuscation of Java programs is particularly important since Java's binary form, Java bytecode, is relatively high-level and susceptible to high-quality decompilation. The objective of our work is to develop and study obfuscation techniques that produce obfuscated bytecode that is very hard to reverse engineer while at the same time not significantly degrading performance. We present three kinds of obfuscation techniques that: (1) obscure intent at the operational level; (2) obfuscate program structure such as complicating control flow and confusing object-oriented design; and (3) exploit the semantic gap between what is allowed at the Java source level and what is allowed at the bytecode level. We have implemented all of our techniques as a tool called JBCO (Java Byte Code Obfuscator), which is built on top of the Soot framework. We developed a benchmark suite for evaluating our obfuscations and we examine runtime performance, control flow graph complexity and the negative impact on decompilers. These results show that most of the obfuscations do not degrade performance significantly and many increase complexity, making reverse engineering more difficult. The impact on decompilers was two-fold. For those obfuscations that can be decompiled, readability is greatly reduced. Otherwise, the decompilers fail to produce legal source code or crash completely.

View the report(.pdf) View the report(.ps.gz)

Metrics for Measuring the Effectiveness of Decompilers and Obfuscators (2006-4) back
Authors: Nomair A. Naeem, Micheal Batchelder and Laurie Hendren
Date: June 2006

Abstract

Java developers often use decompilers to aid reverse engineering and obfuscators to prevent reverse engineering. Decompilers translate low-level class files to Java source and often produce "good" output. Obfuscators rewrite class files into semantically-equivalent class files that are either: (1) difficult to decompile, or (2) decompilable, but produce "hard-to-understand" Java source. This paper presents a set of metrics we have developed to quantify the effectiveness of decompilers and obfuscators. The metrics include some selective size and counting metrics, an expression complexity metric and a metric for evaluating the complexity of identifier names. We have applied these metrics to evaluate a collection of decompilers. By comparing metrics of the original Java source and the decompiled source we can show when the decompilers produced "good" code (as compared to the original code). We have also applied the metrics to the code produced by obfuscators and show how the metrics indicate that the obfuscators produced "hard-to-understand" code (as compared to the unobfuscated code). Our work provides a first step in evaluating the effectiveness of decompilers and obfuscators, and we plan to apply these techniques to evaluate new decompiler and obfuscator tools and techniques.
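The flavour of an expression complexity metric can be sketched on Python expressions using the standard library's ast module (the paper's metrics target Java source; this analogy is ours):

```python
# Sketch of an expression complexity metric: count AST nodes, so flat
# expressions score low and the deeply nested expressions that poor
# decompilation tends to produce score high.
import ast

def expression_complexity(expr_src):
    return sum(1 for _ in ast.walk(ast.parse(expr_src, mode="eval")))

simple = expression_complexity("x + y")
nested = expression_complexity("((x + y) * (x - y)) / (x * x + y * y)")
print(simple < nested)  # True: nesting inflates the node count
```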

View the report(.pdf) View the report(.ps.gz)

Improving the Compiling Speed of the AspectBench Compiler (2006-3) back
Authors: Jingwu Li and Laurie Hendren
Date: August 2006

Abstract

The AspectBench Compiler (abc) is an extensible AspectJ compiler built on top of Polyglot and Soot. It generates optimized Java bytecode that in many cases runs faster than code produced by ajc. However, the compilation speed of abc can be substantially slower than that of ajc, typically by a factor of roughly 4. Using various profiling tools, we observed several bottlenecks in the process of compiling AspectJ source code. We modified some data structures related to forward and backward flow analysis, and changed some algorithms, such as the variable name generator and around inlining, to remove these bottlenecks or reduce their severity. Experimental results on selected benchmarks show that the combined modifications reduce overall compilation time by 8% compared with the original abc. The speedup is especially notable for benchmarks in which around advice applies in many places; for these benchmarks we also greatly reduce class file size, by around 28%.

View the report(.pdf) View the report(.ps)

Programmer-friendly Decompiled Java (2006-2) back
Authors: Nomair A. Naeem and Laurie Hendren
Date: March 2006

Abstract

Java decompilers convert Java class files to Java source. Java class files may be created by a number of different tools including standard Java compilers, compilers for other languages such as AspectJ, or other tools such as optimizers or obfuscators. There are two kinds of Java decompilers, javac-specific decompilers that assume that the class file was created by a standard javac compiler and tool-independent decompilers that can decompile arbitrary class files, independent of the tool that created the class files. Typically javac-specific decompilers produce more readable code, but they fail to decompile many class files produced by other tools.

This paper tackles the problem of how to make a tool-independent decompiler, Dava, produce Java source code that is programmer-friendly. In past work it has been shown that Dava can decompile arbitrary class files, but often the output, although correct, is very different from what a programmer would write and is hard to understand. Furthermore, tools like obfuscators intentionally confuse the class files and this also leads to confusing decompiled source files.

Given that Dava already produces correct Java abstract syntax trees (ASTs) for arbitrary class files, we provide a new back-end for Dava. The back-end rewrites the ASTs to semantically equivalent ASTs that correspond to code that is easier for programmers to understand. Our new back-end includes a new AST traversal framework, a set of simple pattern-based transformations, a structure-based data flow analysis framework and a collection of more advanced AST transformations that use flow analysis information. We include several illustrative examples including the use of advanced transformations to clean up obfuscated code.
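One of the simple pattern-based transformations can be mimicked on Python ASTs with the standard library (Dava's back-end performs such rewrites on Java ASTs; the particular pattern chosen here is our own illustration): replace `not (a == b)` with the semantically equivalent but more readable `a != b`.

```python
# Sketch of a pattern-based AST rewrite that improves readability
# without changing semantics.
import ast

class SimplifyNotEq(ast.NodeTransformer):
    def visit_UnaryOp(self, node):
        self.generic_visit(node)
        if (isinstance(node.op, ast.Not)
                and isinstance(node.operand, ast.Compare)
                and len(node.operand.ops) == 1
                and isinstance(node.operand.ops[0], ast.Eq)):
            node.operand.ops[0] = ast.NotEq()
            return node.operand            # drop the enclosing `not`
        return node

def simplify(src):
    tree = SimplifyNotEq().visit(ast.parse(src))
    return ast.unparse(ast.fix_missing_locations(tree))

print(simplify("if not (a == b): f()"))    # prints: if a != b: f()  (reflowed)
```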

View the report(.pdf) View the report(.ps) BibTeX entry

A Survey of Phase Analysis: Techniques, Evaluation and Applications (2006-1) back
Authors: Dayong Gu and Clark Verbrugge
Date: March 2006

Abstract

Most programs exhibit significant repetition of behaviour, and detecting program phases is an increasingly important aspect of dynamic, adaptive program optimizations. Phase-based optimization can also reduce profiling overhead, save simulation time, and help in program understanding. Using a novel state space characterization, we give an overview of state-of-the-art phase detection and prediction techniques. We use this systematic exposition to situate our own long-range phase analysis, and evaluate our approach using a novel phase prediction evaluation metric. Our results show that phases with relatively long periods can be predicted with very high accuracy. This new technique, as well as the unexplored regions in our high-level characterization, suggests that further improvements and variations on program phase analysis are both possible and potentially quite useful.

View the report(.pdf) View the report(.ps) BibTeX entry

Assessing the Impact of Optimization in Java Virtual Machines (2005-4) back
Authors: Dayong Gu, Clark Verbrugge and Etienne M. Gagnon
Date: November 2005

Abstract

Many new Java runtime optimizations report relatively small, single-digit performance improvements. On modern virtual and actual hardware, however, the performance impact of an optimization can be influenced by a variety of factors in the underlying systems. Using a case study of a new garbage collection optimization in two different Java virtual machines, we show the various issues that must be taken into consideration when claiming an improvement. We examine the specific and overall performance changes due to our optimization and show how unintended side-effects can contribute to, and distort the final assessment. Our experience shows that VM and hardware concerns can generate variances of up to 9.5% in whole program execution time. Consideration of these confounding effects is critical to a good understanding of Java performance and optimization.

View the report(.pdf) View the report(.ps) BibTeX entry

Dynamic Shape and Data Structure Analysis in Java (2005-3) back
Authors: Sokhom Pheng and Clark Verbrugge
Date: October 2005

Abstract

Analysis of dynamic data structure usage is useful for both program understanding and for improving the accuracy of other program analyses. Static shape analysis techniques, however, suffer from reduced accuracy in complex situations. We have designed and implemented a dynamic shape analysis system that allows one to examine and analyze how Java programs build and modify data structures. Using a complete execution trace from a profiled run of the program, we build an internal representation that mirrors the evolving runtime data structures. The resulting series of representations can then be analyzed and visualized, and we show how to use our approach to help understand how programs use data structures, the precise effect of garbage collection, and to establish limits on static techniques for shape analysis. A deep understanding of dynamic data structures is particularly important for modern, object-oriented languages that make extensive use of heap-based data structures.

View the report (.pdf) View the report (.ps)

Context-sensitive points-to analysis: is it worth it? (2005-2) back
Authors: Ondřej Lhoták and Laurie Hendren
Date: October 2005

Abstract

We present the results of an empirical study evaluating the precision of subset-based points-to analysis with several variations of context sensitivity on Java benchmarks of significant size. We compare the use of call site strings as the context abstraction, object sensitivity, and the BDD-based context-sensitive algorithm proposed by Zhu and Calman, and by Whaley and Lam. Our study includes analyses that context-sensitively specialize only pointer variables, as well as ones that also specialize the heap abstraction. We measure both characteristics of the points-to sets themselves, as well as effects on the precision of client analyses. To guide development of efficient analysis implementations, we measure the number of contexts, the number of distinct contexts, and the number of distinct points-to sets that arise with each context sensitivity variation. To evaluate precision, we measure the size of the call graph in terms of methods and edges, the number of devirtualizable call sites, and the number of casts statically provable to be safe.

The results of our study indicate that object-sensitive analysis implementations are likely to scale better and more predictably than the other approaches; that object-sensitive analyses are more precise than comparable variations of the other approaches; that specializing the heap abstraction improves precision more than extending the length of context strings; and that the profusion of cycles in Java call graphs severely reduces precision of analyses that forsake context sensitivity in cyclic regions.

View the report (.pdf) View the report (.ps)

Speculative Multithreading in a Java Virtual Machine (2005-1) back
Authors: Christopher J.F. Pickett and Clark Verbrugge
Date: March 2005

Abstract
Speculative multithreading (SpMT) is a dynamic program parallelisation technique that promises dramatic speedup of irregular, pointer-based programs as well as numerical, loop-based programs. We present the design and implementation of software-only SpMT for Java at the virtual machine level. We take the full Java language into account and we are able to run and analyse real world benchmarks in reasonable execution times on commodity multiprocessor hardware. We provide an experimental analysis of benchmark behaviour, uncovered parallelism, the impact of return value prediction, processor scalability, and a breakdown of overhead costs.

View the report (.pdf)  View the presentation slides (.pdf)  BibTeX entry (.bib)

Adding trace matching to AspectJ (abc-2005-1) back
Authors: Chris Allan, Pavel Avgustinov, Aske Simon Christensen, Laurie Hendren, Sascha Kuzins, Ondřej Lhoták, Oege de Moor, Damien Sereni, Ganesh Sittampalam, and Julian Tibble
Date: March 2005

Abstract
An aspect observes the execution of a base program; when certain actions occur, the aspect runs some extra code of its own. In the AspectJ language, the observations that an aspect can make are confined to the current action: it is not possible to directly observe the history of a computation.

Recently there have been several interesting proposals for new history-based language features, most notably by Douence et al. and also by Walker and Viggers. In this paper we present a new history-based language feature called tracematches, where the programmer can trigger the execution of extra code by specifying a regular pattern of events in a computation trace. We have fully designed and implemented tracematches as a seamless extension to AspectJ.

A key innovation in our tracematch approach is the introduction of free variables in the matching patterns. This enhancement enables a whole new class of applications where events can be matched not only by the event kind, but also by the values associated with the free variables. We provide several examples of applications enabled by this feature.

After introducing and motivating the idea of tracematches via examples, we present a detailed semantics of our language design, and we derive an implementation from that semantics. The implementation has been realised as an extension of the abc compiler for AspectJ.
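The effect of free variables in the matching patterns can be conveyed with a toy monitor (the pattern, event encoding, and code are our own sketch, not AspectJ/tracematch syntax): for each binding of the free variable — here, each iterator — the monitor tracks the pattern separately and flags, say, two `next` events with no intervening `hasNext`.

```python
# Illustrative sketch: per-binding matching of the pattern "next next"
# (no hasNext in between) for a free variable ranging over iterators.

def violations(trace):
    """trace: list of (event, iterator_id); return ids whose event
    subsequence matches the pattern."""
    state = {}      # free-variable binding -> last event seen for it
    matched = set()
    for event, i in trace:
        if event == "next" and state.get(i) == "next":
            matched.add(i)
        state[i] = event
    return matched

trace = [("hasNext", "it1"), ("next", "it1"),
         ("next", "it2"),
         ("next", "it1"),                      # it1: second next, no hasNext
         ("hasNext", "it2"), ("next", "it2")]
print(sorted(violations(trace)))  # -> ['it1']
```

Interleaved events on it2 do not disturb the match for it1 — precisely the disambiguation by associated values that free variables provide.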

View the report (.pdf) View the report (.ps)

Optimising AspectJ (abc-2004-3) back
Authors: Pavel Avgustinov, Aske Simon Christensen, Laurie Hendren, Sascha Kuzins, Jennifer Lhoták, Ondřej Lhoták, Oege de Moor, Damien Sereni, Ganesh Sittampalam, and Julian Tibble
Date: November 2004

Abstract
AspectJ, an aspect-oriented extension of Java, is becoming increasingly popular. However, not much work has been directed at optimising compilers for AspectJ. Optimising AOP languages provides many new and interesting challenges for compiler writers, and this paper identifies and addresses three such challenges.

First, compiling around advice efficiently is particularly challenging. We provide a new code generation strategy for around advice, which (unlike previous implementations) both avoids the use of excessive inlining and the use of closures. We show it leads to more compact code, and can also improve run-time performance. Second, woven code sometimes includes run-time tests to determine whether advice should execute. One important case is the cflow pointcut which uses information about the dynamic calling context. Previous techniques for cflow were very costly in terms of both time and space. We present new techniques to minimize or eliminate the overhead of cflow using both intra- and inter-procedural analyses. Third, we have addressed the general problem of how to structure an optimising compiler so that traditional analyses can be easily adapted to the AOP setting.

We have implemented all of the techniques in this paper in abc, our AspectBench Compiler for AspectJ, and we demonstrate significant speedups with empirical results. Some of our techniques have already been integrated into the production AspectJ compiler, ajc 1.2.1.

View the report (.pdf) View the report (.ps)

Code Layout as a Source of Noise in JVM Performance (2004-8) back
Authors: Dayong Gu, Clark Verbrugge, and Etienne Gagnon
Date: October 2004

Abstract
We describe the effect of a particular form of "noise" in benchmarking. We investigate the source of anomalous measurement data in a series of optimization strategies that attempt to improve data cache performance in the garbage collector of a Java virtual machine. The results of our experiments can be explained in terms of the difference in code positioning, and hence instruction and data cache behaviour. We show that unintended changes in code positioning due to code modifications as trivial as symbol renaming can contribute up to 2.7% of measured machine cycle cost, 20% in data cache misses, and 37% in instruction cache misses.

View the report (.pdf)  BibTeX entry (.bib)

An Algorithmic Approach for Precise Analysis of the pi-Calculus (2004-7) back
Authors: Sam B. Sanjabi and Clark Verbrugge
Date: October 2004

Abstract
We present an algorithmic approach to dataflow analysis of the pi-calculus that unifies previous such analyses under a single framework.  The structure of the algorithms presented lead to a dissociation of correctness proofs from the analysis itself, and this has led to the development of an algorithm that improves the accuracy of previous approaches by considering the operation of all process operators including sequencing and replication.  Our analysis solution is also independent of any concurrent context in which the analyzed process runs.

View the report (.pdf)  View the report (.ps)  BibTeX entry (.bib)

Compiler Analyses for Improved Return Value Prediction (2004-6) back
Authors: Christopher J.F. Pickett and Clark Verbrugge
Date: October 2004

Abstract
Speculative method-level parallelism has been shown to benefit from return value prediction. In this paper we propose and analyse two compiler analyses designed to improve the cost and performance of a hybrid return value predictor in a Java virtual machine setting. A return value use analysis determines which return values are consumed, and enables us to eliminate 2.6% of all non-void dynamic method calls in the SPEC JVM98 benchmarks as prediction candidates. An extension of this analysis detects return values used only inside boolean and branch expressions, and allows us to safely substitute incorrect predictions for 14.1% of return values at runtime, provided all future use expressions are satisfied correctly. We find an average 3.2% reduction in memory and up to 7% increase in accuracy. A second, interprocedural parameter dependence analysis reduces the number of inputs to a memoization-based sub-predictor for 10.2% of all method calls, and the combination of both analyses reduces overall memory requirements by 5.3%.
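As a rough illustration of the hybrid prediction idea described above (a toy sketch, not the paper's implementation; the class and field names are hypothetical), a predictor can combine a last-value sub-predictor with a memoization sub-predictor keyed on argument values, preferring whichever has the better track record:

```python
class HybridReturnValuePredictor:
    """Toy hybrid predictor: a last-value sub-predictor plus a
    memoization sub-predictor keyed on argument values."""

    def __init__(self):
        self.last_value = None
        self.memo = {}                 # args tuple -> last seen return value
        self.correct = {"last": 0, "memo": 0}

    def predict(self, args):
        memo_guess = self.memo.get(args)
        # Prefer the sub-predictor with the better track record so far.
        if self.correct["memo"] >= self.correct["last"] and memo_guess is not None:
            return memo_guess
        return self.last_value

    def update(self, args, actual):
        # Score each sub-predictor against the real return value, then learn it.
        if self.last_value == actual:
            self.correct["last"] += 1
        if self.memo.get(args) == actual:
            self.correct["memo"] += 1
        self.last_value = actual
        self.memo[args] = actual

pred = HybridReturnValuePredictor()
pred.update(("x",), 42)          # observe a call returning 42
guess = pred.predict(("x",))     # memoization sub-predictor recalls it
```

The parameter-dependence analysis in the paper would shrink the `args` key to only the parameters the return value actually depends on, reducing the memo table's memory footprint.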

View the report (.pdf)  View the presentation slides (.pdf)  BibTeX entry (.bib)

Building the abc AspectJ compiler with Polyglot and Soot (abc-2004-2) back
Authors: Pavel Avgustinov, Aske Simon Christensen, Laurie Hendren, Sascha Kuzins, Jennifer Lhoták, Ondřej Lhoták, Oege de Moor, Damien Sereni, Ganesh Sittampalam, and Julian Tibble
Date: October 2004

Abstract
Aspect-oriented programming and the development of aspect-oriented languages is rapidly gaining momentum, and the advent of this new kind of programming language provides interesting challenges for compiler developers. Aspect-oriented language features require new compilation approaches, both in the frontend semantic analysis and in the backend code generation. This paper is about the design and implementation of the abc compiler for the aspect-oriented language AspectJ.

One important contribution of this paper is to show how we can leverage existing compiler technology by combining Polyglot (an extensible compiler framework for Java) in the frontend and Soot (a framework for analysis and transformation of Java) in the backend. In particular, the extra semantic checks needed to compile AspectJ are implemented by extending and augmenting the compiler passes provided by Polyglot. All AspectJ-specific language constructs are then extracted from the program to leave a pure Java program that can be compiled using the existing Java code generation facility of Soot. Finally, all code generation and transformation is performed on Jimple, Soot's intermediate representation, which allows for a clean and convenient method of applying aspects to source code and class files alike.

A second important contribution of the paper is that we describe our implementation strategies for the new challenges that are specific to aspect-oriented language constructs. Although our compiler is targeted towards AspectJ, many of these ideas apply to aspect-oriented languages in general.

Our abc compiler implements the full AspectJ language as defined by ajc 1.2.0 and is freely available under the GNU LGPL.

View the report (.pdf) View the report (.ps)

Using inter-procedural side-effect information in JIT optimizations (2004-5) back
Authors: Anatole Le, Ondřej Lhoták, and Laurie Hendren
Date: October 2004

Abstract
Inter-procedural analyses such as side-effect analysis can provide information useful for performing aggressive optimizations. We present a study of whether side-effect information improves performance in just-in-time (JIT) compilers, and if so, what level of analysis precision is needed.

We used Spark, the inter-procedural analysis component of the Soot Java analysis and optimization framework, to compute side-effect information and encode it in class files. We modified Jikes RVM, a research JIT, to make use of side-effect analysis in local common sub-expression elimination, heap SSA, redundant load elimination and loop-invariant code motion. On the SpecJVM98 benchmarks, we measured the static number of memory operations removed, the dynamic counts of memory reads eliminated, and the execution time.

Our results show that the use of side-effect analysis increases the number of static opportunities for load elimination by up to 98%, and reduces dynamic field read instructions by up to 27%. Side-effect information enabled speedups in the range of 1.08x to 1.20x for some benchmarks. Finally, among the different levels of precision of side-effect information, a simple side-effect analysis is usually sufficient to obtain most of these speedups.

View the report (.pdf) View the report (.ps)

A Study of Type Analysis for Speculative Method Inlining in a JIT Environment (2004-4) back
Authors: Feng Qian and Laurie Hendren
Date: October 2004

Abstract
Method inlining is one of the most important optimizations for JIT compilers in Java virtual machines. In order to increase the number of inlining opportunities, a type analysis can be used to identify monomorphic virtual calls. In a JIT environment, the compiler and type analysis must also handle dynamic class loading properly because class loading can invalidate previous analysis results and invalidate some speculative inlining decisions. To date, a very simple type analysis, class hierarchy analysis (CHA), has been used successfully in JIT compilers for speculative inlining with invalidation techniques as backup.

This paper seeks to determine if more powerful dynamic type analyses could further improve inlining opportunities in a JIT compiler. To achieve this goal we developed a general dynamic type analysis framework which we have used for designing and implementing dynamic versions of several well-known static type analyses, including CHA, RTA, XTA and VTA.

Surprisingly, the simple dynamic CHA is nearly as good as an ideal type analysis for inlining virtual method calls. There is little room for further improvement. On the other hand, only a reachability-based interprocedural type analysis (VTA) is able to capture the majority of monomorphic interface calls.

We also found that memory overhead is the biggest issue with dynamic whole-program analyses. To address this problem, we outline the design of a demand-driven interprocedural type analysis for inlining hot interface calls.

View the report (.pdf) View the report (.ps)

abc: An extensible AspectJ compiler (abc-2004-1) back
Authors: Pavel Avgustinov, Aske Simon Christensen, Laurie Hendren, Sascha Kuzins, Jennifer Lhoták, Ondřej Lhoták, Oege de Moor, Damien Sereni, Ganesh Sittampalam, and Julian Tibble
Date: September 2004

Abstract
Research in the design of aspect-oriented programming languages requires a workbench that facilitates easy experimentation with new language features and implementation techniques. In particular, new features for AspectJ have been proposed that require extensions in many dimensions: syntax, type checking and code generation, as well as data flow and control flow analyses.

The AspectBench Compiler (abc) is an implementation of such a workbench. The base version of abc implements the full AspectJ language. Its frontend is built, using the Polyglot framework, as a modular extension of the Java language. The use of Polyglot gives flexibility of syntax and type checking. The backend is built using the Soot framework, to give modular code generation and analyses.

In this paper, we outline the design of abc, focusing mostly on how the design supports extensibility. We then provide a general overview of how to use abc to implement an extension. Finally, we illustrate the extension mechanisms of abc through a number of small, but non-trivial, examples. abc is freely available under the GNU LGPL.

View the report (.pdf) View the report (.ps)

A Practical MHP Information Analysis for Concurrent Java Programs (2004-3) back
Authors:
Lin Li and Clark Verbrugge
Date: June 2004

Abstract
In this paper we present an implementation of May Happen in Parallel analysis for Java that attempts to address some of the practical implementation concerns of the original work. We describe a design that incorporates techniques for aiding a feasible implementation and expanding the range of acceptable inputs. We provide experimental results showing the utility and impact of our approach and optimizations using a variety of concurrent benchmarks.

View the report (.pdf) Download the report (.ps.gz)

Measuring the Dynamic Behaviour of AspectJ Programs (2004-2) back
Authors:
Bruno Dufour, Christopher Goard, Laurie Hendren and Clark Verbrugge (McGill University)
Oege de Moor and Ganesh Sittampalam (Oxford University)
Date: March 2004

Abstract
This paper proposes and implements a rigorous method for studying the dynamic behaviour of AspectJ programs. As part of this methodology several new metrics specific to AspectJ programs are proposed and tools for collecting the relevant metrics are presented. The major tools consist of: (1) a modified version of the AspectJ compiler that tags bytecode instructions with an indication of the cause of their generation, such as a particular feature of AspectJ; and (2) a modified version of the *J dynamic metrics collection tool which is composed of a JVMPI-based trace generator and an analyzer which propagates tags and computes the proposed metrics. This dynamic propagation is essential, and thus this paper contributes not only new metrics, but also non-trivial ways of computing them.

We furthermore present a set of benchmarks that exercise a wide range of AspectJ's features, and the metrics that we measured on these benchmarks. The results provide guidance to AspectJ users on how to avoid efficiency pitfalls, to AspectJ implementors on promising areas for future optimization, and to tool builders on ways to understand runtime behaviour of AspectJ.

View the report (.pdf) Download the report (.ps.gz)

Precise, Partially Compositional Analysis of the pi-Calculus (2004-1) back
Authors:
Sam Sanjabi and Clark Verbrugge
Date: February 2004

Abstract
We present a new algorithm for computing control flow analysis on the pi-calculus. Our approach is strictly more accurate than existing algorithms, while maintaining a polynomial running time. Our algorithm is also partially compositional, in the sense that the representational structure that results from analyzing a process can be efficiently reused in an analysis of the same process in another context.

View the report (.pdf) Download the report (.ps.gz)

Jedd: A BDD-based relational extension of Java (2003-7) back
Authors: Ondřej Lhoták and Laurie Hendren
Date: November 2003

Abstract
In this paper we present Jedd, a language extension to Java that supports a convenient way of programming with Binary Decision Diagrams (BDDs). The Jedd language abstracts BDDs as database-style relations and operations on relations, and provides static type rules to ensure that relational operations are used correctly.

The paper provides a description of the Jedd language and reports on the design and implementation of the Jedd translator and associated runtime system. Of particular interest is the approach to assigning attributes from the high-level relations to physical domains in the underlying BDDs, which is done by expressing the constraints as a SAT problem and using a modern SAT solver to compute the solution. Further, a runtime system is defined that handles memory management issues and supports a browsable profiling tool for tuning the key BDD operations.

The motivation for designing Jedd was to support the development of whole program analyses based on BDDs, and we have used Jedd to express five key interrelated whole program analyses in our Soot compiler framework. We provide some examples of this application and discuss our experiences using Jedd.
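To illustrate the relational abstraction Jedd provides (a sketch only: Jedd backs these operations with BDDs, whereas this toy uses plain Python sets of tuples, and the attribute names are hypothetical):

```python
class Relation:
    """A database-style relation: named attributes over a set of tuples."""

    def __init__(self, attrs, tuples=()):
        self.attrs = tuple(attrs)
        self.tuples = {tuple(t) for t in tuples}

    def join(self, other, attr):
        """Natural join on a single shared attribute."""
        i, j = self.attrs.index(attr), other.attrs.index(attr)
        out_attrs = self.attrs + tuple(a for a in other.attrs if a != attr)
        out = set()
        for s in self.tuples:
            for o in other.tuples:
                if s[i] == o[j]:
                    out.add(s + tuple(v for k, v in enumerate(o) if k != j))
        return Relation(out_attrs, out)

    def project(self, *keep):
        """Keep only the named attributes."""
        idx = [self.attrs.index(a) for a in keep]
        return Relation(keep, {tuple(t[k] for t_idx, k in enumerate(idx)) for t in self.tuples})

# One step of a points-to computation expressed relationally:
pt = Relation(("var", "obj"), [("a", "o1")])      # a points to o1
edge = Relation(("var", "dst"), [("a", "b")])     # assignment b = a
new_pt = pt.join(edge, "var").project("dst", "obj")  # b now points to o1
```

In Jedd the static type rules check that joins and projections like these only mention attributes the relation actually has, and the SAT-based assignment maps each attribute to a physical BDD domain.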

View the report (.pdf) Download the report (.ps.gz)

Problems in Objectively Quantifying Benchmarks using Dynamic Metrics (2003-6) back
Authors: Bruno Dufour, Laurie Hendren and Clark Verbrugge
Date: October 2003

Abstract
Basic problems encountered when trying to accurately and reasonably measure dynamic properties of a program are discussed. These include problems in determining and assessing specific, desirable metric qualities that may be perturbed by subtle and unexpected program behaviour, as well as technical limitations on data collection. Some interpreter or Java-specific problems are examined, and the general issue of how to present metrics is also discussed.

View the report (.pdf) Download the report (.ps.gz)

Towards Dynamic Interprocedural Analysis in JVMs (2003-5) back
Authors: Feng Qian and Laurie Hendren
Date: October 2003

A revised version appears in VM 2004, see the final paper here.

Abstract
This paper presents a new, inexpensive, mechanism for constructing a complete call graph for Java programs at runtime, and provides an example of using the mechanism for implementing a dynamic reachability-based interprocedural analysis (IPA), namely dynamic XTA.

Reachability-based IPAs, such as points-to analysis, type analysis, and escape analysis, require a context-insensitive call graph of the analyzed program. Computing a call graph at runtime presents several challenges. First, the overhead must be low. Second, when implementing the mechanism for languages such as Java, both polymorphism and lazy class loading must be dealt with correctly and efficiently. We propose a new mechanism with low costs for constructing runtime call graphs in a JIT environment. The mechanism uses a profiling code stub to capture the first-time execution of a call edge, and adds at most one more instruction to the repeated call edge invocations. Polymorphism and lazy class loading are handled transparently. The call graph is constructed incrementally, and it supports optimistic analysis and speculative optimizations with invalidations.

We also developed a dynamic, reachability-based type analysis, dynamic XTA, as an application of runtime call graphs. It also serves as an example of handling lazy class loading in dynamic IPAs.

The dynamic call graph construction algorithm and dynamic version of XTA have been implemented in Jikes RVM, and we present empirical measurements of the overhead of call graph construction and the characteristics of dynamic XTA.
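The profiling-stub idea can be sketched in a high-level way (hypothetical names; the real mechanism patches machine code in a JIT, not Python closures): the first invocation through a call edge runs a stub that records the edge and then rebinds the call site, so later invocations pay almost nothing.

```python
call_graph = set()   # (caller, callee) edges discovered at run time

def instrument_call(caller, site, callee):
    """Install a one-shot stub at `site` (a 1-element list standing in
    for a patchable call-site slot)."""
    def stub(*args, **kwargs):
        call_graph.add((caller, callee.__name__))  # record the new edge
        site[0] = callee          # rebind: subsequent calls skip the stub
        return callee(*args, **kwargs)
    site[0] = stub
    return site

def helper(x):
    return x + 1

site = instrument_call("main", [None], helper)
site[0](1)    # first call: goes through the stub, records the edge
site[0](2)    # later calls: direct, no profiling overhead
```

Because edges appear incrementally, an optimistic analysis such as dynamic XTA can consume new edges as they are discovered and invalidate speculative decisions when an edge contradicts them.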

View the report (.pdf) Download the report (.ps.gz)

Integrating the Soot compiler infrastructure into an IDE (2003-4) back
Authors: Jennifer Lhoták, Ondřej Lhoták and Laurie Hendren
Date: October 2003

Abstract
This paper presents the integration of Soot, a byte-code analysis and transformation framework, with an integrated development environment (IDE), Eclipse. Such an integrated toolkit is useful for both the compiler developer, to aid in understanding and debugging new analyses, and also for the end-user of the IDE, to aid in program understanding by exposing semantic information gathered by the advanced compiler analyses. The paper discusses these advantages and provides concrete examples of its usefulness.

There are several major challenges to overcome in developing the integrated toolkit, and the paper discusses three major challenges and the solutions to those challenges. An overview of Soot and the integrated toolkit is given, followed by a more detailed discussion of the fundamental components. The paper concludes with several illustrative examples of using the integrated toolkit along with a discussion of future plans and research.

View the report (.pdf) Download the report (.ps.gz)

Improving the Precision and Correctness of Exception Analysis in Soot (2003-3) back
Author: John Jorgensen
Date: September 2003

Abstract
To date, Soot has employed very conservative analyses of exceptional control flow, producing control flow graphs which include an edge to each exception handler from every statement within the block of code protected by that handler, including statements which cannot throw any exceptions caught by the handler. This report describes some of the difficulties and subtleties of analyzing exceptional control flow, and presents an attempt to refine Soot's analysis by determining the types of exceptions each statement has the potential to throw, and drawing edges to exception handlers only from protected statements which may throw exceptions the handlers would catch.

View the report (.pdf) Download the report (.ps.gz)

Points-to Inspired Static Analysis For The pi-Calculus (2003-2) back
Authors: Sam B. Sanjabi and Clark Verbrugge
Date: August 2003

Abstract
We present a flow- and context-insensitive control flow analysis of the pi-calculus which computes a superset of the names that can possibly be transmitted over each channel. The analysis is a simpler version of the constraint-based analysis of Bodei et al., based on techniques inspired by points-to analysis, but requires no extension to the basic calculus. The algorithm computes the same solution as the previous one, but with better space efficiency. Our approach is closely related to techniques used in compiler optimizers to perform points-to analyses; it offers a simpler presentation, provides insight into the inner workings of the process, and immediately suggests other possible analyses on pi-calculus processes.

Download the report (.ps.gz)

Dynamic Metrics for Java (2003-1) back
Authors: Bruno Dufour, Karel Driesen, Laurie Hendren and Clark Verbrugge
Date: March 2003

Abstract
In order to perform meaningful experiments in optimizing compilation and run-time system design, researchers usually rely on a suite of benchmark programs of interest to the optimization technique under consideration. Programs are described as numeric, memory-intensive, concurrent, or object-oriented, based on a qualitative appraisal, in some cases with little justification. We believe it is beneficial to quantify the behaviour of programs with a concise and precisely defined set of metrics, in order to make these intuitive notions of program behaviour more concrete and subject to experimental validation. We therefore define and measure a set of unambiguous, dynamic, robust and architecture-independent metrics that can be used to categorize programs according to their dynamic behaviour in five areas: size, data structure, memory use, concurrency, and polymorphism. A framework computing some of these metrics for Java programs is presented along with specific results demonstrating how to use metric data to understand a program's behaviour, and both guide and evaluate compiler optimizations.

View the report (.pdf) Download the report (.ps.gz) Link to Dynamic Metrics project

EVolve: An Open Extensible Software Visualization Framework (2002-12) back
Authors: Qin Wang, Wei Wang, Rhodes Brown, Karel Driesen, Bruno Dufour, Laurie Hendren and Clark Verbrugge
Date: December 2002

Abstract
Existing visualization tools typically do not allow easy extension by new visualization techniques, and are often coupled with inflexible data input mechanisms. This paper presents EVolve, a flexible and extensible framework for visualizing program characteristics and behaviour. The framework is flexible in the sense that it can visualize many kinds of data, and it is extensible in the sense that it is quite straightforward to add new kinds of visualizations.

The overall architecture of the framework consists of the core EVolve platform that communicates with data sources via a well defined data protocol and which communicates with visualization methods via a visualization protocol.

Given a data source, an end-user can use EVolve as a stand-alone tool by interactively creating, configuring and modifying visualizations. A variety of visualizations are provided in the current EVolve library, with features that facilitate the comparison of multiple views on the same execution data. We demonstrate EVolve in the context of visualizing execution behaviour of Java programs.

View the report (.pdf) Link to EVolve project

Dynamic Metrics for Compiler Developers (2002-11) back
Authors: Bruno Dufour, Karel Driesen, Laurie Hendren and Clark Verbrugge
Date: November 2002

Abstract
In order to perform meaningful experiments in optimizing compilation and run-time system design, researchers usually rely on a suite of benchmark programs of interest to the optimization technique under consideration. Programs are described as numeric, memory-intensive, concurrent, or object-oriented, based on a qualitative appraisal, in some cases with little justification. We believe it is beneficial to quantify the behavior of programs with a concise and precisely defined set of metrics, in order to make these intuitive notions of program behavior more concrete and subject to experimental validation. We therefore define a set of unambiguous, dynamic, robust and architecture-independent metrics that can be used to categorize programs according to their dynamic behavior in five areas: size, data structure, memory use, concurrency, and polymorphism. A framework computing some of these metrics for Java programs is presented along with specific results.

View the report (.pdf) Download the report (.ps.gz) Link to Dynamic Metrics project

Points-to Analysis using BDDs (2002-10) back
Authors: Marc Berndl, Ondřej Lhoták, Feng Qian, Laurie Hendren and Navindra Umanee
Date: November 2002
A revised version to appear in PLDI'03, San Diego

Abstract
This paper reports on a new approach to solving a subset-based points-to analysis for Java using Binary Decision Diagrams (BDDs). In the model checking community, BDDs have been shown very effective for representing large sets and solving very large verification problems. Our work shows that BDDs can also be very effective for developing a points-to analysis that is simple to implement and that scales well, in both space and time, to large programs.

The paper first introduces BDDs and operations on BDDs using some simple points-to examples. Then, a complete subset-based points-to algorithm is presented, expressed completely using BDDs and BDD operations. This algorithm is then refined by finding appropriate variable orderings and by making the algorithm incremental, in order to arrive at a very efficient algorithm. Experimental results are given to justify the choice of variable ordering, to demonstrate the improvement due to incrementalization, and to compare the performance of the BDD-based solver to an efficient hand-coded graph-based solver. Finally, based on the results of the BDD-based solver, a variety of BDD-based queries are presented, including the points-to query.
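The subset-based propagation underlying the analysis can be sketched with explicit sets (an illustrative toy: the paper's contribution is precisely to replace these explicit sets with BDDs so the relations scale; variable and object names are hypothetical):

```python
def solve_points_to(news, assigns):
    """news: (var, obj) allocation pairs; assigns: (src, dst) copy edges,
    each imposing the subset constraint pts(src) <= pts(dst)."""
    pts = {}
    for v, o in news:
        pts.setdefault(v, set()).add(o)
    changed = True
    while changed:                      # iterate to a fixed point
        changed = False
        for src, dst in assigns:
            add = pts.get(src, set()) - pts.get(dst, set())
            if add:                     # propagate along the copy edge
                pts.setdefault(dst, set()).update(add)
                changed = True
    return pts

# a = new o1; b = a; c = b
pts = solve_points_to(news=[("a", "o1")], assigns=[("a", "b"), ("b", "c")])
```

In the BDD formulation, one relational operation propagates along every copy edge at once, which is what makes the solver both simple and scalable.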

View the report (.pdf) Download the report (.ps.gz) Link to PTA-BDD project

Scaling Java Points-To Analysis using Spark (2002-9) back
Authors: Ondřej Lhoták and Laurie Hendren
Date: October 2002

Abstract
Most points-to analysis research has been done on different systems by different groups, making it difficult to compare results, and to understand interactions between individual factors each group studied. Furthermore, points-to analysis for Java has been studied much less thoroughly than for C, and the tradeoffs appear very different.

We introduce Spark, a flexible framework for experimenting with points-to analyses for Java. Spark supports equality- and subset-based analyses, variations in field sensitivity, respect for declared types, variations in call graph construction, off-line simplification, and several solving algorithms. Spark is composed of building blocks on which new analyses can be based.

We demonstrate Spark in a substantial study of factors affecting precision and efficiency of subset-based points-to analyses, including interactions between these factors. Our results show that Spark is not only flexible and modular, but also offers superior time/space performance when compared to other points-to analysis implementations.

View the report (.ps) Download the report (.ps.gz)

Dynamic Profiling and Trace Cache Generation for a Java Virtual Machine (2002-8) back
Authors: Marc Berndl and Laurie Hendren
Date: October 2002

Abstract
Dynamic program optimization is becoming increasingly important for achieving good runtime performance. One of the key issues in such systems is how to select which code to optimize. One approach is to dynamically detect traces, long sequences of instructions which are likely to execute to completion. Such traces can be stored in a trace cache and dispatched one trace at a time (rather than one instruction or one basic block at a time). Furthermore, traces have been shown to be a good unit for optimization.

This paper reports on a new approach for dynamically detecting, creating and storing traces in a Java virtual machine. We first elucidate four important design criteria for a successful trace strategy: good instruction stream coverage, low dispatch rate, cache stability and optimizability of traces. We then present our approach which is based on branch correlation graphs. Branch correlation graphs store information about the correlation between pairs of branches, as well as additional state information.

We present the complete design for an efficient implementation of the system, including a detailed discussion of the trace cache and profiling mechanisms. We have implemented an experimental framework to measure the traces generated by our approach in a direct-threaded-inlining Java VM (SableVM) and we present experimental results to show that the traces we generate meet the design criteria.

View the report (.pdf) View the report (.ps) Download the report (.ps.gz)

STEP: A Framework for the Efficient Encoding of General Trace Data (2002-7) back
Authors: Rhodes Brown, Karel Driesen, David Eng, Laurie Hendren, John Jorgensen, Clark Verbrugge and Qin Wang
Date: June 2002

Abstract
Traditional tracing systems are often limited to recording a fixed set of basic program events. These limitations can frustrate an application or compiler developer who is trying to understand and characterize the complex behavior of software systems such as a Java program running on a Java Virtual Machine. In the past, many developers have resorted to specialized tracing systems that target a particular type of program event. This approach often results in an obscure and poorly documented encoding format which can limit the reuse and sharing of potentially valuable information. To address this problem, we present STEP, a system designed to provide profiler developers with a standard method for encoding general program trace data in a flexible and compact format. The system consists of a trace data definition language along with a compiler and architecture that simplifies the client interface by encapsulating the details of encoding and interpretation.

View the report (.pdf) View the report (.ps) Download the report (.ps.gz)

EVolve: An Extensible Software Visualization Framework (2002-6) back
Authors: Qin Wang, Rhodes Brown, Karel Driesen, Laurie Hendren and Clark Verbrugge
Date: July 2002

Abstract
Existing visualization tools typically do not allow easy extension by new visualization techniques, and are often coupled with inflexible data input mechanisms. This paper presents EVolve, a flexible and extensible framework for visualizing program characteristics and behaviour. The framework is flexible in the sense that it can visualize many kinds of data, and it is extensible in the sense that it is quite straightforward to add new kinds of visualizations.

The overall architecture of the framework consists of the core EVolve platform that communicates with data sources via a well defined data protocol and which communicates with visualization methods via a visualization protocol.

Given a data source, an end-user can use EVolve as a stand-alone tool by interactively creating, configuring and modifying one or more visualizations. A variety of visualizations are provided in the standard EVolve visualization library. EVolve can be used to build custom visualizers by implementing new data sources and/or new kinds of visualizations.

The paper presents an overview of the system, examples of its use, and discusses our experiences using the tool both as an end-user and as a developer building custom visualizers.

View the report (.ps) Download the report (.ps.gz)

Fast Local List Scheduling (2002-5) back
Author: Clark Verbrugge
Date: July 2002

Abstract
List scheduling is a well-known O(n^2) algorithm for performing instruction scheduling during compiler code generation on modern superscalar architectures. An asymptotically more efficient O(n*log(n)) variation is presented; this algorithm achieves its lower time bound by imposing practical constraints on the structure of instruction dependencies.
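For reference, the classic O(n^2) baseline looks roughly as follows (a minimal sketch over a dependence DAG; the priority function here is just a supplied rank, whereas real schedulers typically use critical-path lengths and model issue resources):

```python
def list_schedule(deps, priority):
    """deps: instr -> set of instrs it depends on. Returns an issue order."""
    remaining = {i: set(d) for i, d in deps.items()}
    schedule = []
    while remaining:
        # An instruction is ready once all its dependencies have issued.
        ready = [i for i, d in remaining.items() if not d]
        nxt = max(ready, key=priority)        # highest-priority ready instr
        schedule.append(nxt)
        del remaining[nxt]
        for d in remaining.values():
            d.discard(nxt)                    # release dependents
    return schedule

deps = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
prio = {"a": 3, "b": 2, "c": 1, "d": 0}
order = list_schedule(deps, prio.get)
```

Each iteration scans all remaining instructions, giving the quadratic bound; the report's variation gets to O(n*log(n)) by constraining the dependence structure so this rescan can be avoided.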

View the report (.ps) Download the report (.ps.gz)

Combining Static and Dynamic Data in Code Visualization (2002-4) back
Author: David Eng
Date: June 2002

Abstract
The task of developing, tuning, and debugging compiler optimizations is a difficult one which can be facilitated by software visualization. There are many characteristics of the code which must be considered when studying the kinds of optimizations which can be performed. Both static data collected at compile-time and dynamic runtime data can reveal opportunities for optimization and affect code transformations. Visualization of such complex systems must include as much information as possible, and accommodate the different sources from which this information is acquired.

This paper presents a visualization framework designed to address these issues. The framework is based on a new, extensible language called JIL which provides a common format for encapsulating intermediate representations and associating them with compile-time and runtime data. Custom visualization interfaces can then combine JIL data from separate tools, exposing both static and dynamic characteristics of the underlying code.

View the report (.ps) Download the report (.ps.gz)

JIL: an Extensible Intermediate Language (2002-3) back
Author: David Eng
Date: June 2002

Abstract
The Java Intermediate Language (JIL) is a subset of XML and SGML described in this document. Its goal is to provide an intermediate representation of Java source code suitable for machine use. JIL benefits from the features of XML, such as extensibility and portability, while providing a common ground for software tools. The following document discusses the design issues and overall framework for such a language, including a description of all fundamental elements.

View the report (.ps) Download the report (.ps.gz)

Run-time Evaluation of Opportunities for Object Inlining in Java (2002-2) back
Authors: Ondřej Lhoták and Laurie Hendren
Date: June 2002

Abstract
Object-oriented languages, like Java, encourage the use of many small objects linked together by field references, instead of a few monolithic structures. While this practice is beneficial from a program design perspective, it can slow down program execution by incurring many pointer indirections. One solution to this problem is object inlining: when the compiler can safely do so, it fuses small objects together, thus removing the reads/writes to the removed field, saving the memory needed to store the field and object header, and reducing the number of object allocations.

The objective of this paper is to measure the potential for object inlining by studying the run-time behavior of a comprehensive set of Java programs. We study the traces of program executions in order to determine which fields behave like inlinable fields. Since we are using dynamic information instead of a static analysis, our results give an upper bound on what could be achieved via a static compiler-based approach. Our experimental results measure the potential improvements attainable with object inlining, including reductions in the numbers of field reads and writes, and reduced memory usage.

Our study shows that some Java programs can benefit significantly from object inlining, with close to a 10% speedup. Somewhat to our surprise, our study found one case, the db benchmark, where the most important inlinable field was the result of unusual program design, and fixing this small flaw led to both better performance and clearer program design. However, the opportunities for object inlining are highly dependent on the individual program being considered, and are in many cases very limited. Furthermore, fields that are inlinable also have properties that make them potential candidates for other optimizations such as removing redundant memory accesses. The memory savings possible through object inlining are moderate.

View the report (.ps) Download the report (.ps.gz)

An Adaptive, Region-Based Allocator for Java (2002-1) back
Authors: Feng Qian and Laurie Hendren
Date: February 2002

Abstract
This paper introduces an adaptive, region-based allocator for Java. The basic idea is to allocate non-escaping objects in local regions, which are allocated and freed in conjunction with their associated stack frames. By releasing memory associated with these stack frames, the burden on the garbage collector is reduced, possibly resulting in fewer collections.

The novelty of our approach is that it does not require static escape analysis, programmer annotations, or special type systems. The approach is transparent to the Java programmer and relatively simple to add to an existing JVM. The system starts by assuming that all allocated objects are local to their stack region, and then catches escaping objects via write barriers. When an object is caught escaping, its associated allocation site is marked as a non-local site, so that subsequent allocations will be put directly in the global region. Thus, as execution proceeds, only those allocation sites that are likely to produce non-escaping objects are allocated to their local stack region.
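The adaptive policy above can be sketched in a few lines of Java (a simulation only, with hypothetical names; the real mechanism is implemented with write barriers inside the Jikes RVM, not with a map):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the adaptive allocation policy: sites start optimistic (local),
// and are permanently demoted to global the first time an object allocated
// there is observed escaping its stack region.
class AdaptiveSites {
    // true = this allocation site has produced an escaping object, so
    // future allocations from it go straight to the global region.
    private final Map<Integer, Boolean> escaped = new HashMap<>();

    boolean allocateInLocalRegion(int siteId) {
        // Optimistic default: assume objects from this site stay local.
        return !escaped.getOrDefault(siteId, false);
    }

    // Called by the write barrier when an object from siteId is caught
    // escaping (e.g. stored into a field of a longer-lived object).
    void recordEscape(int siteId) {
        escaped.put(siteId, true);
    }
}
```

As execution proceeds, only sites that have never been caught escaping keep allocating into local, stack-associated regions.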

The paper presents the overall idea, and then provides details of a specific design and implementation. In particular, we present a paged region allocator and the necessary modifications of the Jikes RVM baseline JIT and a copying collector. Our experimental study evaluates the idea using the SPEC JVM98 benchmarks, plus one other large benchmark. We show that a region-based allocator is a reasonable choice, that overheads can be kept low, and that the adaptive system is successful at finding local regions that contain no escaping objects.

Download the report (.ps.gz)

STOOP: The Sable Toolkit for Object-Oriented Profiling (2001-2) back
Authors: Rhodes Brown, Karel Driesen, David Eng, Laurie Hendren, John Jorgensen, Clark Verbrugge, and Qin Wang
Date: November 2001

Abstract
The performance and behaviour of large object-oriented software systems can be difficult to profile. Furthermore, existing profiling systems are often limited by a fixed set of events that can be profiled and a fixed set of visualizations. This paper presents the Sable Toolkit for Object-Oriented Profiling (STOOP), which we developed to support a wide variety of profiling needs. The toolkit makes it simple to build new profilers which can be used for both standard and less conventional profiling tasks.

At the core of the toolkit is its event pipe. A profiling agent produces events for the event pipe and a visualizer consumes events from the pipe. To provide a general purpose event pipe we have defined a language for specifying events and a compiler that automatically produces the event pipe interfaces for the specified events. Many different profiling agents can produce events for the event pipe, including JVMPI agents, instrumented VMs and instrumented bytecode. It is also simple to adapt a variety of visualizers to read events from the event pipe. We have worked with two visualizers, EVolve and JIMPLEX. We summarize each visualizer and show how easy it was to adapt them to use events produced from STOOP.

Download the report (.ps.gz)

Decompiling Java Bytecode: Problems, Traps and Pitfalls (2001-1) back
Authors: Jerome Miecznikowski and Laurie Hendren
Date: October 2001

Abstract
Java virtual machines execute Java bytecode instructions. Since this bytecode is a higher level representation than traditional object code, it is possible to decompile it back to Java source. Many such decompilers have been developed and the conventional wisdom is that decompiling Java bytecode is relatively simple. This may be true when decompiling bytecode produced directly from a specific compiler, most often Sun's javac compiler. In this case it is really a matter of inverting a known compilation strategy. However, there are many problems, traps and pitfalls when decompiling arbitrary verifiable Java bytecode. Such bytecode could be produced by other Java compilers, Java bytecode optimizers or Java bytecode obfuscators. Java bytecode can also be produced by compilers for other languages, including Haskell, Eiffel, ML, Ada and Fortran. These compilers often use very different code generation strategies from javac.

This paper outlines the problems and solutions we have found in our development of Dava, a decompiler for arbitrary Java bytecode. We first outline the problems in assigning types to variables and literals, and the problems due to expression evaluation on the Java stack. Then, we look at finding structured control flow with a particular emphasis on how to deal with Java exceptions and synchronized blocks. Throughout the paper we provide small examples which are not properly decompiled by commonly used decompilers.

Download the report (.ps.gz) (the same version as paper 2002-2)

A Comprehensive Approach to Array Bounds Check Elimination for Java (2000-4) back
Authors: Feng Qian, Laurie Hendren and Clark Verbrugge
Date: November 2000

Abstract
This paper reports on a new approach to eliminating array bounds checks in Java. Our approach is based upon a flow-sensitive intraprocedural analysis called variable constraint analysis (VCA). This analysis builds a small constraint graph for each important point in a method, and then uses the information encoded in the graph to infer the relationship between array index expressions and the bounds of the array. Using VCA as the base analysis, we also show how two further analyses can improve the results of VCA. Array field analysis is applied to each class and provides information about some arrays stored in fields, while rectangular array analysis is an interprocedural analysis that approximates the shape of arrays and is useful for finding rectangular (non-ragged) arrays.
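A source-level Java illustration of the kind of code such an analysis targets (the example is mine, not taken from the paper):

```java
// In both loops below, the relationship between the index expression and
// a.length is provable from the loop bounds, so the per-access bounds
// check the JVM would otherwise perform is redundant and may be dropped.
class BoundsExample {
    static int sum(int[] a) {
        int total = 0;
        for (int i = 0; i < a.length; i++) {
            total += a[i];          // provably in bounds: 0 <= i < a.length
        }
        return total;
    }

    static int sumPairs(int[] a) {
        int total = 0;
        // The access a[i + 1] is also removable: i + 1 < a.length is
        // implied directly by the loop condition.
        for (int i = 0; i + 1 < a.length; i += 2) {
            total += a[i] + a[i + 1];
        }
        return total;
    }
}
```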

We have implemented all three analyses using the Soot bytecode optimization/annotation framework and we transmit the results of the analysis to virtual machines using class file attributes. We have modified the Kaffe JIT and IBM's High Performance Compiler for Java (HPCJ) to make use of these attributes, and we demonstrate significant speedups.

View the report (.ps) Download the report (.ps.gz)

SableVM: A Research Framework for the Efficient Execution of Java Bytecode (2000-3) back
Authors: Etienne Gagnon and Laurie Hendren
Date: November 2000

Abstract
SableVM is an open-source virtual machine for Java, intended as a research framework for efficient execution of Java bytecode. The framework is essentially composed of an extensible bytecode interpreter using state-of-the-art and innovative techniques. Written in the C programming language, and assuming minimal system dependencies, the interpreter emphasizes high-level techniques to support efficient execution. In particular, we introduce new data layouts for classes, virtual tables and object instances that reduce the cost of interface method calls to that of normal virtual calls, allow efficient garbage collection and light synchronization, and make effective use of memory space.

View the report (.ps) Download the report (.ps.gz)

A Framework for Optimizing Java Using Attributes (2000-2) back
Authors: Patrice Pominville, Feng Qian, Raja Vallée-Rai, Laurie Hendren and Clark Verbrugge
Date: July 2000 (updated: September 2000)

Abstract
This paper presents a framework for supporting the optimization of Java programs using attributes in Java class files. We show how class file attributes may be used to convey both optimization opportunities and profile information to a variety of Java virtual machines including ahead-of-time compilers and just-in-time compilers.

We present our work in the context of Soot, a framework that supports the analysis and transformation of Java bytecode (class files). We demonstrate the framework with attributes for elimination of array bounds and null pointer checks, and we provide experimental results for the Kaffe just-in-time compiler, and IBM's High Performance Compiler for Java ahead-of-time compiler.

View the report (.ps) Download the report (.ps.gz)

Common Subexpression Removal in Soot (2000-1) back
Author: Patrick Lam
Date: May 2000

Abstract
Everyone who knows something about optimizing compilers knows about common subexpression elimination. It is used to remove redundant computations, which usually improves the execution time of a program. The transformation is quite simple to describe, and so common subexpression elimination must, of course, be trivial to implement. In this work, we discuss an implementation of common subexpression elimination for Java, using the Soot framework. Our implementation passes the Soot sentinel suite. Experimental results are presented, showing that common subexpression elimination slightly speeds up Java programs.
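At source level the transformation looks as follows (a sketch with names of my choosing; the Soot implementation actually operates on the Jimple intermediate representation, not on Java source):

```java
// Before CSE: the subexpression (x * y) is computed twice.
class BeforeCSE {
    static int f(int x, int y) {
        int a = x * y + 1;
        int b = x * y - 1;   // redundant recomputation of x * y
        return a * b;
    }
}

// After CSE: the common subexpression is evaluated once into a temporary.
class AfterCSE {
    static int f(int x, int y) {
        int t = x * y;       // single evaluation, reused below
        int a = t + 1;
        int b = t - 1;
        return a * b;
    }
}
```

The two versions are semantically equivalent; the optimized one saves a multiplication per call.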

View the report (.ps) Download the report (.ps.gz)

Practical Virtual Method Call Resolution for Java (1999-4) back
Authors: Vijay Sundaresan, Laurie Hendren, Chrislain Razafimahefa, Raja Vallée-Rai, Patrick Lam, Etienne Gagnon and Charles Godin
Date: November 1999

Abstract
This paper addresses the problem of resolving virtual method and interface calls in Java bytecode. The main focus is on a new practical technique that can be used to analyze large applications while at the same time providing more accurate results than class hierarchy analysis and rapid type analysis. We present two variations of our new technique, variable-type analysis and a coarser-grain version called declared-type analysis. Both of these analyses are inexpensive, easy to implement, and our experimental results show that they scale linearly in the size of the program.

We have implemented our new analyses using the Soot framework, and we report on empirical results for seven benchmarks. We have used our techniques to build accurate call graphs for complete applications (including libraries) and we show that compared to a conservative call graph built using class hierarchy analysis, our new variable-type analysis can remove a significant number of nodes (methods) and call edges. Further, our results show that we can improve upon the compression obtained using rapid type analysis.

We also provide dynamic measurements of monomorphic call sites, focusing on the benchmark code excluding libraries. We demonstrate that when considering only the benchmark code, both rapid type analysis and our new declared-type analysis do not add much precision over class hierarchy analysis. However, our finer-grained variable-type analysis does resolve significantly more call sites, particularly for very object-oriented programs.

View the report (.ps) Download the report (.ps.gz)

Optimizing Java Bytecode using the Soot Framework: Is it Feasible? (1999-3) back
Authors: Raja Vallée-Rai, Etienne Gagnon, Laurie Hendren, Patrick Lam, Patrice Pominville, and Vijay Sundaresan
Date: November 1999

Abstract
This paper presents Soot, a framework for optimizing Java(tm) bytecode. The framework is implemented in Java and supports three intermediate representations of Java bytecode: Baf, a streamlined representation of Java's stack-based bytecode; Jimple, a typed three-address intermediate representation suitable for optimization; and Grimp, an aggregated version of Jimple. Our approach to class file optimization is to first convert the stack-based bytecode into Jimple, a three-address form more amenable to traditional program optimization, and then convert the optimized Jimple back to bytecode.

In order to demonstrate that our approach is feasible, we present experimental results showing the effects of processing class files through our framework. In particular, we study the techniques necessary to effectively translate Jimple back to bytecode, without losing performance. Finally, we demonstrate that class file optimization can be quite effective by showing the results of some basic optimizations using our framework. Our experiments were done on ten benchmarks, including seven SPECjvm98 benchmarks, and were executed on five different Java virtual machine implementations.

View the report (.ps) Download the report (.ps.gz)

Intra-procedural Inference of Static Types for Java Bytecode (1999-1) back
Authors: Etienne Gagnon and Laurie J. Hendren
Date: March 1999

Abstract
Although Java bytecode has a significant amount of type information embedded in it, there are no explicit types for local variables. However, knowing types for local variables is very useful for both program optimization and decompilation. In this paper, we present a practical algorithm for inferring static types for local variables in a 3-address, stackless representation of Java bytecode.

By decoupling the type inference problem from the low level bytecode representation, and abstracting it into a constraint system, we are able to construct a sound and efficient algorithm for inferring the type of most bytecode found in practice. Further, for bytecode that cannot be typed, we suggest transformations that make it typeable.

Using this same constraint system, we show that this type inference problem is NP-hard in general. However, this hard case does not seem to arise in practice and our experimental results show that all of the 17,000 methods used in our tests were successfully and efficiently typed by our algorithm.

View the report (.ps) Download the report (.ps.gz)


Of Graphs and Coffi Grounds: Decompiling Java (1998-6) back
Author: Patrick Lam
Date: September 1998

Abstract
Java programmers write applications and applets in plain, English-like text, and then apply a Java compiler to the text to obtain class files. Class files, which are typically transmitted across the web, are a low-level representation of the original text; they are not human-readable. Consider a compiler as a function from text to class files. My goal is to compute the inverse function: given the compiled class file, I wish to find the best possible approximation to the original text. This is called decompilation.

Given a class file, one can build an unstructured graph representation of the program. The main goal of my work is to develop graph algorithms to convert these unstructured graphs into structured graphs corresponding to high-level program constructs, and I will discuss these in detail. I shall also mention some possibility and impossibility results, which show that decompilation is always possible if the program may be lengthened.

View the report (.ps) Download the report (.ps.gz)


Jimple: Simplifying Java Bytecode for Analyses and Transformations (1998-4) back
Authors: Raja Vallee-Rai and Laurie J. Hendren
Date: July 1998

Abstract
In this paper we present Jimple, a 3-address intermediate representation that has been designed to simplify analysis and transformation of Java bytecode. We motivate the need for a new intermediate representation by illustrating several difficulties with optimizing the stack-based Java bytecode directly. In general, these difficulties are due to the fact that bytecode instructions affect an expression stack, and thus have implicit uses and definitions of stack locations. We propose Jimple as an alternative representation, in which each statement refers explicitly to the variables it uses. We provide both the definition of Jimple and a complete procedure for translating from Java bytecode to Jimple. This definition and translation have been implemented using Java, and finally we show how this implementation forms the heart of the Sable research projects.
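The difficulty described above can be illustrated in plain Java (an illustration of my own, not from the paper): simulating the bytecode for `z = x + y` with an explicit stack shows that no instruction names its operands, whereas the 3-address form makes every use and definition explicit.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class StackVsThreeAddress {
    // Simulates the bytecode sequence  iload x; iload y; iadd; istore z.
    // Each instruction's uses and definitions are implicit stack slots,
    // which is what makes direct analysis of bytecode awkward.
    static int stackStyle(int x, int y) {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(x);          // iload x
        stack.push(y);          // iload y
        int b = stack.pop();    // iadd pops its two operands...
        int a = stack.pop();
        stack.push(a + b);      // ...and pushes the result
        return stack.pop();     // istore z
    }

    // The Jimple-like 3-address form: each statement refers explicitly
    // to the variables it uses and defines.
    static int threeAddressStyle(int x, int y) {
        int z = x + y;          // z = x + y
        return z;
    }
}
```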

View the report (.ps) Download the report (.ps.gz)


Measuring ILP in Java Programs (1998-3) back
Author: Raja Vallee-Rai
Date: June 1998

Abstract
Java programs can differ greatly in terms of required runtime support. C-like Java programs, which are usually computation-intensive, tend to require little, whereas heavily object-oriented programs, such as compilers, can spend most of their execution time in runtime handlers. This report begins to explore the differences in instruction-level parallelism between these two extreme types of Java programs. No solid conclusions can be drawn, however, due to an oversimplified execution model and the scarcity of appropriate benchmarks.

View the report (.ps) Download the report (.ps.gz)


Profiling the Kaffe JIT Compiler (1998-2) back
Author: Raja Vallee-Rai
Date: February 1998

Abstract
This report documents our attempts at instrumenting the Kaffe JIT compiler in order to estimate the execution cost of each bytecode instruction in the Java virtual machine.

View the report (.ps) Download the report (.ps.gz)


The Jimple Framework (1998-1) back
Author: Raja Vallee-Rai
Date: February 1998

Note: This report is obsolete. It covers an extremely old version of the Jimple framework which is now called Soot.

Abstract
The Java Virtual Machine is a stack-based architecture. This makes compiler analyses and transformations more complex because of the implicit participation of the operand stack in instructions. To simplify these tasks, we developed an alternative representation for Java bytecode based on 3-address code, and a framework to manipulate Java class files in this form. This report presents an overview of the Jimple framework and the algorithms used in the jimplification process.

View the report (.ps) Download the report (.ps.gz)




Last updated Mon Aug 25 09:04:54 EDT 2003.