Sable Publications (Theses)

Efficient Design and Implementation of Software Thread-Level Speculation

Author: Zhen Cao
Date: December 2015

Software approaches to Thread-Level Speculation (TLS) have been recently explored, bypassing the need for specialized hardware designs. These approaches, however, tend to focus on source or VM-level implementations aimed at specific language and runtime environments. In addition, previous software approaches tend to make use of a simple thread forking model, reducing their ability to extract substantial parallelism from tree-form recursion programs. We propose a Mixed forking model Universal software-TLS (MUTLS) system to overcome these limitations. Our work demonstrates that actual speedup is achievable on existing, commodity multi-core processors while maintaining the flexibility of a highly generic implementation context, though memory-intensive benchmarks could be improved by further optimizations. Our experiments also indicate that a mixed model is preferable for parallelization of tree-form recursion applications over the simple forking models used by previous software-TLS approaches.

We then improve the performance of the MUTLS system by proposing memory buffering optimizations. Traditional "lazy" buffering mechanisms enable strong isolation of speculative threads, but imply large memory overheads, while more recent "eager" mechanisms improve scalability, but are more sensitive to data dependencies and have higher rollback costs. We thus describe an integrated system that incorporates the best of both designs, automatically selecting the best buffering mechanism. Our approach builds on novel buffering designs with specific optimizations that improve both lazy and eager buffer management, allowing us to achieve 71% geometric mean performance of the OpenMP manually parallelized version.
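
As a rough illustration of the trade-off (a hypothetical Java sketch, not the MUTLS implementation, which works at the LLVM level), a lazy buffer redirects speculative stores into a private map that is committed only after validation, so rollback is cheap but loads and commits pay for the indirection; an eager scheme writes in place and keeps an undo log, so commit is cheap but rollback must restore old values:

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical sketch of the two buffering styles for one speculative thread.
    class BufferingSketch {
        static class LazyBuffer {
            private final Map<Integer, Integer> writes = new HashMap<>(); // address -> value
            void store(int addr, int val) { writes.put(addr, val); }      // main memory untouched
            Integer load(int addr) { return writes.get(addr); }           // forward own writes first
            void commit(int[] memory) {                                   // on successful validation
                for (Map.Entry<Integer, Integer> e : writes.entrySet()) memory[e.getKey()] = e.getValue();
            }
            void abort() { writes.clear(); }                               // rollback is cheap: drop the buffer
        }

        static class EagerUndoLog {
            private final Deque<int[]> undo = new ArrayDeque<>();          // (address, old value) pairs
            void store(int[] memory, int addr, int val) {
                undo.push(new int[] { addr, memory[addr] });                // remember the old value
                memory[addr] = val;                                         // write in place
            }
            void commit() { undo.clear(); }                                 // commit is cheap: forget the log
            void abort(int[] memory) {                                      // rollback restores, most recent first
                while (!undo.isEmpty()) { int[] e = undo.pop(); memory[e[0]] = e[1]; }
            }
        }
    }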

Fork-heuristics play a key role in automatic parallelization using TLS. Current fork-heuristics either lack information about the real parallel execution environment needed to accurately evaluate fork points, or focus on hardware-TLS implementations and cannot be directly applied to software-TLS. We propose adaptive fork-heuristics as well as a feedback-based selection technique to overcome these problems. Experiments show that the adaptive fork-heuristics and feedback-based selection are generally effective for automatic parallelization using software-TLS, achieving respectively 56% and 76% of the performance of the manually annotated speculative version. More advanced automatic workload distribution strategies can also help to improve the effectiveness of the automatic parallelization approaches.
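
As a hypothetical sketch of the feedback idea (not the actual MUTLS heuristics), the runtime can attribute committed work, wasted rolled-back work and forking overhead to each fork point, and keep speculating only where the balance is positive:

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical profitability table for fork points, updated from execution feedback.
    class ForkPointSelector {
        static final class Stats { long committedNanos, wastedNanos, overheadNanos; }

        private final Map<String, Stats> stats = new HashMap<>();

        void record(String forkPoint, long committed, long wasted, long overhead) {
            Stats s = stats.computeIfAbsent(forkPoint, k -> new Stats());
            s.committedNanos += committed;   // useful speculative work that was kept
            s.wastedNanos += wasted;         // work discarded by rollbacks
            s.overheadNanos += overhead;     // forking and validation cost
        }

        // Keep a fork point only if its useful work outweighs waste plus overhead.
        boolean shouldFork(String forkPoint) {
            Stats s = stats.get(forkPoint);
            if (s == null) return true;      // no feedback yet: speculate optimistically
            return s.committedNanos > s.wastedNanos + s.overheadNanos;
        }
    }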

Finally, dynamic languages such as Python are difficult to statically analyze and parallelize, which makes them an ideal context for the dynamic parallelization approach of software-TLS. We integrate the Python JIT specializing compiler Numba with the MUTLS system to obtain a parallelizing compiler for Python. Experiments on 6 benchmarks show that the speculatively parallelized versions achieve speedups from 1.8 to 16.4 and from 2.2 to 40.5 for the JIT-compiled Python programs with and without accounting for JIT compilation time, respectively. Overall, we demonstrate that software-TLS can be an efficient and effective approach for automatic parallelization on commodity multi-core processors for a variety of language contexts and execution environments.



View the thesis (.pdf) BibTeX entry

VeloCty: A Static Optimising Compiler for MATLAB and NumPy

Author: Sameer Jagdale
Date: March 2015

Abstract

High-level scientific languages such as MATLAB and Python's NumPy library are gaining popularity among scientists and mathematicians. These languages provide many features, such as dynamic typing and high-level scientific functions, which allow easy prototyping. However, these features also inhibit the performance of the code.

We present VeloCty, an optimizing static compiler for MATLAB and Python as a solution to the problem of enhancing performance of programs written in these languages. In most programs, a large portion of the time is spent executing a small part of the code. Moreover, these sections can often be compiled ahead of time, and improved performance can be achieved by optimizing only these 'hot' sections of the code. VeloCty takes as input functions written in MATLAB and Python specified by the user and generates an equivalent C++ version. VeloCty also generates glue code to interface with MATLAB and Python. The generated code can then be compiled and packaged as a shared library that can be linked to any program written in MATLAB or Python. We also implemented optimisations to eliminate array bounds checks, reuse previously allocated memory during array operations and support parallel execution using OpenMP.

VeloCty uses the Velociraptor toolkit. We implemented a C++ backend for the Velociraptor intermediate representation, VRIR, and language-specific runtimes for MATLAB and Python. We have also implemented a MATLAB VRIR generator using the McLab toolkit.

VeloCty was evaluated using 17 MATLAB benchmarks and 9 Python benchmarks. The MATLAB benchmark versions compiled using VeloCty with all optimisations enabled were between 1.3 and 458 times faster than the MathWorks' MATLAB 2014b interpreter and JIT compiler. Similarly, Python benchmark versions were between 44.11 and 1681 times faster than the CPython interpreter.


View the thesis (PDF) BibTex entry

MC2FOR: A MATLAB TO FORTRAN 95 COMPILER

Author: Xu Li
Date: April 2014

Abstract

MATLAB is a dynamic numerical scripting language widely used by scientists, engineers and students. While MATLAB's high-level syntax and dynamic types make it ideal for fast prototyping, programmers often prefer using high-performance static languages such as FORTRAN for their final distribution. Rather than rewriting the code by hand, our solution is to provide a source-to-source compiler that translates the original MATLAB program to an equivalent FORTRAN program.

In this thesis, we introduce MC2FOR, a source-to-source compiler which transforms MATLAB to FORTRAN and handles several important challenges during the transformation, such as efficiently estimating the static type characteristics of all the variables in a given MATLAB program, mapping numerous MATLAB built-in functions to FORTRAN, and correctly supporting some MATLAB dynamic features in the generated FORTRAN code.

This compiler consists of two major parts. The first part is an interprocedural analysis component to estimate the static type characteristics, such as the shapes of the arrays and the ranges of the scalars, which are used to generate variable declarations and to remove unnecessary array bounds checking in the translated FORTRAN program. The second part is an extensible FORTRAN code generation framework automatically transforming MATLAB constructs to equivalent FORTRAN constructs.

This work has been implemented within the McLab framework, and we evaluated the performance of the MC2FOR compiler on a collection of 20 MATLAB benchmarks. For most of the benchmarks, the generated FORTRAN program runs 1.2 to 337 times faster than the original MATLAB program, and in terms of physical lines of code, typically grows only by a factor of around 2. These experimental results show that the code generated by MC2FOR performs better on average, at the cost of only a modest increase in code size.


View the thesis (PDF) BibTex entry

MiX10: Compiling MATLAB to X10 for High Performance

Author: Vineet Kumar
Date: April 2014

Abstract

MATLAB is a popular dynamic array-based language commonly used by students, scientists and engineers who appreciate the interactive development style, the rich set of array operators, the extensive builtin library, and the fact that they do not have to declare static types. Even though these users like to program in MATLAB, their computations are often very compute-intensive and are better suited for emerging high performance computing systems. This thesis reports on MIX10, a source-to-source compiler that automatically translates MATLAB programs to X10, a language designed for "Performance and Productivity at Scale", thus helping scientific programmers make better use of high performance computing systems.

There is a large semantic gap between the array-based dynamically-typed nature of MATLAB and the object-oriented, statically-typed, and high-level array abstractions of X10. This thesis addresses the major challenges that must be overcome to produce sequential X10 code that is competitive with state-of-the-art static compilers for MATLAB which target more conventional imperative languages such as C and Fortran. Given that efficient basis, the thesis then provides a translation for the MATLAB parfor construct that leverages the powerful concurrency constructs in X10.

The MIX10 compiler has been implemented using the McLab compiler tools, is open source, and is available both for compiler researchers and end-user MATLAB programmers. We have used the implementation to perform many empirical measurements on a set of 17 MATLAB benchmarks. We show that our best MIX10-generated code is significantly faster than the de facto MathWorks' MATLAB system, and that our results are competitive with state-of-the-art static compilers that target C and Fortran. We also show the importance of finding the correct approach to representing the arrays in X10, and the necessity of an IntegerOkay analysis that determines which double variables can be safely represented as integers. Finally, we show that our X10-based handling of the MATLAB parfor greatly outperforms the de facto MATLAB implementation.


View the thesis (PDF) BibTex entry

DYNAMIC COMPILER OPTIMIZATION TECHNIQUES FOR MATLAB

Author: Nurudeen Abiodum Lameed
Date: April 2013

MATLAB has gained widespread acceptance among engineers and scientists. Several aspects of the language such as dynamic loading and typing, safe updates, copy semantics for arrays, and support for higher-order functions contribute to its appeal, but at the same time provide many challenges to the compiler and virtual machine. MATLAB is a dynamic language. Traditional implementations of the language use interpreters and have been found to be too slow for large computations. More recently, researchers and software developers have been developing JIT compilers for MATLAB and other dynamic languages. This thesis is about the development of new compiler analyses and transformations for a MATLAB JIT compiler, McJIT, which is based on the LLVM JIT compiler toolkit.

The new contributions include a collection of novel analyses for optimizing copying of arrays, which are performed when a function is first compiled. We designed and implemented four analyses to support an efficient implementation of array copy semantics in a MATLAB JIT compiler. Experimental results show that copy optimization is essential for performance improvement in a compiler for the MATLAB language.

We also developed a variety of new dynamic analyses and code transformations for optimizing running code on-the-fly according to the current conditions of the runtime environment. LLVM does not currently support on-the-fly code transformation. So, we first developed a new on-stack replacement approach for LLVM. This capability allows the runtime stack to be modified during the execution of a function, thus enabling a continuation of the execution at a higher optimization level. We then used the on-stack replacement implementation to support selective inlining of function calls in long-running loops. Our experimental results show that function calls in long-running loops can result in high runtime overhead, and that selective dynamic inlining can be used to drastically reduce this overhead.

The built-in function feval is an important MATLAB feature for certain classes of numerical programs and solvers which benefit from having functions as parameters. Programmers may pass a function name or function handle to the solver and then the solver uses feval to indirectly call the function. In this thesis, we show that although feval provides an acceptable abstraction mechanism for these types of applications, there are significant performance overheads for function calls via feval, in both MATLAB interpreters and JITs. The thesis then proposes, implements and compares two on-the-fly mechanisms for specialization of feval calls. The first approach uses our on-stack replacement technology. The second approach specializes calls of functions with feval using a combination of runtime input argument types and values. Experimental results on seven numerical solvers show that the techniques provide good performance improvements.
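
The situation has a rough Java analogue: a solver that receives a function and calls it reflectively pays an indirect-call cost on every iteration, and a guarded fast path specialized to the observed callee removes most of it. The sketch below only illustrates value-based call specialization in general, not McJIT's actual feval mechanism; all names are hypothetical.

    import java.lang.reflect.Method;

    // Hypothetical illustration of specializing an indirect call on the observed callee.
    class FevalSketch {
        public static double square(double x) { return x * x; }

        // Generic path: invoke the target reflectively on every call.
        static double genericCall(Method target, double x) throws Exception {
            return (Double) target.invoke(null, x);
        }

        // Specialized path: guard on the expected callee, fall back to the generic path otherwise.
        static double specializedCall(Method target, Method expected, double x) throws Exception {
            if (target == expected) {
                return square(x);                // direct, inlinable call for the common case
            }
            return genericCall(target, x);       // slow path for any other callee
        }

        public static void main(String[] args) throws Exception {
            Method m = FevalSketch.class.getMethod("square", double.class);
            double sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += specializedCall(m, m, i * 1e-6);
            }
            System.out.println(sum);
        }
    }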

The implementation of all the analyses and code transformations presented in this thesis has been done within the McLab virtual machine, McVM, and is available to the public as open source software.



View the thesis (.pdf) BibTeX entry

UNDERSTANDING AND REFACTORING THE MATLAB LANGUAGE

Author: Soroush Radpour
Date: August 2012

Abstract

MATLAB is a very popular dynamic "scripting" language for numerical computations used by scientists, engineers and students world-wide. MATLAB programs are often developed incrementally using a mixture of MATLAB scripts and functions, and frequently build upon existing code which may use outdated features. This results in programs that could benefit from refactoring, especially if the code will be reused and/or distributed. Despite the need for refactoring, there appear to be no MATLAB refactoring tools available. Correct refactoring of MATLAB is quite challenging because of its non-standard rules for binding identifiers; even simple refactorings are non-trivial. Compiler writers and software engineers are generally not familiar with MATLAB and how it is used, so the problem has been left untouched so far.

This thesis has two main contributions. The first is McBench, a tool that helps compiler writers understand the language better. In order to have a systematic approach to the problem, we developed this tool to give us some insight about how programmers use MATLAB.
The second contribution is a suite of semantics-preserving refactorings for MATLAB functions and scripts including: function and script inlining, converting scripts to functions, extracting new functions, and converting dynamic feval calls to static function calls. These refactorings have been implemented using the McLAB compiler framework, and an evaluation is given on a large set of MATLAB programs which demonstrates the effectiveness of our approach.


View the thesis (PDF) BibTex entry

Software Method Level Speculation for Java

Author: Christopher J. F. Pickett
Date: April 2012

Abstract

Speculative multithreading (SpMT), also known as thread level speculation (TLS), is a dynamic parallelization technique that relies on out-of-order execution, dependence buffering, and misspeculation rollback to achieve speedup of sequential programs on multiprocessors. A large number of hardware studies have shown good results for irregular programs, as have a smaller number of software studies in the context of loop level speculation for unmanaged languages.

In this thesis we explore software method level speculation for Java. A software environment means that speculation will run on existing multiprocessors, at the cost of extra overhead. Method level speculation (MLS) is a kind of SpMT / TLS that creates threads on method invocation, executing the continuation speculatively. Although MLS can subsume loop level speculation, it is still a relatively unexplored paradigm. The Java programming language and virtual machine environment are rich and complex, posing many implementation challenges, but also supporting a compelling variety of object-oriented programs.

We first describe the design and implementation of a prototype system in a Java bytecode interpreter. This includes support for various MLS components, such as return value prediction and dependence buffering, as well as various interactions with features of the Java virtual machine, for example bytecode interpretation, exception handling, and the Java memory model. Experimentally we found that although high thread overheads preclude speedup, we could extract significant parallelism if overheads were excluded. Furthermore, profiling revealed three key areas for optimization.

The first key area for optimization was the return value prediction system. In our initial model, a variety of predictors were all executing naïvely on every method invocation, in order that a hybrid predictor might select the best performing ones. We developed an adaptive system wherein hybrid predictors dynamically specialize on a per-callsite basis, thus dramatically reducing speed and memory costs whilst maintaining high accuracy.
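
As a loose illustration of the idea (hypothetical Java code, not the thesis's actual predictor), a per-callsite hybrid can run cheap sub-predictors, track how often each one would have been right, and specialize to whichever has been most accurate at that site:

    // Hypothetical per-callsite hybrid return value predictor (last-value and stride sub-predictors).
    class ReturnValuePredictor {
        private long lastValue;          // prediction state
        private long stride;             // observed difference between the last two return values
        private int lastValueHits, strideHits;
        private boolean initialized;

        long predict() {
            // Specialize to the historically better sub-predictor at this callsite.
            return (strideHits > lastValueHits) ? lastValue + stride : lastValue;
        }

        void update(long actual) {
            if (initialized) {
                if (actual == lastValue) lastValueHits++;            // last-value would have been right
                if (actual == lastValue + stride) strideHits++;      // stride would have been right
                stride = actual - lastValue;
            }
            lastValue = actual;
            initialized = true;
        }
    }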

The second area for optimization was the nesting model. Our initial system only allowed for out-of-order nesting, wherein a single parent thread creates multiple child threads. Enabling support for in-order nesting exposes significantly more parallelization opportunities, because now speculative child threads can create their own children that are even more speculative. This required developing a memory manager for child threads based on recycling aggregate data structures. We present an operational semantics for our nesting model derived from our implementation.

Finally, we use this semantics to address the third area for optimization, namely a need for better fork heuristics. Initial heuristics based on online profiling made it difficult to identify the best places to create threads due to complex feedback interactions between speculation decisions at independent speculation points. This problem grew exponentially worse with the support for in-order nesting. Instead, we chose to clarify the effect of program structure on runtime parallelism. We did this by systematically exploring the interaction between speculation and a variety of coding idioms. The patterns we identify are intended to guide both manual parallelization and static compilation efforts.



View the thesis (.pdf) View the defense slides (.pdf) BibTeX entry

TAMING MATLAB

Author: Anton Dubrau
Date: April 2012

Abstract

MATLAB is a dynamic scientific language used by scientists, engineers and students worldwide. Although MATLAB is very suitable for rapid prototyping and development, MATLAB users often want to convert their final MATLAB programs to a static language such as FORTRAN, to integrate them into already existing programs of that language, to leverage the performance of powerful static compilers, or to ease the distribution of executables.

This thesis presents an extensible object-oriented toolkit to help facilitate the generation of static programs from dynamic MATLAB programs. Our open source toolkit, called the MATLAB Tamer, targets a large subset of MATLAB. Given information about the entry point of the program, the MATLAB Tamer builds a complete callgraph, transforms every function into a reduced intermediate representation, and provides typing information to aid the generation of static code.

In order to provide this functionality, we need to handle a large number of MATLAB builtin functions. Part of the Tamer framework is the builtin framework, an extensible toolkit which provides a principled approach to handle a large number of builtin functions. To build the callgraph, we provide an interprocedural analysis framework, which can be used to implement full-program analyses. Using this interprocedural framework, we have developed value analysis, an extensible interprocedural analysis to estimate MATLAB types, which helps discover the call edges needed to build the call graph.

In order to make the static analyses even possible, we disallow a small number of MATLAB constructs and features, but attempt to support as large a subset of MATLAB as possible. Thus, by both slightly restricting MATLAB, and by providing a framework with powerful analyses and simplifying transformations, we can “Tame MATLAB”.


View the thesis (PDF) BibTex entry

IMPLEMENTATION AND OPTIMIZATION OF THREAD-LOCAL VARIABLES FOR A RACE-FREE JAVA DIALECT

Author: Yi Zhang
Date: December 2011

Abstract

Despite the popularity of Java, problems may arise from potential data-race conditions during execution of a Java program. Data-races are considered errors in concurrent programming languages and greatly complicate both programming and runtime optimization efforts. A race-free version of Java is therefore desirable as a way of avoiding this complexity and simplifying the programming model.

This thesis is part of work trying to build a race-free version of Java. It implements and optimizes thread-local accesses and comes up with a new semantics for this language. An important part of implementing a language without races is to distinguish thread-local data from shared data because these two groups of data need to be treated differently. This is complex in Java because in the current Java semantics all objects are allocated on a single heap and implicitly shared by multiple threads. Furthermore, while Java does provide a mechanism for thread-local storage, it is awkward to use and inefficient.

Many of the new concurrent programming languages, such as OpenMP, UPC, and D, use “sharing directives” to distinguish shared data from thread-local data, and have features that make heavy use of thread-local data. Our goal here is to apply some of these language ideas to a Java context in order to provide a simpler and less error-prone programming model. When porting such features as part of a language extension to Java, however, performance can suffer due to the simple, map-based implementation of Java's built-in ThreadLocal class. We implement an optimized mechanism based on programmer annotations that can efficiently ensure class and instance variables are only accessed by their owner thread. Both class and instance variables inherit values from the parent thread through deep copying, allowing all the reachable objects of child threads to have local copies if syntactically specified. In particular, class variable access involves direct access to thread-local variables through a localized heap, which is faster and easier than the default map mechanism defined for ThreadLocal objects. Our design improves performance significantly over the traditional thread-local access method for class variables and provides a simplified and more appealing syntax for doing so. We further evaluate our approach by modifying non-trivial, existing benchmarks to make better use of thread-local features, illustrating feasibility and allowing us to measure the performance in realistic contexts. This work is intended to bring us closer to designs for a complete race-free version of Java, as well as show how improved support for use of thread-local data could be implemented in other languages.
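
For contrast, the sketch below shows the map-based mechanism Java provides today: every access to a ThreadLocal goes through a per-thread map lookup, and InheritableThreadLocal passes only the parent's reference to a child unless childValue is overridden. The annotation-based design in the thesis replaces such lookups with direct accesses to a localized heap and performs deep copies; this code is only a baseline illustration, not the thesis's mechanism.

    // Baseline: Java's built-in map-based thread-local storage.
    class ThreadLocalBaseline {
        // Each thread sees its own array; childValue passes a copy of the parent's value to new threads.
        static final InheritableThreadLocal<int[]> buffer = new InheritableThreadLocal<int[]>() {
            @Override protected int[] initialValue() { return new int[16]; }
            @Override protected int[] childValue(int[] parentValue) {
                return parentValue.clone();   // explicit copy; by default the child would share the reference
            }
        };

        public static void main(String[] args) throws InterruptedException {
            buffer.get()[0] = 42;             // parent writes through a map lookup
            Thread child = new Thread(() -> {
                int[] local = buffer.get();   // child gets its own copy, again via a map lookup
                local[0]++;
                System.out.println("child sees " + local[0]);      // prints 43
            });
            child.start();
            child.join();
            System.out.println("parent sees " + buffer.get()[0]);  // still 42
        }
    }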


View the thesis (PDF) BibTex entry

MCSAF: AN EXTENSIBLE STATIC ANALYSIS FRAMEWORK FOR THE MATLAB LANGUAGE

Author: Jesse Doherty
Date: August 2011

Abstract

MATLAB® is a popular language for scientific and numerical programming. Despite its popularity, there are few active projects providing open tools for MATLAB-related compiler research. This thesis provides the McLAB Static Analysis Framework, McSAF, the goal of which is to simplify the development of new compiler tools for MATLAB.

The McLAB project was started in order to develop such tools in the hopes of attracting further research. The goal of the project is to provide an extensible compiler toolkit for MATLAB and scientific programming. It is intended to explore the compilation challenges unique to MATLAB and to explore new language features that could help scientific programmers be more productive. One piece of functionality that is particularly important for compiler research is the ability to perform static analysis. Without the information provided by static analyses, program transformations and optimizations, and automated programmer feedback would not be possible.

In order to make the development of static analyses simpler, this thesis contributes a framework for creating static analyses for the MATLAB language. This framework is intended to make writing analyses easier by providing core functionality and an API for developing such analyses. It also aims to make analysis development easier by providing an intermediate representation called McLAST, which provides simpler syntax and explicitly exposes some of MATLAB's semantics. In order to give analysis writers a head start, some example analyses are provided. These include simple analyses intended to demonstrate the use of the framework, and some more complicated analyses that provide basic semantic information about MATLAB programs.
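
To give a flavour of what such a framework offers (a hypothetical skeleton in Java, the language McSAF is written in, not the actual McSAF API), a forward flow analysis is typically written by supplying a merge operation and a transfer function, with the framework driving the traversal:

    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    // Hypothetical skeleton of a forward flow analysis over a straight-line list of statements.
    abstract class ForwardAnalysis<S> {
        abstract Set<S> merge(Set<S> a, Set<S> b);            // join of two incoming flow sets
        abstract Set<S> flowThrough(Set<S> in, String stmt);   // transfer function for one statement

        // The framework drives the traversal; analysis writers only supply the two methods above.
        Set<S> analyze(List<String> statements, Set<S> entry) {
            Set<S> current = new HashSet<>(entry);
            for (String stmt : statements) {
                current = flowThrough(current, stmt);
            }
            return current;
        }
    }

    // Example client: collect the names that appear on the left of an assignment.
    class DefinedNames extends ForwardAnalysis<String> {
        @Override Set<String> merge(Set<String> a, Set<String> b) {
            Set<String> out = new HashSet<>(a); out.addAll(b); return out;
        }
        @Override Set<String> flowThrough(Set<String> in, String stmt) {
            Set<String> out = new HashSet<>(in);
            int eq = stmt.indexOf('=');
            if (eq > 0) out.add(stmt.substring(0, eq).trim());
            return out;
        }
    }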

In addition to the framework making development of analyses simpler, McSAF is also designed to be extended to new language features. Not only can the framework be extended, but existing analyses can also be extended. This allows work that was previously done for analyzing MATLAB code to be applied to future language extensions.


View the thesis (PDF) BibTex entry

McFLAT : A Profile-based Framework for MATLAB Loop Analysis and Transformations

Author: Amina Aslam
Date: August 2010

Abstract

Parallelization and optimization of the MATLAB™ programming language presents several challenges due to the dynamic nature of MATLAB. Since MATLAB does not have static type declarations, neither the shapes and sizes of arrays nor the loop bounds are known at compile time. This means that many standard array dependence tests and associated transformations cannot be applied straightforwardly. On the other hand, many MATLAB programs operate on arrays using loops and thus are ideal candidates for loop transformations and possibly loop vectorization/parallelization.

This thesis presents a new framework, McFLAT, which uses profile-based training runs to determine likely loop-bounds ranges for which specialized versions of the loops may be generated. The main idea is to collect information about observed loop bounds and hot loops using training data which is then used to heuristically decide upon which loops and which ranges are worth specializing using a variety of loop transformations.
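
A hypothetical sketch of the specialization idea, written in Java rather than in generated MATLAB: once training runs show a loop bound almost always falls in a hot range, the compiler can emit a version specialized for that range, guarded by a range test that falls back to the general loop. The range and the unrolling are invented for illustration.

    // Hypothetical illustration of guarded loop specialization on an observed bound range.
    class LoopSpecializationSketch {
        // General version: works for any n.
        static double sumGeneral(double[] a, int n) {
            double s = 0;
            for (int i = 0; i < n; i++) s += a[i];
            return s;
        }

        // Version specialized for the profiled hot range, here unrolled by 4.
        static double sumSpecialized(double[] a, int n) {
            double s = 0;
            int i = 0;
            for (; i + 3 < n; i += 4) s += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
            for (; i < n; i++) s += a[i];
            return s;
        }

        static double sum(double[] a, int n) {
            // Guard derived from the training runs: specialize only inside the observed range.
            if (n >= 1000 && n <= 100000) return sumSpecialized(a, n);
            return sumGeneral(a, n);
        }
    }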

Our McFLAT framework has been implemented as part of the McLAB extensible compiler toolkit. Currently, McFLAT is used to automatically transform ordinary MATLAB code into specialized MATLAB code with transformations applied to it. This specialized code can be executed on any MATLAB system, and we report results for four execution engines: MathWorks' proprietary MATLAB system, the GNU Octave open-source interpreter, McLAB's McVM interpreter and the McVM JIT. For several benchmarks, we observed significant speedups for the specialized versions, and noted that loop transformations had different impacts depending on the loop range and execution engine.

This thesis reports on the design and implementation of McFLAT, a framework that is designed to study the effect of various loop transformations on different loop-bound ranges by introducing loop-level specializations in MATLAB programs.


View the thesis (PDF) BibTex entry

AspectMatlab: An Aspect-Oriented Scientific Programming Language

Author: Toheed Aslam
Date: February 2010

Abstract

There has been relatively little work done in the compiler research community for incorporating aspect-oriented features in scientific and dynamic programming languages. MATLAB is a dynamic scientific programming language that is commonly used by scientists because of its convenient and high-level syntax for arrays, the fact that type declarations are not required, and the availability of a rich set of application libraries. This thesis introduces a new aspect-oriented scientific language, AspectMatlab.

AspectMatlab introduces key aspect-oriented features in a way that is both accessible to scientists and where the aspect-oriented features concentrate on array accesses and loops, the core computation elements in scientific programs. One of the main contributions of this thesis is to provide a compiler implementation of the AspectMatlab language. It is supported by a collection of scientific use cases, which demonstrate the potential of aspect-orientation for scientific problems.

Introducing aspects into a dynamic language such as MATLAB also provides some new challenges. In particular, it is difficult to statically determine precisely where patterns match, resulting in many dynamic checks in the woven code. The AspectMatlab compiler uses flow analyses to eliminate many of those dynamic checks.

This thesis reports on the language design of AspectMatlab, the amc compiler implementation, and also provides an overview of the use cases that are specific to scientific programming. By providing clear extensions to an already popular language, AspectMatlab will make aspect-oriented programming accessible to a new group of programmers including scientists and engineers.


View the thesis (PDF) BibTex entry

McVM: an Optimizing Virtual Machine for the MATLAB Programming Language

Author: Maxime Chevalier-Boisvert
Date: August 2009

Abstract

In recent years, there has been an increase in the popularity of dynamic languages such as Python, Ruby, PHP, JavaScript and MATLAB. Programmers appreciate the productivity gains and ease of use associated with such languages. However, most of them still run in virtual machines which provide no Just-In-Time (JIT) compilation support, and thus perform relatively poorly when compared to their statically compiled counterparts. While the reference MATLAB implementation does include a built-in compiler, this implementation is not open source and little is known about its internal workings. The McVM project has focused on the design and implementation of an optimizing virtual machine for a subset of the MATLAB programming language.

Virtual machines and JIT compilers can benefit from advantages that static compilers do not have. It is possible for virtual machines to make use of more dynamic information than static compilers have access to, and thus to implement optimization strategies that are better adapted to dynamic languages. Through the McVM project, some possible avenues to significantly improve the performance of dynamic languages have been explored. Namely, a just-in-time type-based program specialization scheme has been implemented in order to take advantage of dynamically available type information.
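
As a rough Java analogue (hypothetical code, not McVM's actual mechanism), type-based specialization amounts to keying compiled code on the runtime types of the arguments and reusing a specialized version whenever the same type signature recurs:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.function.Function;

    // Hypothetical cache of specialized versions keyed by the runtime argument type.
    class SpecializationCache {
        private final Map<Class<?>, Function<Object, Object>> versions = new HashMap<>();

        Object call(Object arg) {
            Function<Object, Object> specialized =
                versions.computeIfAbsent(arg.getClass(), this::compileFor);
            return specialized.apply(arg);
        }

        // Stand-in for JIT compilation: pick a body fitted to the observed type.
        private Function<Object, Object> compileFor(Class<?> type) {
            if (type == Double.class) return x -> (Double) x * 2.0;     // scalar arithmetic path
            if (type == double[].class) return x -> {                   // array path, no per-element dispatch
                double[] in = (double[]) x, out = new double[in.length];
                for (int i = 0; i < in.length; i++) out[i] = in[i] * 2.0;
                return out;
            };
            return x -> x.toString() + x;                                // generic fallback
        }
    }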

One of the main contributions of this project is to provide an alternative implementation of the MATLAB programming language. There is already an open source MATLAB interpreter (GNU Octave), but our implementation also includes an optimizing JIT compiler and will be open sourced under the BSD license. McVM aims to become a viable implementation for end-users, but could also see use in the compiler research community as a testbed for dynamic language optimizations. In addition to the contribution of the McVM framework itself, we also contribute the design and implementation of a novel just-in-time type-based program specialization system aimed at dynamic languages.

The novel specialization system implemented in McVM shows much promise in terms of speed, yielding performance up to three orders of magnitude faster than competing implementations such as GNU Octave. It is also easily adaptable to other dynamic programming languages such as Python, Ruby and JavaScript. The investigation of performance issues we make in this thesis also suggests future research directions for the design of dynamic language compilers of the future.


View the thesis (PDF) BibTex entry

Verifying finite-state properties of large-scale programs

Author: Eric Bodden
Date: June 2009

Abstract

Designers of software components can use finite-state properties to denote behavioral interface specifications which enforce client-side programming rules that state how the components ought to be used. This allows users of these components to check their client code for compliance with these rules, both statically and at runtime.

In this dissertation we explain the design and implementation of Clara, a framework for specifying and verifying finite-state properties of large-scale programs. With Clara, programmers specify finite-state properties together with runtime monitors, using a syntactic extension to the aspect-oriented programming language AspectJ. Clara then uses a sequence of three increasingly detailed static analyses to determine if the program satisfies the finite-state properties, i.e., is free of property violations.

Clara produces a list of program points at which the program may violate the properties, ranked by a confidence value. If violations are possible, Clara also instruments the program with the supplied runtime monitor, which will capture property violations when the program executes. Due to its static analyses, Clara can omit the instrumentation at program locations which the analyses proved safe, and so optimize the instrumented program. When much instrumentation remains, Clara partitions the instrumentation into subsets, so that one can distribute multiple partially instrumented program versions that each run with a low overhead.
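
As a concrete example of the kind of finite-state property involved (written here as plain Java rather than in Clara's AspectJ-based specification syntax), consider the usual iterator rule that next() must be preceded by hasNext(); a runtime monitor is just a small state machine driven by the relevant events:

    // Hypothetical hand-written monitor for the "hasNext before next" finite-state property.
    class HasNextMonitor {
        private enum State { UNSAFE, SAFE }
        private State state = State.UNSAFE;

        void onHasNext() { state = State.SAFE; }        // event: hasNext() was called on the iterator

        void onNext() {                                  // event: next() was called
            if (state == State.UNSAFE) {
                System.err.println("property violation: next() without preceding hasNext()");
            }
            state = State.UNSAFE;                        // a new hasNext() is required before the next next()
        }
    }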

We validated the approach by applying Clara to finite-state properties denoted in multiple formalisms over several large-scale Java programs. Clara proved that most of the programs fulfill our example properties. For most other programs, Clara could remove the monitoring overhead to below 10%. We also found multiple property violations by manually inspecting the top entries in Clara's ranked result list.


View the thesis (PDF) BibTex entry

McFOR: A MATLAB to FORTRAN 95 Compiler

Author: Jun Li
Date: August 2009

Abstract

The high-level array programming language MATLAB is widely used for prototyping algorithms and applications of scientific computations. However, its dynamically-typed nature, which means that MATLAB programs are usually executed via an interpreter, leads to poor performance. An alternative approach would be converting MATLAB programs to equivalent Fortran 95 programs. The resulting programs could be compiled using existing high-performance Fortran compilers and thus could provide better performance. This thesis presents techniques that are developed for our MATLAB-to-Fortran compiler, McFor, for extracting information from the high-level semantics of MATLAB programs to produce efficient and reusable Fortran code.

The McFor compiler includes new type inference techniques for inferring intrinsic type and shape of variables and uses a value-propagation analysis to precisely estimate the sizes of arrays and to eliminate unnecessary array bounds checks and dynamic reallocations. In addition to the techniques for reducing execution overhead, McFor also aims to produce programmer-friendly Fortran code. By utilizing Fortran 95 features, the compiler generates Fortran code that keeps the original program structure and preserves the same function declarations.

We implemented the McFor system and experimented with a set of benchmarks with different kinds of computations. The results show that the compiled Fortran programs perform better than corresponding MATLAB executions, with speedups ranging from 1.16 to 102, depending on the characteristics of the program.


View the thesis (PDF) BibTex entry

The Metalexer Lexer Specification Language

Author: Andrew Michael Casey
Date: June 2009

Abstract

Compiler toolkits make it possible to rapidly develop compilers and translators for new programming languages. Recently, toolkit writers have focused on supporting extensible languages and systems that mix the syntaxes of multiple programming languages. However, this work has not been extended down to the lexical analysis level. As a result, users of these toolkits have to rely on ad-hoc solutions when they extend or mix syntaxes. This thesis presents MetaLexer, a new lexical specification language that remedies this deficiency.

MetaLexer has three key features: it abstracts lexical state transitions out of semantic actions, makes modules extensible by introducing multiple inheritance, and provides cross-platform support for a variety of programming languages and compiler front-end toolchains.

In addition to designing this new language, we have constructed a number of practical tools. The most important are a pair of translators that map MetaLexer to the popular JFlex lexical specification language and vice versa.

We have exercised MetaLexer by using it to create lexers for three real programming languages: AspectJ (and two extensions), a large subset of Matlab, and MetaLexer itself. The new specifications are easier to read and require much less action code than the originals.


View the thesis (PDF) BibTex entry

Optimizing Software-Hardware Interplay in Efficient Virtual Machines

Author: Gregory B. Prokopski
Date: February 2009

Abstract

To achieve the best performance, most computer languages are compiled, either ahead of time and statically, or dynamically during runtime by means of a Just-in-Time (JIT) compiler. Optimizing compilers are complex, however, and for many languages such as Ruby, Python, PHP, etc., an interpreter-based Virtual Machine (VM) offers a more flexible and portable implementation method, and moreover represents an acceptable trade-off between runtime performance and development costs. VM performance is typically maximized by use of the basic direct-threading interpretation technique which, unfortunately, interacts poorly with modern branch predictors. More advanced techniques, like code-copying, have been proposed [RS96, PR98, EG03c, Gag02] but have remained practically infeasible due to important safety concerns. On this basis we developed two cost-efficient, well-performing solutions.

First, we designed a C/C++ language extension that allows programmers to express the need for the special safety guarantees of code-copying. Our low-maintenance approach is designed as an extension to a highly-optimizing, industry-standard GNU C Compiler (GCC), and we apply it to Java, OCaml, and Ruby interpreters. We tested the performance of these VMs on several architectures, gathering extensive analysis data for both software and hardware performance. Significant improvement is possible, 2.81 times average speedup for OCaml and 1.44 for Java on Intel 32-bit, but varies by VM and platform. We provide detailed analysis and design guidelines for helping developers predict and evaluate the benefit provided by safe code-copying.

In our second approach we focused on alleviating the limited scope of optimizations in code-copying with an ahead-of-time-based (AOT) approach. A source-based approach to grouping bytecode instructions together allows for more extensive cross-bytecode optimizations, and so we develop a caching compilation server that specializes interpreter source code to a given input application. Although this requires AOT profiling, it further improves performance over code-copying, 27% average for OCaml and 4-5% on selected Java benchmarks.

This thesis work provides designs for low maintenance, high efficiency VMs, and also demonstrates the large performance improvement potentially enabled by tailoring language implementation to modern hardware. Our designs are both based on understanding the impact of lower-level components on VM behavior. By optimizing the software-hardware interplay we thus show significant speed-up is possible with very minimal VM and compiler development costs.

View the thesis (PDF) BibTex entry

Aspect Impact Analysis

Author: Dehua Zhang
Date: August 2008

Abstract

One of the major challenges in aspect-oriented programming is that aspects may have unintended impacts on a base program. Thus, it is important to develop techniques and tools that can both summarize the impacts and provide information about the causes of the impacts. This thesis presents impact analyses for AspectJ.

Our approach detects different ways advice and inter-type declarations interact and interfere with the base program and focuses on four kinds of impacts: state impacts, which cause changes of state in the base program; computation impacts, which cause changes in functionality by adding, removing or replacing computations of the base program; shadowing impacts, which cause changes of field reference in the base program; and lookup impacts, which cause changes of method lookup in the base program.

We provide a classification scheme for these kinds of impacts and then develop a set of static analyses to estimate these impacts. A key feature of our approach is the use of points-to analysis to provide more accurate estimates. Further, our analysis results allow us to trace back to find the causes of the impacts.

We have implemented our techniques in the AspectBench compiler. By implementing them in an AspectJ compiler, all kinds of pointcuts, advice and inter-type declarations can be analyzed. We also have integrated these analyses into an AspectJ IDE and provided two-way navigation between impacts and program source code. In addition, we have carried out experiments on example programs and benchmarks to investigate the results of our analyses.

View the thesis (PDF) BibTex entry

Static Lock Allocation

Author: Richard L. Halpert
Date: April 2008

Abstract

The allocation of lock objects to critical sections in concurrent programs affects both performance and correctness. Traditionally, this allocation is done manually by the programmer. Recent work explores automatic lock allocation, aiming primarily to minimize conflicts and maximize parallelism by allocating locks to individual critical sections. We investigate several modes of lock allocation, using connected components (groups) of interfering critical sections on a critical section interference graph as the basis for allocation decisions. Our allocator uses a thread-based side effect analysis which is built from several pluggable component analyses. It benefits from precise points-to and may-happen-in-parallel information. Thread-local object information provides a small improvement over points-to analysis alone. Our framework minimizes the restrictions on input programs, dealing gracefully with nesting and deadlock, and requiring only simple annotations identifying all critical sections. Legacy programs using synchronized regions can be processed without alteration. Dynamic locks do not broadly improve upon identical allocations of static locks, but allocating several dynamic locks in place of a single static lock can significantly increase parallelism. We experiment with a range of small and large Java benchmarks on 1 to 8 processors, and find that singleton locking is sufficient for four of our benchmarks, and that static locking with Spark points-to analysis is sufficient for another two. Of the other five benchmarks, two require the use of all phases of our analysis, one depends on using the lockset transformation, and two benchmarks proved too complex to be automatically transformed to satisfactory performance.
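
A minimal sketch of the grouping step (hypothetical code, not the actual allocator): treating critical sections as nodes and interference edges as reported by the side-effect analysis, a union-find structure yields the connected components, and each component is then protected by one lock:

    // Hypothetical sketch: group interfering critical sections with union-find, one lock per group.
    class LockGrouping {
        private final int[] parent;

        LockGrouping(int criticalSections) {
            parent = new int[criticalSections];
            for (int i = 0; i < criticalSections; i++) parent[i] = i;
        }

        int find(int x) { return parent[x] == x ? x : (parent[x] = find(parent[x])); }

        // Called for every pair of critical sections that may interfere.
        void addInterference(int a, int b) { parent[find(a)] = find(b); }

        // Critical sections with the same representative share a lock; others can run in parallel.
        int lockFor(int criticalSection) { return find(criticalSection); }
    }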

View the thesis (PDF) BibTex entry

Java bytecode obfuscation

Author: Michael R. Batchelder
Date: January 2007

Abstract

Programs written for machine execution will always be susceptible to information theft. This information can include trademarked algorithms, data embedded in the program, or even data the program accesses. As technology advances, computer scientists are building more and more powerful tools for reverse-engineering, such as the decompiler.

The Java programming language is particularly open to reverse-engineering attacks because of its well-defined, open, and portable binary format. We examine one area of better securing the intellectual property of a Java program: obfuscation. Obfuscation of a program involves transforming the code of the program into a more complex but semantically equivalent representation. This can include the addition of confusing control flow, the removal of certain information embedded in the program which is not explicitly required for execution, or the cloaking of data.
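
A small before-and-after sketch of the control-flow idea (an illustrative example, not one of JBCO's actual transformations): an opaque predicate whose outcome is fixed but not syntactically obvious lets dead branches be woven into otherwise straight-line code:

    // Hypothetical illustration of obfuscation via an opaque predicate.
    class OpaquePredicateSketch {
        // Original, readable version.
        static int original(int x) {
            return x * x + 1;
        }

        // Semantically equivalent obfuscated version.
        static int obfuscated(int x) {
            int y = x | 1;                    // y is odd, so (y * y) % 4 is 1 (or -3 after overflow), never 3
            int result;
            if ((y * y) % 4 == 3) {           // opaquely false predicate
                result = x - original(x);     // dead code that still type-checks and confuses decompilation
            } else {
                result = x * x + 1;
            }
            return result;
        }
    }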

Obfuscation is one of the only techniques available other than cryptological options. While many approaches to obfuscation are ultimately reversible, it nevertheless seriously hinders those attempting to steal information by increasing the computing time and power required by software to reverse-engineer the program and also severely increases the complexity of any source code that is recovered by the reverse-engineering.

In this thesis we present a number of obfuscating transformations implemented within an automatic tool we name the Java Bytecode Obfuscator (JBCO). We present empirical measures of the performance costs of these transformations in terms of execution speed and program size. Complexity measurements that gauge the effectiveness of the obfuscations are also given. Finally, we review the feasibility of each transformation by looking at source code generated from obfuscated bytecode by various decompilers.

View the thesis (PDF) BibTex entry

Programmer-friendly decompiled Java

Author: Nomair A. Naeem
Date: January 2007

Abstract

Java decompilers convert Java class files to Java source. Common Java decompilers are javac-specific decompilers since they target bytecode produced from a particular javac compiler. We present work carried out on Dava, a tool-independent decompiler that decompiles bytecode produced from any compiler. A known deficiency of tool-independent decompilers is the generation of complicated decompiled Java source which does not resemble the original source as closely as output produced by javac-specific decompilers. This thesis tackles this shortcoming, for Dava, by introducing a new back-end consisting of simplifying transformations.

The work presented can be broken into three major categories: transformations using tree traversals and pattern matching to simplify the control flow, the creation of a flow analysis framework for an Abstract Syntax Tree (AST) representation of Java source code and the implementation of flow analyses with their use in complicated transformations.

The pattern matching transformations rewrite the ASTs to semantically-equivalent ASTs that correspond to code that is easier for programmers to understand. The targeted Java constructs include If and If-Else aggregation, for-loop creation and the removal of abrupt control flow. Pattern matching using tree traversals has its limitations. Thus, we introduce a new structure-based data flow analysis framework that can be used to gather information required by more complex transformations. Popular compiler analyses, e.g., reaching definitions and constant propagation, were implemented using the framework. Information from these analyses is then leveraged to perform more advanced AST transformations.
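
As a small illustration of the kind of rewriting involved (a hypothetical example, not actual Dava output), a while-true loop with abrupt control flow, typical of naively decompiled bytecode, can be recognized and rewritten into the for loop a programmer would have written:

    // Hypothetical before/after example of a control-flow simplifying transformation.
    class DecompilerRewriteSketch {
        // Shape typical of tool-independent decompiler output.
        static int before(int[] a) {
            int sum = 0;
            int i = 0;
            while (true) {
                if (i >= a.length) {
                    break;            // abrupt exit
                }
                sum += a[i];
                i++;
            }
            return sum;
        }

        // Semantically equivalent code after If aggregation and for-loop creation.
        static int after(int[] a) {
            int sum = 0;
            for (int i = 0; i < a.length; i++) {
                sum += a[i];
            }
            return sum;
        }
    }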

We performed experiments comparing different decompiler outputs for different sources of bytecode. The results from these experiments indicate that the new Dava back-end considerably improves code comprehensibility and readability.

View the thesis (PDF) Download the thesis (.ps.gz) BibTex entry

Information flow in a Java intermediate language

Author: Ahmer Ahmedani
Date: August 2006

Abstract

It is a common practice to retrieve code from an outside source, execute it and return the result to the user. During execution, secure data can enter the program through user input or access to a data resource. It is important to track the secure data once it enters the program, to identify possible information flows to unwanted regions of the code which would permit undesirable data output to a user. Most approaches to restrict information flow in programs have fallen short of providing a practical solution for mainstream programming languages.

To address this issue, this thesis presents two context-sensitive inter-procedural analyses which analyze an intermediate representation of Java Bytecode for secure information flow. The first analysis assumes that there is only one instance of all class fields, whereas the second analysis uses points-to information to differentiate between instance fields which belong to different instances of the same class. The analyses track secure information in the program by maintaining sets of secure data. The analyses handle dynamic method resolution in Java statically by analyzing all possible methods which may be invoked at a call site and merging the secure data sets. We were able to define rules to analyze all the statements in the intermediate representation and also accounted for Java libraries. The analyses do not expect any security annotations in the program.
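
For reference, the two kinds of flow counted in the experiments look like this textbook pair (not taken from the thesis's benchmarks): an explicit flow copies secure data directly, while an implicit flow leaks it through a branch on a secure value:

    // Textbook examples of the two kinds of information flow tracked by such analyses.
    class InformationFlowExamples {
        static int explicitFlow(int secret) {
            int published = secret;          // explicit flow: the secure value is assigned directly
            return published;
        }

        static int implicitFlow(int secret) {
            int published = 0;
            if (secret > 0) {                // implicit flow: the branch condition depends on secure data
                published = 1;               // the assigned value reveals information about 'secret'
            }
            return published;
        }
    }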

Type information is useful in debugging, guiding optimizations, and specifying and providing safety proofs for programs. A type system for a subset of the Java Bytecode intermediate representation is also formulated in this thesis. An operational semantics is specified and a type preservation proof assures the soundness of the type system.

Non-trivial benchmarks were analyzed and explicit and implicit information flows were counted for both analyses. The empirical data collected suggests that secure data is used in many statements of programs, and that output of data to a user at several places in a program can leak secure data.

View the thesis (PDF) Download the thesis (.ps.gz) BibTex entry

Dynamic data structure analysis and visualization of Java programs

Author: Sokhom Pheng
Date: May 2006

Abstract

For many years, programmers have faced the problem of reading and trying to understand other programmers' code, either to maintain it or to learn from it. Analysis of dynamic data structure usage is useful for both program understanding and for improving the accuracy of other program analyses.

Data structure usage has been the target of various static techniques. Static approaches, however, may suffer from reduced accuracy in complex situations and have the potential to be overly-conservative in their approximation. An accurate, clean picture of runtime heap activity is difficult to achieve.

We have designed and implemented a dynamic heap analysis system that allows one to examine and analyze how Java programs build and modify data structures. Using a complete execution trace from a profiled run of the program, we build an internal representation that mirrors the evolving runtime data structures. The resulting series of representations can then be analyzed and visualized. This gives us an accurate representation of the data structures created and an insight into the program’s behaviour. Furthermore we show how to use our approach to help understand how programs use data structures, the precise effect of garbage collection, and to establish limits on static data structure analysis.

A deep understanding of dynamic data structures is particularly important for modern, object-oriented languages that make extensive use of heap-based data structures. These analysis results can be useful for an important group of applications such as parallelization, garbage collection optimization, program understanding or improvements to other optimizations.

View the thesis (PDF) Download the thesis (.ps.gz) BibTex entry

Shimple: An Investigation of Static Single Assignment Form

Author: Navindra Umanee
Date: February 2006

Abstract

In designing compiler analyses and optimisations, the choice of intermediate representation (IR) is a crucial one. Static Single Assignment (SSA) form in particular is an IR with interesting properties, enabling certain analyses and optimisations to be more easily implemented while also being more powerful. Our goal has been to research and implement various SSA-based IRs for the Soot compiler framework for Java.

We present three new IRs based on our Shimple framework for Soot. Simple Shimple is an implementation of the basic SSA form. We explore extensions of SSA form (Extended SSA form and Static Single Information form) and unify them in our implementation of Extended Shimple. Thirdly, we explore the possibility of applying the concepts of SSA form to array element analysis in the context of Java and present Array Shimple.
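
For readers less familiar with SSA, its defining property is that every variable has exactly one static assignment, with φ-functions merging values at control-flow joins. The comments below sketch a Shimple-like SSA rendering of a small Java fragment; the exact notation is illustrative rather than actual Shimple output.

    // A small fragment and, in comments, the kind of SSA form an IR like Shimple would give it.
    class SsaExample {
        static int example(boolean flag) {
            int x = 1;                 //   x1 = 1
            if (flag) {                //   branch on flag
                x = 2;                 //   x2 = 2
            }
            int y = x + 1;             //   x3 = phi(x1, x2)   (merge at the join point)
                                       //   y1 = x3 + 1
            return y;
        }
    }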

We have designed Shimple to be extensible and reusable, facilitating further research into variants and improvements of SSA form. We have also implemented several analyses and optimisations to demonstrate the utility of Shimple.

View the thesis (Single Sided PDF) Download the thesis (Double Sided PDF) BibTex entry

Visualization Tools for Optimizing Compilers

Author: Jennifer Lhoták
Date: February 2006

Abstract

Optimizing compilers have traditionally had little support for visual tools which display the vast amount of information they generate and which could aid in the development of analyses, in teaching, and in providing extra information to general programmers. This thesis presents a set of visualization tools which integrate visualization support for Soot, an optimizing compiler framework, into Eclipse, a popular, extensible IDE.

In particular, this work explores making the research compiler framework more accessible to new users and general programmers. Tools for displaying data flow analysis results in intermediate representations (IRs) and in the original source code are discussed, with consideration for the issue of the mapping of information between the low-level IRs and the source using flexible and extensible mechanisms. Also described are tools for interactive control flow graphs which can be used for research and teaching and tools for displaying large graphs, such as call graphs, in a manageable way.

Additionally, the area of communicating information generated by optimizing compilers to general programmers is explored with a small case study to determine if analyses could be useful to general programmers and how the information is displayed.

This work is shown to be useful for research to find imprecision or errors in analyses, both from visualizing the intermediate results with the interactive control flow graphs and the final results at the IR and source code levels, and for students learning about compiler optimizations and writing their first data flow analyses.

View the thesis (.pdf)

Program Analysis using Binary Decision Diagrams

Author: Ondřej Lhoták
Date: January 2006

Abstract

A fundamental problem in interprocedural program analyses is the need to represent and manipulate collections of large sets. Binary Decision Diagrams (BDDs) are a data structure widely used in model checking to compactly encode large state sets. In this dissertation, we develop new techniques and frameworks for applying BDDs to program analysis, and use our BDD-based analyses to gain new insight into factors influencing analysis precision.

To make it feasible to express complicated, interrelated analyses using BDDs, we first present the design and implementation of Jedd, a Java language extension which adds relations implemented with BDDs as a datatype, and makes it possible to express BDD-based algorithms at a higher level than existing BDD libraries.

Using Jedd, we develop Paddle, a framework of context-sensitive points-to and call graph analyses for Java, as well as client analyses that make use of their results. Paddle supports several variations of context-sensitive analyses, including the use of call site strings and abstract receiver object strings as abstractions of context.

We use the Paddle framework to perform an in-depth empirical study of the effect of context-sensitivity variations on the precision of interprocedural program analyses. The use of BDDs enables us to compare context-sensitive analyses on much larger, more realistic benchmarks than has been possible with traditional analysis implementations.

Finally, based on the call graph computed by Paddle, we implement, using Jedd, a novel static analysis of the cflow construct in the aspect-oriented language AspectJ. Thanks to the Jedd high-level representation, the implementation of the analysis closely mirrors its specification.

View the thesis (.ps) View the thesis (.pdf) Download the thesis (.ps.gz) BibTeX entry

Measuring and improving the runtime behaviour of AspectJ programs

Author: Christopher Goard
Date: August 2005

Abstract

AspectJ is a popular aspect-oriented extension to Java, providing powerful new features for the modularizing of crosscutting concerns, promising improved code quality. The runtime cost of these features, however, is currently not well understood, and is a concern limiting even more wide-spread adoption of the language. The crosscutting nature of AspectJ complicates the measurement of these costs.

This thesis presents a methodology for analyzing the runtime behaviour of AspectJ programs, with a particular emphasis on identifying runtime overheads resulting from the implementation of AspectJ features. It presents a taxonomy of overhead kinds and defines new AspectJ-specific dynamic metrics. A toolset for measuring these metrics is described, supporting both current AspectJ compilers, ajc and abc, and results are presented for a newly collected set of AspectJ benchmarks.

Significant overheads are found in some cases, suggesting improvements to the code generation strategy of the AspectJ compilers. Initial implementations of some improvements are presented, resulting, for some benchmarks, in order-of-magnitude improvements to execution time. These improvements have since been integrated into abc and ajc.

Clearly understanding the runtime behaviour of AspectJ programs should result in both better implementations of the language and more confident adoption by the mainstream.

View the thesis (.pdf)

Runtime Techniques and Interprocedural Analysis in Java Virtual Machines

back
Author: Feng Qian
Date: May 2005

Abstract

Java programs are deployed in a bytecode format that is executed by a Java virtual machine (JVM). JVM performance is determined by several major components: the execution engine, the garbage collector, and the threading system. The static compilation and optimization approach taken in C/C++ compilers does not fit Java's execution model well, because Java allows dynamic class loading, lazy resolution, just-in-time (JIT) compilation, and garbage collection. These dynamic features present new challenges to JVM designers.

In this thesis, we study both the challenges and opportunities of dynamic optimizations in Java virtual machines. Our contributions include a new garbage collector using dynamic techniques and dynamic interprocedural program analyses for speculative optimizations in JIT compilers.

We present a novel approach for reducing garbage collection frequencies. Instead of relying on an ahead-of-time escape analysis or a region-based type system, our approach adapts regions based on the runtime history of an application. By freeing regions with associated stack frames, the system can reduce the frequency of garbage collections. We present the overall idea and provide details of a specific design and implementation.

Dynamic class loading is a two-edged sword. A JIT compiler can speculatively optimize methods based only on the classes loaded so far; however, newly loaded classes may invalidate previous optimization assumptions. We review existing techniques supporting speculative optimizations, including runtime guards, code patching, and on-stack replacement, and we present an improvement and implementation of an on-stack replacement mechanism.
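
A minimal, invented sketch of one such mechanism, a runtime guard protecting a speculative inlining decision (the classes and the fallback path are hypothetical, not taken from the thesis):

```java
// Hypothetical example: Shape is believed, from the classes loaded so far, to have
// a single implementation, Circle, so the JIT inlines Circle.area() and guards the
// assumption with a cheap class check.
abstract class Shape { abstract double area(); }

final class Circle extends Shape {
    final double r;
    Circle(double r) { this.r = r; }
    double area() { return Math.PI * r * r; }
}

final class SpeculativeCaller {
    static double totalArea(Shape[] shapes) {
        double sum = 0.0;
        for (Shape s : shapes) {
            if (s.getClass() == Circle.class) {
                // Fast path: the inlined body of Circle.area().
                Circle c = (Circle) s;
                sum += Math.PI * c.r * c.r;
            } else {
                // Guard failed: a class loaded later broke the assumption.
                // A real JIT would patch the code or perform on-stack replacement;
                // here we simply fall back to the virtual call.
                sum += s.area();
            }
        }
        return sum;
    }
}
```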

A call graph is necessary for developing interprocedural program analyses. Call graph construction in a Java virtual machine needs to overcome the difficulties of dynamic class loading and lazy reference resolution. We show a general approach to adapt static type analyses to dynamic versions suitable for building call graphs in a JIT environment. We also introduce a new call graph profiling mechanism using code stubs.

Having dynamic call graphs, we study reachability-based interprocedural analysis. We describe a general type analysis framework for supporting speculative method inlining in a JIT environment. Several popular type analyses were implemented in the framework, including an interprocedural one, variable type analysis (VTA). Using the framework, we present the results of a limit study of method inlining and report our findings and experience.

In each chapter we discuss the related work for that chapter's topic. At the end of the thesis, we point out future research directions.

View the thesis (.pdf)

Using Inter-Procedural Side-Effect Information in JIT Optimizations

back
Author: Anatole Le
Date: February 2005

Abstract

Side-effect analysis gives information about the set of locations that a statement may read or modify. A compiler can use this information to perform aggressive optimizations. The impact of side-effect analysis on compiler optimizations has been studied for programming languages such as Modula-3 and C, but no thorough investigation has been done for Java. We present a study of whether side-effect information improves performance in Java just-in-time (JIT) compilers, and if so, what level of analysis precision is needed. We also analyze the optimizations and benchmarks that benefit most from side-effect analysis.

We used Spark, the inter-procedural analysis component of the Soot Java analysis and optimization framework, to compute side-effect information and encode it in class files. We modified Jikes RVM, a research Java virtual machine, to make use of side-effect analysis in various local and global analyses and optimizations such as local common sub-expression elimination, heap SSA, redundant load elimination and loop-invariant code motion. On the SPECjvm98 benchmarks, we measured the static number of memory operations removed, the dynamic counts of memory reads eliminated, and the execution time.

Our results show that the use of side-effect analysis increases the number of static opportunities for load elimination by up to 98%, and reduces dynamic field read instructions by up to 27%. Side-effect information enabled speedups of up to 20% for some benchmarks. The main cause of the speedups is the use of side-effect information in load elimination. Finally, among the different levels of precision of side-effect information, a simple side-effect analysis is usually sufficient to obtain most of these speedups.
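
A small, invented example of the main source of these speedups: if the side-effect analysis proves that helper() cannot write the field value, the second read of it is redundant and can be eliminated.

```java
// Invented example. If the analysis shows that helper() neither writes the field
// 'value' nor calls anything that could, the JIT can keep the first load of
// c.value in a register and eliminate the second load.
final class Counter {
    int value;

    static void helper() {
        // Does not touch Counter.value in this example.
        System.out.println("logging only");
    }

    static int twice(Counter c) {
        int a = c.value;   // first load of the field
        helper();          // without side-effect info, 'value' must be reloaded after this call
        int b = c.value;   // with side-effect info: redundant, can reuse 'a'
        return a + b;
    }
}
```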

View the thesis (.pdf) Download the thesis (.ps.gz) BibTeX entry

Objective Quantification of Program Behaviour Using Dynamic Metrics

back
Author: Bruno Dufour
Date: October 2004

Abstract

In order to perform meaningful experiments in optimizing compilation and runtime system design, researchers usually rely on a suite of benchmark programs of interest to the optimization technique under consideration. Programs are described as numeric, memory-intensive, concurrent, or object-oriented, based on a qualitative appraisal, in some cases with little justification.

In order to make these intuitive notions of program behaviour more concrete and subject to experimental validation, this thesis introduces a methodology to objectively quantify key aspects of program behaviour using dynamic metrics. A set of unambiguous, robust, and architecture-independent dynamic metrics is defined, which can be used to categorize programs according to their dynamic behaviour in five areas: size, data structures, memory use, polymorphism and concurrency. Each metric is also empirically validated.

A general-purpose, easily extensible dynamic analysis framework has been designed and implemented to gather empirical metric results. This framework consists of three major components. The profiling agent collects execution data from a Java virtual machine. The trace analyzer performs computations on this data, and the web interface presents the result of the analysis in a convenient and user-friendly way.
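
As an invented illustration of the kind of computation the trace analyzer performs, the sketch below derives a simple polymorphism-style metric, the fraction of call sites that observe more than one receiver class, from a stream of call-site events (the event format is made up for the example, not taken from the thesis).

```java
import java.util.*;

// Hypothetical trace analysis: each event is (callSiteId, receiverClassName).
// The metric is the fraction of call sites that saw more than one receiver class,
// a rough stand-in for the polymorphism category of dynamic metrics.
final class PolymorphismMetric {
    private final Map<String, Set<String>> receiversPerSite = new HashMap<>();

    void record(String callSiteId, String receiverClass) {
        receiversPerSite.computeIfAbsent(callSiteId, k -> new HashSet<>())
                        .add(receiverClass);
    }

    double polymorphicCallSiteRatio() {
        if (receiversPerSite.isEmpty()) return 0.0;
        long polymorphic = receiversPerSite.values().stream()
                                           .filter(s -> s.size() > 1)
                                           .count();
        return (double) polymorphic / receiversPerSite.size();
    }
}
```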

The utility of the approach as well as selected specific metrics is validated by examining metric data for a number of commonly used benchmarks. Case studies of program transformations and the consequent effects on metric data are also considered. Results show that the information that can be obtained from the metrics not only corresponds well with the intuitive notions of program behaviour, but can also reveal interesting behaviour that would have otherwise required lengthy investigations using more traditional techniques.

View the thesis (.pdf) Download the thesis (.ps.gz) BibTeX entry

Dataflow Analysis of the Pi-Calculus

back
Author: Sam Sanjabi
Date: September 2004

Abstract

Static analysis is a technique used to compute information about the runtime behaviour of a program prior to execution. Traditionally, it has been used in the context of optimizing compilers, but it has recently been applied to more formalized languages in order to develop provable policies that can be used to verify the security of networks. The best results are naturally achieved with the most precise information flow techniques, though complex systems impose feasibility constraints. The accuracy of results, particularly relative to the cost of computing them, is thus an important quality.

This thesis presents a series of dataflow analyses of the pi-calculus, an extensively studied concurrent language that has been used to model and verify security protocols. Some of the presented analyses are equivalent to previous work done in the field, but the framework in which the analysis is done is new in that it immediately suggests an iterative implementation.

Analyses are also presented that improve on existing approaches in two ways. First, they fully treat the sequentiality of potential actions in a protocol, improving the accuracy of previous approaches. Second, by considering the potential environment in which a process could be running, the computed results remain correct independent of any context with which the analyzed process may be composed in parallel.

View the thesis (.pdf) Download the thesis (.ps.gz)

SableJIT: A retargetable just-in-time compiler

back
Author: David Belanger
Date: August 2004

Abstract

In this thesis, we introduce SableJIT, a retargetable just-in-time compiler for the SableVM Java virtual machine. Our design attempts to minimize the amount of work required to port (or to retarget) our compiler to a new platform. We accomplish this in three ways. First, we introduce a retargetable backend where the amount of work required to port it is reduced to the implementation of simple well-defined primitive functions. Second, we keep all code related to the internals of the virtual machine in the frontend, making knowledge of the target architecture sufficient for a port. Finally, we provide a good development environment that favours incremental development and testing, in part through a robust runtime system that can recover from compilation failures.
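
A hypothetical sketch (the names are invented, not SableJIT's actual API) of what a backend reduced to simple, well-defined primitive functions might look like; porting to a new architecture would then amount to implementing this interface, while the VM-specific frontend stays unchanged.

```java
// Hypothetical port interface for a retargetable JIT backend.
interface BackendPrimitives {
    void emitPrologue(int frameSizeInWords);
    void emitLoadLocal(int reg, int localIndex);   // load a Java local into a machine register
    void emitStoreLocal(int localIndex, int reg);
    void emitAddInt(int destReg, int lhsReg, int rhsReg);
    void emitCall(long targetAddress);
    void emitReturn(int valueReg);
}
```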

We demonstrate the portability of SableJIT by supporting three processor architectures and various operating systems. In particular, we describe the experience acquired in porting our compiler to the Solaris/SPARC platform.

View the thesis (.pdf) Download the thesis (.ps.gz) BibTeX entry

A practical MHP information computation for concurrent Java programs

back
Author: Lin Li
Date: August 2004

Abstract

With the development of multi-processors, multi-threaded programs and programming languages have become more and more popular. This requires extending the scope of program analysis and compiler optimization from traditional, sequential programs to concurrent programs.

Naumovich et al. proposed May Happen in Parallel (MHP) analysis, which determines which program statements may be executed concurrently. From this information, compiler optimizations can be improved, and potential program problems such as data races can be discovered.

Unfortunately, MHP analysis has some limitations with respect to practical use. In this thesis we present an implementation of MHP analysis for Java that attempts to address some of the practical implementation concerns of the original work. We describe a design that incorporates techniques for making the implementation feasible and for expanding the range of acceptable inputs.

The MHP analysis requires a particular internal representation in order to run. By using a combination of techniques, we are able to compact that representation, and thus significantly improve MHP execution time without affecting accuracy. We also provide experimental results showing the utility and impact of our approach and optimizations using a variety of concurrent benchmarks. The results show that our optimizations are effective, and allow more and larger benchmarks to be analyzed. For some benchmarks, our optimizations have very impressive results, speeding up MHP analysis by several orders of magnitude.

View the thesis (.pdf) Download the thesis (.ps.gz) BibTeX entry

STEP: A Framework for the Efficient Encoding of General Trace Data

back
Author: Rhodes H. F. Brown
Date: May 2003

Abstract

Program tracing is a common technique employed by software and hardware developers who are interested in characterizing the dynamic behavior of complex software systems. However, despite the popularity of trace-driven analyses, there are surprisingly few options for encoding trace data in a standard format.

In the past, many developers have resorted to creating their own ad-hoc trace encoding solutions, tailored specifically to the data they are considering. Such efforts are usually redundant, and in many cases lead to an obscure and poorly documented trace format which ultimately limits the reuse and sharing of potentially valuable information.

The STEP system was created to address this problem by providing a standard method for encoding general program trace data in a flexible and compact format. The system consists of a trace data definition language along with a compiler for the language and an encoding architecture that implements a number of common trace compaction techniques. The system simplifies the development and interoperability of trace clients by encapsulating the encoding process and presenting the data as an abstract object stream.

This thesis presents a detailed description of the STEP system and evaluates its utility by applying it to a variety of trace data from Java programs. Initial results indicate that compressed STEP encodings are often substantially more compact than similarly compressed naive formats.

View the thesis (.pdf) Download the thesis (.ps.gz) BibTeX entry

New algorithms for a Java decompiler and their implementation in Soot

back
Author: Jerome Miecznikowski
Date: April 2003

Abstract

This thesis presents Dava, a Java bytecode to Java source code decompiler built on top of the Soot framework.

The Java Virtual Machine Specification of valid bytecode is much less restrictive than the Java Language Specification of valid Java source programs. For example, bytecode has unstructured control flow, loose local typing, and few restrictions on method modifiers. By contrast, the Java language has highly structured control flow, strong local typing, and many restrictions on its method modifiers. The goal of this thesis was to build a tool that could correctly decompile the widest range of verifiable Java bytecode back into compilable Java source. This includes bytecode coming from other source languages, bytecode that has undergone certain types of obfuscation, and optimized bytecode. To accomplish this goal we created the Structure Encapsulation Tree data structure and a set of new decompiling algorithms.

The algorithms fall into three categories: regular control flow, exceptional control flow, and idioms. Regular control flow includes finding constructs such as loops, if-else statements, and labeled blocks. Exceptional control flow refers strictly to try-catch statements. Idioms is a collection of miscellaneous restructuring enhancements and techniques for ensuring that we produce syntactically correct Java. The Structure Encapsulation Tree allows us to address various decompiling problems one at a time, in order of their difficulty. For example, we find all loops before we find any if-else statements. The result is a very robust algorithm that will restructure any control flow graph without resorting to tricks such as encoding the control flow graph as a finite state machine.
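
A small, invented example of the kind of restructuring this enables: goto-style bytecode control flow (sketched in the comments) expressed as compilable Java with a labeled block, rather than by encoding the control flow graph as a state machine.

```java
// Invented illustration. Bytecode such as
//     if (x < 0) goto L2;  ...work...  goto L3;  L2: ...error...  L3: ...
// has no direct 'goto' in Java source, but the same flow can be restructured
// using a labeled block and a break.
final class Restructured {
    static int process(int x) {
        int result = 0;
        done: {
            if (x < 0) {
                result = -1;      // the former L2 target
                break done;       // jumps to the end of the labeled block (former L3)
            }
            result = x * 2;       // the fall-through work
        }
        return result;            // the former L3 join point
    }
}
```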

We test our implementation of the decompiler on a wide range of inputs and compare the results to the output of four leading Java decompilers. Our results show that Dava always produces correct source code and that it far outperforms the competition on all of our more advanced tests.

View the thesis (.pdf) Download the thesis (.ps.gz)

Spark: A flexible points-to analysis framework for Java

back
Author: Ondřej Lhoták
Date: February 2003

Abstract

Many compiler analyses and optimizations require precise information about the behaviour of pointers in order to be effective. Points-to analysis is a technique for computing this information that has been studied extensively over the last decade. Most of this research has focused on points-to analyses for C. The behaviour of points-to analysis on higher-level languages such as Java appears very different from its behaviour on C. Moreover, most proposed points-to analysis techniques have been evaluated in disparate analysis systems and on different benchmarks, making it difficult to compare their effectiveness.

To address these issues, this thesis introduces Spark, a flexible framework for experimenting with points-to analyses for Java. Spark is intended to be a universal framework within which different points-to analyses can be easily implemented and compared in a common context. Currently, Spark supports equality- and subset-based analyses, variations in field sensitivity, respect for declared types, variations in call graph construction, off-line simplification, and several points-to set propagation algorithms.
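
A tiny, invented example of the question such analyses answer and of where the supported variations differ:

```java
// Invented example. A points-to analysis computes, for each variable, the set of
// allocation sites it may refer to. A subset-based (Andersen-style) analysis keeps
// pt(a) = {A1} and pt(b) = {A2} while computing pt(c) = {A1, A2}; an equality-based
// (Steensgaard-style) analysis merges variables related by assignment and reports
// {A1, A2} for a, b and c alike.
final class PointsToExample {
    static void client() {
        Object a = new Object();   // allocation site A1
        Object b = new Object();   // allocation site A2
        Object c = a;              // assignment: pt(c) includes pt(a)
        c = b;                     // pt(c) also includes pt(b)
        use(c);
    }

    static void use(Object o) { /* ... */ }
}
```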

A substantial study of factors affecting precision and efficiency of points-to analyses has been performed as a demonstration of Spark in action. The results show that Spark is not only flexible and modular, but also very efficient compared to other points-to analysis implementations.

Two client analyses that use the points-to information are described, call graph construction and side-effect analysis. The side-effect information can be encoded in Java class file attributes, so that it can later be used for optimization by other compilers and virtual machines.

Spark has been demonstrated to be a flexible and efficient framework for Java points-to analysis. Several experiments that could be performed with it are suggested.

View the thesis (.ps) Download the thesis (.ps.gz) BibTeX entry

A Portable Research Framework for the Execution of Java Bytecode

back
Author: Etienne Gagnon
Date: December 2002

Abstract

Compilation to bytecode paired with interpretation is often used as a technique to easily build prototypes for new programming languages. Some languages, including Java, push this further and use the bytecode layer to isolate programs from the underlying platform. Current state-of-the-art commercial and research Java virtual machines implement advanced just-in-time and adaptive compilation techniques to deliver high-performance execution of Java bytecode. Yet, experimenting with new features such as adding new bytecodes or redesigning the type system can be a daunting task within these complex systems, when new features invalidate assumptions on which the internal dynamic optimizing compiler depends. On the other hand, simpler existing Java bytecode interpreters, written purely in high-level languages, deliver poor performance. The main motivation behind this thesis was to answer the question: How fast can a portable, easily modifiable Java bytecode interpreter be? In order to address this question, we have designed and developed the SableVM research framework, a portable interpreter-based Java virtual machine written in portable C.

In this thesis we introduce innovative techniques for implementing an efficient, yet portable Java bytecode interpreter. These techniques address three areas: instruction dispatch, memory management, and synchronization. Specifically, we show how to implement an inline-threaded engine in the presence of lazy code preparation, without incurring a high synchronization penalty. We then introduce a logical partitioning of runtime system memory that simplifies memory management, and a related sparse interface virtual table design for fast interface-method invocation. We show how to efficiently compute space-efficient garbage collection maps for verifiable bytecode. We also present a bidirectional object layout that simplifies garbage collection. Finally, we introduce an improvement to thin locks, eliminating busy-wait in case of contention.
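
For the instruction-dispatch area, a minimal switch-dispatched loop for a toy bytecode is sketched below for contrast; direct- and inline-threaded dispatch remove the central switch by jumping directly between (and, for inline-threading, copying together) instruction bodies, which relies on native-code techniques that plain Java cannot express. The toy instruction set is invented for the example.

```java
// Minimal switch-dispatched interpreter loop for a toy three-instruction bytecode.
// Every iteration pays for the central switch; threaded dispatch techniques
// eliminate that branch, which is one of the overheads an inline-threaded
// engine targets.
final class SwitchInterpreter {
    static final int PUSH = 0, ADD = 1, HALT = 2;

    static int run(int[] code) {
        int[] stack = new int[64];
        int sp = 0, pc = 0;
        while (true) {
            switch (code[pc++]) {
                case PUSH: stack[sp++] = code[pc++]; break;
                case ADD:  stack[sp - 2] += stack[sp - 1]; sp--; break;
                case HALT: return stack[sp - 1];
                default:   throw new IllegalStateException("bad opcode");
            }
        }
    }
}
```

For example, run(new int[]{PUSH, 2, PUSH, 3, ADD, HALT}) evaluates to 5.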

Our experiments within the SableVM framework show that inline-threading delivers a significant performance improvement over switch- and direct-threading for Java, that sparse interface tables cause no memory loss, and that our map computation algorithm produces a very small number of distinct garbage collection maps. Our overall performance measurements show that, using our techniques, a portable interpreter can deliver competitive interpretation performance, and even surpass that of a less-portable state-of-the-art interpreter on some benchmarks.

View the thesis (.pdf) Download the thesis (.ps.gz)

Combining Static and Dynamic Data in Code Visualization

back
Author: David Eng
Date: August 2002

Abstract

The task of developing, tuning, and debugging compiler optimizations is a difficult one which can be facilitated by software visualization. There are many characteristics of the code which must be considered when studying the kinds of optimizations which can be performed. These characteristics can include large amounts of data which would be difficult to inspect without the appropriate tools. Both static data collected at compile-time and dynamic runtime data can reveal opportunities for optimization and affect code transformations. In order to expose the behavior of such complex systems, visualizations should include as much information as possible and accommodate the different sources from which this information is acquired.

This thesis presents a visualization framework designed to address these issues. The framework is based on a new, extensible language called JIL which provides a common format for encapsulating intermediate representations and associating them with compile-time and runtime data. We present new contributions which extend existing compiler and profiling frameworks, allowing them to export the intermediate languages, analysis results, and code metadata they collect as JIL documents. Visualization interfaces can then combine the JIL data from separate tools, exposing both static and dynamic characteristics of the underlying code. We present such an interface in the form of a new web-based visualizer, allowing JIL documents to be visualized online in a portable, customizable interface.

View the thesis (.pdf) Download the thesis (.ps.gz)

A Comprehensive Approach to Array Bounds Check Elimination for Java

back
Author: Feng Qian
Date: April 2001

Abstract

The Java programming language requires array reference range checks at run time to guarantee a program's safe execution. If the array index exceeds the range, the run-time environment must throw an IndexOutOfBoundsException at the precise program point where the array reference occurs. Compilers generate conditional branch instructions to implement array bounds checks. A branch instruction incurs significant performance penalties on modern pipelined architectures, and it also makes many other optimizations difficult. For array-intensive applications, array bounds checks may cause a heavy run-time overhead, so it is beneficial to eliminate all checks that a static analysis can prove unneeded. Array bounds checks are also required by other languages such as Ada and Fortran, and bounds check elimination algorithms have been developed for these languages. However, these algorithms are not directly applicable to Java because of the precise-exception requirement of the language.

We present a new approach to eliminating array bounds checks in Java using static analyses. Our approach is based upon a flow-sensitive intraprocedural analysis called variable constraint analysis (VCA). VCA collects constraints between locals related to array references. The array bounds check problem is formulated as solving a system of difference constraints: the analysis builds a small constraint graph for each important point in a method and then computes shortest-path weights over the graph. The shortest-path weights from the upper bound to the array index, and from the index to the lower bound, indicate whether a check is safe to remove. Using VCA as the base analysis, we also show how two further analyses can improve its results. Array field analysis, applied to each class, provides information about arrays stored in fields, while rectangular array analysis is an interprocedural analysis that approximates the shape of arrays and is useful for finding rectangular (non-ragged) arrays.
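
A small sketch of the underlying check, with invented names: facts such as i >= 0 and i < a.length become weighted edges of a constraint graph, and shortest-path weights over that graph decide whether the bounds check on a[i] is redundant. The encoding and the simple Bellman-Ford solver below are illustrative, not the thesis's implementation.

```java
import java.util.*;

// An edge (u, v, w) records the fact  v - u <= w.  Composing facts along a path adds
// the weights, so the shortest-path weight from u to v is the tightest bound on v - u
// implied by the collected constraints.
final class BoundsCheckSketch {
    record Edge(String from, String to, int weight) {}

    // Bellman-Ford-style shortest-path weight from 'source' to 'target'.
    static long shortest(List<Edge> edges, Set<String> nodes, String source, String target) {
        Map<String, Long> dist = new HashMap<>();
        for (String n : nodes) dist.put(n, Long.MAX_VALUE / 2);   // "no bound known"
        dist.put(source, 0L);
        for (int i = 0; i < nodes.size(); i++) {
            for (Edge e : edges) {
                long d = dist.get(e.from()) + e.weight();
                if (d < dist.get(e.to())) dist.put(e.to(), d);
            }
        }
        return dist.get(target);
    }

    public static void main(String[] args) {
        // Facts known inside  for (int i = 0; i < a.length; i++) { ... a[i] ... }
        //   i >= 0        ->  zero - i   <= 0   -> edge i -> zero,   weight 0
        //   i < a.length  ->  i - length <= -1  -> edge length -> i, weight -1
        Set<String> nodes = Set.of("zero", "i", "length");
        List<Edge> edges = List.of(new Edge("i", "zero", 0), new Edge("length", "i", -1));

        boolean upperSafe = shortest(edges, nodes, "length", "i") <= -1; // proves i <= length - 1
        boolean lowerSafe = shortest(edges, nodes, "i", "zero") <= 0;    // proves 0 <= i
        System.out.println("bounds check on a[i] redundant: " + (upperSafe && lowerSafe));
    }
}
```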

We have implemented all three analyses using the Soot bytecode optimization/annotation framework, and we transmit the results of the analysis to virtual machines using class file attributes. We have modified the Kaffe JIT and IBM's High Performance Compiler for Java (HPCJ) to make use of these attributes (the HPCJ experiment was conducted by Clark Verbrugge), and we demonstrate significant speed-ups.

View the thesis (.pdf) Download the thesis (.ps.gz)

Soot: A Java Bytecode Optimization Framework

back
Author: Raja Vallée-Rai
Date: July 2000

Abstract

Java provides many attractive features such as platform independence, execution safety, garbage collection and object orientation. These features facilitate application development but are expensive to support; applications written in Java are often much slower than their counterparts written in C or C++. To use these features without having to pay a great performance penalty, sophisticated optimizations and runtime systems are required.

We present Soot, a framework for optimizing Java bytecode. The framework is implemented in Java and supports three intermediate representations of Java bytecode: Baf, a streamlined representation of bytecode which is simple to manipulate; Jimple, a typed 3-address intermediate representation suitable for optimization; and Grimp, an aggregated version of Jimple suitable for decompilation. Soot also contains a set of transformations between these intermediate representations, and an application programming interface (API) is provided for writing optimizations and analyses on Java bytecode in these forms.
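
As an invented illustration (not exact Jimple syntax) of what a typed 3-address representation provides, the method below is shown with a commented sketch of its lowering into one operation per statement:

```java
// Invented illustration of a 3-address lowering; the comments mimic the shape,
// not the exact syntax, of a Jimple-like IR.
final class ThreeAddressExample {
    static int example(int[] data, int i) {
        return data[i] * 2 + 1;
        // A typed 3-address form breaks this into single operations on typed temporaries:
        //   int t0;  int t1;  int t2;
        //   t0 = data[i];
        //   t1 = t0 * 2;
        //   t2 = t1 + 1;
        //   return t2;
    }
}
```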

In order to demonstrate the usefulness of the framework, we have implemented intraprocedural and whole-program optimizations. To show that whole-program bytecode optimization can give performance improvements, we provide experimental results for 10 large benchmarks, including 8 SPECjvm98 benchmarks running on JDK 1.2. These results show speedups of up to 38%.

View the thesis (.ps) Download the thesis (.ps.gz)

A study of side-effect analyses for Java

back
Author: Chrislain Razafimahefa
Date: December 1999

Abstract

In the context of an object-oriented programming language such as Java, the ubiquitous use of instance fields and the presence of pointers (references to objects) make it difficult to understand a program's memory usage. Understanding memory usage is a key element of optimizing programs, and side-effect analysis, which estimates the variables that are inspected or altered by a computation, is of prime importance for understanding and optimizing Java programs.

In this thesis, we study the impact of a set of practical side-effect analyses on optimizing Java programs. The main contribution of the thesis is the design and implementation of two groups of analyses, namely type-based analysis and refers-to analysis. These analyses use different strategies to capture the effect of an instruction on variables present in the program: the former uses type information, whereas the latter models the program's storage shape graph. For both type-based analysis and refers-to analysis, two variations are presented, one which treats the internal structure of objects, such as fields, as a key element of the analysis, and another which considers objects as aggregates and ignores their internal structure.

In order to assess the effectiveness of each analysis, we have implemented loop-invariant removal, an optimization which moves invariant computation out of loops. We used the Soot framework to implement the analyses and the optimization. We present both static and dynamic results based on experiments performed on a set of Java benchmarks of various sizes. Our results suggest that side-effect analysis is beneficial. Furthermore, analyses that exploit the internal structure of objects perform significantly better than their counterparts which do not.
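
A small, invented example of the optimization and of how side-effect information enables it: if the analysis shows that process() never writes cfg.scale, the field load is loop-invariant and can be hoisted out of the loop.

```java
// Invented example. Without side-effect information, the call to process() forces
// cfg.scale to be reloaded on every iteration; if the analysis proves process()
// cannot modify it, the load can be moved out of the loop.
final class Config {
    int scale;
}

final class Hoisting {
    static long sum(int[] data, Config cfg) {
        long total = 0;
        // After loop-invariant removal:  int s = cfg.scale;  hoisted to this point.
        for (int i = 0; i < data.length; i++) {
            total += data[i] * cfg.scale;  // loop-invariant load, given the analysis result
            process(data[i]);              // proven not to write Config.scale
        }
        return total;
    }

    static void process(int value) {
        // Does not touch any Config object.
        System.out.println(value);
    }
}
```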

View the thesis (.ps) Download the thesis (.ps.gz)

Practical Techniques for Virtual Call Resolution in Java

back
Author: Vijay Sundaresan
Date: June 1999

Abstract

Virtual method calls are a fundamental feature offered by Java, an object-oriented programming language. However, they are also a source of run-time performance degradation and of imprecision in interprocedural analyses. Several well-known, inexpensive analyses have been developed for virtual call resolution; however, they have been observed to be effective at resolving method calls in library code while being much less effective in the benchmark code excluding libraries.

In this thesis we present a new flow-insensitive and context-insensitive analysis called reaching type analysis. We present the analysis rules for two variations of this analysis: variable type analysis and a coarser-grained version, declared type analysis. Reaching type analysis builds a type propagation graph in which nodes represent variables and edges represent the flow of types due to assignments.
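
An invented example of the kind of call these analyses try to resolve: class hierarchy analysis sees two possible targets for p.speak(), but propagating the types that actually reach p resolves the call to a single target.

```java
// Invented example. The declared type of 'p' admits both Dog.speak() and Cat.speak(),
// but only Dog objects flow into 'p' through assignments, so a reaching-type analysis
// such as variable type analysis can resolve the virtual call to Dog.speak() and
// enable inlining. Rapid type analysis alone would keep both targets, because Cat
// is instantiated elsewhere in the program.
abstract class Pet { abstract String speak(); }
final class Dog extends Pet { String speak() { return "woof"; } }
final class Cat extends Pet { String speak() { return "meow"; } }

final class Devirtualize {
    static String greet() {
        Pet p = new Dog();     // the only allocation that reaches 'p'
        return p.speak();      // resolvable to Dog.speak() by type propagation
    }

    static Cat unrelated() {
        return new Cat();      // Cat is live in the program, but never reaches 'p'
    }
}
```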

We have implemented variable type analysis and declared type analysis, and two existing analyses, class hierarchy analysis and rapid type analysis, in the Soot framework and compared their relative success at building accurate call graphs for complete applications. We present static empirical results for call graph improvement for the whole application as well as for the benchmark code alone. We have also made dynamic measurements focusing on the benchmark code excluding libraries.

Method inlining is a compiler optimization in which the method call is replaced by the body of the method being called. Method inlining is very effective in improving performance of benchmarks that have many small methods and in which a large proportion of instructions executed are virtual calls. We have implemented method inlining (automatic and profile guided) at the Java bytecode level using the Soot framework. We demonstrate the effectiveness of our analyses and method inlining on a set of 15 benchmarks whose bytecodes were generated from Java, ML, Ada, Eiffel and Pizza compilers.

View the thesis (.ps) Download the thesis (.ps.gz)

SableCC, an Object Oriented Compiler Framework

back
Author: Etienne Gagnon
Date: March 1998

Abstract

In this thesis, we introduce SableCC, an object-oriented framework that generates compilers (and interpreters) in the Java programming language. This framework is based on two fundamental design decisions. Firstly, the framework uses object-oriented techniques to automatically build a strictly-typed abstract syntax tree that matches the grammar of the compiled language and simplifies debugging. Secondly, the framework generates tree-walker classes using an extended version of the visitor design pattern which enables the implementation of actions on the nodes of the abstract syntax tree using inheritance. These two design decisions lead to a tool that supports a shorter development cycle for constructing compilers.
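
A condensed, invented illustration of the second design decision, actions on a strictly-typed abstract syntax tree implemented by extending a generated visitor (the class and method names are made up; SableCC's generated names differ):

```java
// Invented, condensed sketch of the generated-visitor idea.
interface Node { void apply(Visitor v); }

final class ANumberExpr implements Node {
    final int value;
    ANumberExpr(int value) { this.value = value; }
    public void apply(Visitor v) { v.caseANumberExpr(this); }
}

final class APlusExpr implements Node {
    final Node left, right;
    APlusExpr(Node left, Node right) { this.left = left; this.right = right; }
    public void apply(Visitor v) { v.caseAPlusExpr(this); }
}

// Generated base visitor: one case method per strictly-typed node class,
// with a default depth-first traversal.
class Visitor {
    void caseANumberExpr(ANumberExpr node) {}
    void caseAPlusExpr(APlusExpr node) {
        node.left.apply(this);
        node.right.apply(this);
    }
}

// User action implemented by inheritance: evaluate the tree.
final class Evaluator extends Visitor {
    int result;
    @Override void caseANumberExpr(ANumberExpr node) { result = node.value; }
    @Override void caseAPlusExpr(APlusExpr node) {
        node.left.apply(this);  int l = result;
        node.right.apply(this); int r = result;
        result = l + r;
    }
}
```

Running an Evaluator over new APlusExpr(new ANumberExpr(1), new ANumberExpr(2)) leaves 3 in result.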

To demonstrate the simplicity of the framework, we discuss the implementation of a state-of-the-art almost linear time points-to analysis. We also provide a brief description of other systems that have been implemented using the SableCC tool.

We conclude that the use of object-oriented techniques significantly reduces the amount of programmer-written code, can shorten the development time, and makes the code easier to read and maintain.

View the thesis (.ps) Download the thesis (.ps.gz)