McLab Publications


Lectures and Tutorials

  1. VeloCty Lecture At CASCON
    Description

    We presented an overview of VeloCty at the 13th Compiler Driven Performance workshop. The CDP workshop was held during the IBM Center for Advanced Studies Conference (CASCON). The workshop featured talks on topics such as innovative analysis, languages, compilers, and optimisation techniques for parallel environments. More information can be found here.

    Authors
    Sameer Jagdale   
    Date
    November 2014
    Presented
    CASCON 2014, Markham, Canada
  2. McNumJS Lecture At CASCON
    Description

    We presented the modules and features of the McNumJS library and demonstrated the performance of the library compared to the Ostrich benchmark suite at the CASCON Compiler Driven Performance workshop.

    Authors
    Sujay Kathrotia   
    Date
    November 2014
    Presented
    CASCON 2014, Markham, Canada
  3. MiX10 Lecture At CASCON
    Description

    We had a chance to talk about MiX10 at the 12th Compiler Driven Performance workshop at CASCON 2013. Get the slides here.

    Authors
    Vineet Kumar   
    Date
    November 2013
    Presented
    CASCON 2013, Markham, Canada
  4. Leverhulme Lecture Series
    Description

    During the academic year 2010-2011, Professor Hendren was on sabbatical leave at the University of Oxford, during which time she held a Leverhulme Visiting Professor position. As part of the position she presented a series of three Leverhulme Lectures.

    Authors
    Laurie Hendren   
    Date
    June - July 2011
    Presented
    University of Oxford, Oxford, England
  5. Introduction to McLab, a compiler and VM framework for MATLAB
    Description

    The purpose of this tutorial is to introduce McLab, a new publicly-available toolkit for analyzing and executing MATLAB programs. The tutorial starts by introducing the MATLAB language and the particular challenges the language presents for compiler developers. It then presents the McLab front-end, which supports standard MATLAB, prebuilt extensions such as AspectMatlab, and the creation of new language extensions. The middle part of the tutorial focuses on the IRs and analysis framework along with some example analyses. The final part of the tutorial presents the back-ends, with a particular emphasis on the McVM JIT.

    Authors
    Laurie Hendren    Rahul Garg    Nurudeen Lameed   
    Date
    June 2011
    Presented
    PLDI, San Jose, California

Papers

  1. Efficiently implementing the copy semantics of MATLAB's arrays in JavaScript
    Abstract

    Compiling MATLAB — a dynamic, array-based language — to JavaScript is an attractive proposal: the output code can be deployed on a platform used by billions and can leverage the countless hours that have gone into making JavaScript JIT engines fast. But before that can happen, the original MATLAB code must be properly translated, making sure to bridge the semantic gaps of the two languages.

    An important area where MATLAB and JavaScript differ is in their handling of arrays: for example, in MATLAB, arrays are one-indexed and writing at an index beyond the end of an array extends it; in JavaScript, typed arrays are zero-indexed and writing out of bounds is a no-op. A MATLAB-to-JavaScript compiler must address these mismatches. Another salient and pervasive difference between the two languages is the assignment of arrays to variables: in MATLAB, this operation has value semantics, while in JavaScript it has reference semantics.

    In this paper, we present MatJuice — a source-to-source, ahead-of-time compiler back-end for MATLAB — and how it deals efficiently with this last issue. We present an intra-procedural data-flow analysis to track where each array variable may point to and which variables are possibly aliased. We also present the associated copy insertion transformation that uses the points-to information to insert explicit copies when necessary. The resulting JavaScript program respects the MATLAB value semantics and we show that it performs fewer run-time copies than some alternative approaches.

    Authors
    Vincent Foley-Bourgon    Laurie Hendren   
    Date
    1 November 2016
    Published
    DLS 2016, Amsterdam, Netherlands
  2. Automatic Vectorization for MATLAB
    Abstract

    Dynamic array-based languages such as MATLAB provide a wide range of built-in operations which can be efficiently applied to all elements of an array. Historically, MATLAB and Octave programmers have been advised to manually transform loops to equivalent “vectorized” computations in order to maximize performance. In this paper we present the techniques and tools to perform automatic vectorization, including handling for loops with calls to user-defined functions. We evaluate the technique on 9 benchmarks using two interpreters and two JIT-based platforms and show that automatic vectorization is extremely effective for the interpreters on most benchmarks, and moderately effective on some benchmarks in the JIT context.

    Authors
    Hanfeng Chen    Alexander Krolik    Erick Lavoie    Laurie Hendren   
    Date
    28-30 September 2016
    Published
    LCPC 2016, Rochester, NY, USA
  3. Velociraptor: a compiler toolkit for array-based languages targeting CPUs and GPUs
    Abstract

    We present a toolkit called Velociraptor that can be used by compiler writers to quickly build compilers and other tools for array-based languages. Velociraptor operates on its own unique intermediate representation (IR) designed to support a variety of array-based languages. The toolkit also provides some novel analysis and transformations such as region detection and specialization, as well as a dynamic backend with CPU and GPU code generation. We discuss the components of the toolkit and also present case-studies illustrating the use of the toolkit.

    Authors
    Rahul Garg    Laurie Hendren   
    Date
    June 15 - 17, 2015
    Published
    ARRAY@PLDI 2015, Portland, OR, USA
  4. AspectMatlab++: Annotations, Types, and Aspects for Scientists
    Abstract

    In this paper we present extensions to an aspect oriented compiler developed for MATLAB. These extensions are intended to support important functionality for scientists, and include pattern matching on annotations, and types of variables, as well as new manners of exposing context. We provide use-cases of these features in the form of several general-use aspects which focus on solving issues that arise from use of dynamically-typed languages. We also detail performance enhancements to the AspectMatlab compiler which result in an order of magnitude in performance gains.

    Authors
    Andrew Bodzay    Laurie Hendren   
    Date
    March 2015
    Published
    Modularity 2015, Fort Collins, CO, USA
  5. Velociraptor: an embedded compiler toolkit for numerical programs targeting CPUs and GPUs
    Abstract

    Developing just-in-time (JIT) compilers that allow scientific programmers to efficiently target both CPUs and GPUs is of increasing interest. However, building such compilers requires considerable effort. We present a reusable and embeddable compiler toolkit called Velociraptor that can be used to easily build compilers for numerical programs targeting multicores and GPUs.

    Velociraptor provides a new high-level IR called VRIR which has been specifically designed for numeric computations, with rich support for arrays, plus support for high-level parallel and GPU constructs. A compiler developer uses Velociraptor by generating VRIR for key parts of an input program. Velociraptor provides an optimizing compiler toolkit for generating CPU and GPU code and also provides a smart runtime system to manage the GPU.

    To demonstrate Velociraptor in action, we present two proof-of-concept case studies: a GPU extension for a JIT implementation of the MATLAB language, and a JIT compiler for Python targeting CPUs and GPUs.

    Authors
    Rahul Garg    Laurie Hendren   
    Date
    August 24-27, 2014
    Published
    PACT 2014, Edmonton, AB, Canada
  6. Mc2For: A tool for automatically translating MATLAB to FORTRAN 95
    Abstract

    MATLAB is a dynamic numerical scripting language widely used by scientists, engineers and students. While MATLAB’s high-level syntax and dynamic types make it ideal for prototyping, programmers often prefer using high-performance static languages such as FORTRAN for their final distributable code. Rather than rewriting the code by hand, our solution is to provide a tool that automatically translates the original MATLAB program to an equivalent FORTRAN program. There are several important challenges for automatically translating MATLAB to FORTRAN, such as correctly estimating the static type characteristics of all the variables in a MATLAB program, mapping MATLAB built-in functions, and effectively mapping MATLAB constructs to equivalent FORTRAN constructs.

    In this paper, we introduce Mc2FOR, a tool which automatically translates MATLAB to FORTRAN. This tool consists of two major parts. The first part is an interprocedural analysis component to estimate the static type characteristics, such as the shape of arrays and the range of scalars, which are used to generate variable declarations and to remove unnecessary array bounds checking in the translated FORTRAN program. The second part is an extensible FORTRAN code generation framework automatically transforming MATLAB constructs to FORTRAN. This work has been implemented within the McLab framework, and we demonstrate the performance of the translated FORTRAN code on a collection of MATLAB benchmarks.

    Authors
    Xu Li    Laurie Hendren   
    Date
    February 3rd, 2014
    Published
    WCRE 2014, Antwerp, Belgium
  7. Optimizing MATLAB Feval with Dynamic Techniques
    Abstract

    MATLAB is a popular dynamic array-based language used by engineers, scientists and students worldwide. The built-in function feval is an important MATLAB feature for certain classes of numerical programs and solvers which benefit from having functions as parameters. Programmers may pass a function name or function handle to the solver and then the solver uses feval to indirectly call the function. In this paper, we show that there are significant performance overheads for function calls via feval, in both MATLAB interpreters and JITs. The paper then proposes, implements and compares two on-the-fly mechanisms for specialization of feval calls. The first approach uses on-stack replacement technology, as supported by McVM/McOSR. The second approach specializes calls of functions with feval using a combination of runtime input argument types and values. Experimental results on seven numerical solvers show that the techniques provide good performance improvements.

    Authors
    Nurudeen Lameed    Laurie Hendren   
    Date
    October 2013
    Published
    DLS 2013, Indianapolis, USA
  8. Refactoring MATLAB
    Abstract

    This paper presents the important challenges of refactoring MATLAB along with automated techniques to handle a collection of refactorings for MATLAB functions and scripts including: converting scripts to functions, extracting functions, and converting dynamic function calls to static ones. The refactorings have been implemented using the McLab compiler framework, and an evaluation is given on a large set of MATLAB benchmarks which demonstrates the effectiveness of our approach.

    Authors
    Soroush Radpour    Laurie Hendren   
    Date
    March 2013
    Published
    CC 2013, Rome, Italy
  9. A Modular Approach to On-Stack Replacement in LLVM
    Abstract

    This paper presents a modular approach to implementing OSR for the LLVM compiler infrastructure. This is an important step forward because LLVM is gaining popular support, and adding the OSR capability allows compiler developers to develop new dynamic techniques. In particular, it will enable more sophisticated LLVM-based JIT compiler approaches. Indeed, other compiler/VM developers can use our approach because it is a clean modular addition to the standard LLVM distribution. Further, our approach is defined completely at the LLVM-IR level and thus does not require any modifications to the target code generation.

    Authors
    Nurudeen Lameed    Laurie Hendren   
    Date
    March 2013
    Published
    VEE 2013, Houston, Texas, USA
  10. Taming MATLAB
    Abstract

    MATLAB is a dynamic scientific language used by scientists, engineers and students worldwide. Although MATLAB is very suitable for rapid prototyping and development, MATLAB users often want to convert their final MATLAB programs to a static language such as FORTRAN. This paper presents an extensible object-oriented toolkit for supporting the generation of static programs from dynamic MATLAB programs. Our open source toolkit, called the MATLAB Tamer, identifies a large tame subset of MATLAB, supports the generation of a specialized Tame IR for that subset, provides a principled approach to handling the large number of builtin MATLAB functions, and supports an extensible interprocedural value analysis for estimating MATLAB types and call graphs.

    Authors
    Anton Dubrau    Laurie Hendren   
    Date
    October 2012
    Published
    OOPSLA 2012, Tucson, Arizona, USA
  11. Kind Analysis for MATLAB
    Abstract

    A fundamental problem in MATLAB is determining the kind of an identifier. Does an identifier refer to a variable, a named function or a prefix? Although this is a trivial problem for most programming languages, it was not clear how to do this properly in MATLAB. Furthermore, there was no simple explanation of kind analysis suitable for MATLAB programmers, nor a publicly-available implementation suitable for compiler researchers. This paper explains the required background of MATLAB, clarifies the kind assignment problem, and proposes some general guidelines for developing good kind analyses. Based on these foundations we present our design and implementation of a variety of kind analyses, including an approach that matches the intended behaviour of modern MATLAB 7 and two potentially better alternatives.

    Authors
    Jesse Doherty    Soroush Radpour    Laurie Hendren   
    Date
    October 2012
    Published
    OOPSLA 2012, Tucson, Arizona, USA
  12. MetaLexer: A Modular Lexical Specification Language
    Abstract

    Compiler toolkits make it possible to rapidly develop compilers and translators for new programming languages. Although there exist elegant toolkits for modular and extensible parsers, compiler developers must often resort to ad-hoc solutions when extending or composing lexers. This paper presents MetaLexer, a new modular lexical specification language and associated tool.

    Slides are available here.

    Authors
    Andrew Casey    Laurie Hendren   
    Date
    October 2011
    Published
    AOSD 2011, Pernambuco, Brazil
  13. Typing Aspects for MATLAB
    Abstract

    This paper introduces the idea of adding typing aspects to MATLAB programs. A typing aspect can be used to: (1) capture the run-time types of variables, and (2) check run-time types against either a declared type or a previously captured run-time type. Typing aspects can be deployed at three different levels: they can be used (1) solely as documentation, (2) to log type errors, or (3) to catch type errors at run-time.

    Slides are available here.

    Authors
    Laurie Hendren   
    Date
    October 2011
    Published
    DSAL 2011, Pernambuco, Brazil
  14. Staged Static Techniques to Efficiently Implement Array Copy Semantics in a MATLAB JIT Compiler
    Abstract

    Several aspects of the MATLAB language such as dynamic loading and typing, safe updates, and copy semantics for arrays contribute to its appeal to the scientific communities, but at the same time provide many challenges to the compiler and virtual machine. One such problem, minimizing the number of copies and copy checks for MATLAB programs, has not received much attention. The classical approach to minimizing the number of copies (i.e., reference counting) does not work in a garbage-collected virtual machine. This paper presents a staged static analysis approach that does not require reference counts, thus enabling a garbage-collected virtual machine.

    Authors
    Nurudeen Lameed    Laurie Hendren   
    Date
    March 2011
    Published
    Compiler Construction (CC) 2011, Saarbrücken, Germany
  15. McFLAT: A Profile-based Framework for MATLAB Loop Analysis and Transformations
    Abstract

    This paper presents a new framework, McFLAT, which uses profile-based training runs to determine likely loop-bounds ranges for which specialized versions of the loops may be generated. The main idea is to collect information about observed loop bounds and hot loops using training data which is then used to heuristically decide upon which loops and which ranges are worth specializing using a variety of loop transformations.

    Authors
    Amina Aslam    Laurie Hendren   
    Date
    October 2010
    Published
    LCPC 2010, Houston, Texas, USA
  16. McLab: An extensible compiler toolkit for MATLAB and related languages
    Abstract

    MATLAB is a popular language for scientific computation. Effectively compiling MATLAB presents many challenges due to the dynamic nature of the language. We present McLab, an extensible compiler toolkit for MATLAB and related languages. McLab aims to provide high performance execution of MATLAB on modern architectures while bringing modern programming concepts such as aspect-oriented programming and other extensions to MATLAB. McLab consists of several components. The first component is an extensible frontend to parse and analyze MATLAB as well as extensions to MATLAB. The second component, called McFor, is a compiler to translate a static subset of MATLAB to FORTRAN. The third component, McVM, is a virtual machine including a JIT compiler to execute MATLAB code. Finally we also provide language extensions such as AspectMatlab. We present the current state of the implementation of McLab and describe ongoing work and future directions of the project.

    Authors
    Andrew Casey    Jun Li    Jesse Doherty    Maxime Chevalier-Boisvert    Toheed Aslam    Anton Dubrau    Nurudeen Lameed    Amina Aslam    Rahul Garg    Soroush Radpour    Olivier Savary Belanger    Laurie Hendren    Clark Verbrugge   
    Date
    May 2010
    Published
    C3S2E '10, Montreal, Canada
  17. Optimizing Matlab through Just-In-Time Specialization
    Abstract

    Scientists are increasingly using dynamic programming languages like Matlab for prototyping and implementation. Effectively compiling Matlab raises many challenges due to the dynamic and complex nature of Matlab types. This paper presents a new JIT-based approach which specializes and optimizes functions on-the-fly based on the current types of function arguments.

    Authors
    Maxime Chevalier-Boisvert    Laurie Hendren    Clark Verbrugge   
    Date
    March 2010
    Published
    Compiler Construction (CC) 2010, Paphos, Cyprus
  18. AspectMatlab: An Aspect-Oriented Scientific Programming Language
    Abstract

    This paper introduces a new aspect-oriented programming language, AspectMatlab. AspectMatlab introduces key aspect-oriented features in a way that is both accessible to scientists and where the aspect-oriented features concentrate on array accesses and loops, the core computation elements in scientific programs. The paper reports on the language design of AspectMatlab, the amc compiler implementation and related optimizations, and also provides an overview of use cases that are specific to scientific programming.

    Authors
    Toheed Aslam    Jesse Doherty    Anton Dubrau    Laurie Hendren   
    Date
    March 2010
    Published
    AOSD 2010, Rennes and Saint-Malo, France

Theses

  1. McIDE: A MATLAB IDE powered by dynamic analysis
    Abstract

    MATLAB is a popular dynamic scientific programming language. The typical MATLAB user is not a software professional; it is chiefly used among scientists, engineers, and students, and enjoys wide adoption in large part because of its high level syntax and wide array of libraries for many problem domains in the sciences. The inexperience of many MATLAB programmers, coupled with the ill-specified and often counterintuitive semantics of the language, leads to MATLAB code in the wild that is difficult to understand and maintain. In this thesis, we present McIDE, an integrated development environment for MATLAB programming. McIDE provides tools to help MATLAB programmers write better programs, among them automated refactorings and code navigation features like “jump to definition”. It is also opinionated about MATLAB code, and tries to recognize common anti-patterns and either warn about or eliminate them. McIDE is built up of several largely independent components wired together by a thin graphical interface. Some of these components are pre-existing, such as a MATLAB parser provided by the McLab compiler toolkit, and others are contributions of this thesis, such as a dynamic call graph collection mechanism for MATLAB code, and a layout-preserving code transformation engine. A theme of McIDE’s implementation is reliance on runtime information, since purely static information is often insufficient if we wish to support the development of arbitrary MATLAB code, including its more dynamic features.

    Authors
    Ismail Badawi   
    Date
    March 2016
    Published
    Master's Thesis, McGill University, Montreal, Canada
  2. VeloCty: A Static Optimising Compiler for MATLAB and NumPy
    Abstract

    High-level scientific languages such as MATLAB and Python’s NumPy library are gaining popularity among scientists and mathematicians. These languages provide many features, such as dynamic typing and high-level scientific functions, which allow easy prototyping. However, these features also inhibit performance of the code. We present VeloCty, an optimizing static compiler for MATLAB and Python as a solution to the problem of enhancing performance of programs written in these languages. In most programs, a large portion of the time is spent executing a small part of the code. Moreover, these sections can often be compiled ahead of time and improved performance can be achieved by optimizing only these ‘hot’ sections of the code. VeloCty takes as input functions written in MATLAB and Python specified by the user and generates an equivalent C++ version. VeloCty also generates glue code to interface with MATLAB and Python. The generated code can then be compiled and packaged as a shared library that can be linked to any program written in MATLAB and Python. We also implemented optimisations to eliminate array bounds checks, reuse previously allocated memory during array operations and support parallel execution using OpenMP. VeloCty uses the Velociraptor toolkit. We implemented a C++ backend for the Velociraptor intermediate representation, VRIR, and language-specific runtimes for MATLAB and Python. We have also implemented a MATLAB VRIR generator using the McLab toolkit. VeloCty was evaluated using 17 MATLAB benchmarks and 9 Python benchmarks. The MATLAB benchmark versions compiled using VeloCty with all optimisations enabled were between 1.3 and 458 times faster than the MathWorks’ MATLAB2014b interpreter and JIT compiler. Similarly, Python benchmark versions were between 44.11 and 1681 times faster than the CPython interpreter.

    Authors
    Sameer Jagdale   
    Date
    April 2015
    Published
    Master's Thesis, McGill University, Montreal, Canada
  3. McNumJS: A JavaScript Library for Numerical Computations
    Abstract

    There has been a huge development in the web community recently, with an increasing focus on the performance of JavaScript. The development of state-of-the-art JavaScript engines and JavaScript technologies has improved the performance of JavaScript considerably and made it competitive with other dynamic languages. The major advantage of JavaScript applications is that they can run on any device that supports web browsers and distribution of these applications is very easy. This thesis reports on McNumJS, an easy-to-use and high-performance JavaScript library for numerical computations. This library is helpful to JavaScript developers for developing numerical applications and compiler writers who want to compile scientific languages like MATLAB or R to JavaScript.

    There has been a surge of technologies like typed arrays, web workers and asm.js, developed to improve the performance of JavaScript. We analyze these technologies and report their suitability for numerical applications. We have also compiled a detailed study on asm.js and performed different experiments to find the parts of asm.js that we can use in regular development of JavaScript applications.

    There are two main design goals behind the development of McNumJS: i) making it easy to use, and ii) providing high performance. We achieved the ease-of-use goal by making the API similar to that of NumPy, a popular Python library for scientific computing. To make McNumJS high-performance, we used JavaScript typed arrays and the type coercion rules defined by asm.js. We report the speedups we get by using McNumJS compared to other JavaScript libraries and JavaScript with regular arrays. We also report the performance difference between McNumJS and native C. These experiments show that the performance of the McNumJS library is competitive with native C and outperforms other JavaScript libraries for numerical computations.

    Authors
    Sujay Kathrotia   
    Date
    April 2015
    Published
    Master's Thesis, McGill University, Montreal, Canada
  4. AspectMatlab++: Developing an Aspect-Oriented Language for Scientists
    Abstract

    MATLAB is a popular dynamic array-based language commonly used within the scientific community. MATLAB’s widespread use can be attributed to its large library of built-in functions, and its high-level syntax, which requires no type declarations, making it ideal for fast prototyping. This thesis presents extensions to AspectMatlab, an aspect oriented compiler developed for MATLAB. AspectMatlab was created with the intent of bringing aspect oriented programming to MATLAB, and targeted features such as array accesses and loops, which are the core computations in scientific programs. This thesis presents AspectMatlab++. AspectMatlab++ extends AspectMatlab by focusing on a different set of challenges, seeking to make aspect-oriented programming easier to use and providing mechanisms to handle a variety of the problems that occur in a dynamically typed language. To this end, we introduce pattern matching on annotations and types of variables, as well as new manners of exposing context. We also provide several use-cases of these features in the form of general-use aspects which focus on solving issues that arise from use of dynamically-typed languages. These include aspects which perform type and unit checking, profiling aspects, as well as aspects which perform basic loop optimizations. This thesis also details several performance enhancements to the AspectMatlab compiler, which result in a speed improvement of about 10 times.

    Authors
    Andrew Bodzay   
    Date
    December 2015
    Published
    Master's Thesis, McGill University, Montreal, Canada
  5. Mc2For: A MATLAB to Fortran 95 Compiler
    Abstract

    MATLAB is a dynamic numerical scripting language widely used by scientists, engineers and students. While MATLAB’s high-level syntax and dynamic types make it ideal for fast prototyping, programmers often prefer using high-performance static languages such as FORTRAN for their final distribution. Rather than rewriting the code by hand, our solution is to provide a source-to-source compiler that translates the original MATLAB program to an equivalent FORTRAN program.

    In this thesis, we introduce MC2FOR, a source-to-source compiler which transforms MATLAB to FORTRAN and handles several important challenges during the transformation, such as efficiently estimating the static type characteristics of all the variables in a given MATLAB program, mapping numerous MATLAB built-in functions to FORTRAN, and correctly supporting some MATLAB dynamic features in the generated FORTRAN code.

    This compiler consists of two major parts. The first part is an interprocedural analysis component to estimate the static type characteristics, such as the shapes of the arrays and the ranges of the scalars, which are used to generate variable declarations and to remove unnecessary array bounds checking in the translated FORTRAN program. The second part is an extensible FORTRAN code generation framework automatically transforming MATLAB constructs to equivalent FORTRAN constructs.

    This work has been implemented within the McLab framework, and we evaluated the performance of the Mc2For compiler on a collection of 20 MATLAB benchmarks. For most of the benchmarks, the generated FORTRAN program runs 1.2 to 337 times faster than the original MATLAB program, and in terms of physical lines of code, typically grows only by a factor of around 2. These experimental results show that the code generated by Mc2For performs better on average, at the cost of only a modest increase in code size.

    Authors
    Xu Li   
    Date
    April 2014
    Published
    Master's Thesis, McGill University, Montreal, Canada
  6. MiX10: Compiling MATLAB to X10 for high performance
    Abstract

    MATLAB is a popular dynamic array-based language commonly used by students, scientists and engineers who appreciate the interactive development style, the rich set of array operators, the extensive builtin library, and the fact that they do not have to declare static types. Even though these users like to program in MATLAB, their computations are often very compute-intensive and are better suited for emerging high performance computing systems. This thesis reports on MIX10, a source-to-source compiler that automatically translates MATLAB programs to X10, a language designed for “Performance and Productivity at Scale”; thus, helping scientific programmers make better use of high performance computing systems. There is a large semantic gap between the array-based dynamically-typed nature of MATLAB and the object-oriented, statically-typed, and high-level array abstractions of X10. This thesis addresses the major challenges that must be overcome to produce sequential X10 code that is competitive with state-of-the-art static compilers for MATLAB which target more conventional imperative languages such as C and Fortran. Given that efficient basis, the thesis then provides a translation for the MATLAB parfor construct that leverages the powerful concurrency constructs in X10. The MIX10 compiler has been implemented using the McLab compiler tools, is open source, and is available both for compiler researchers and end-user MATLAB programmers. We have used the implementation to perform many empirical measurements on a set of 17 MATLAB benchmarks. We show that our best MIX10-generated code is significantly faster than the de facto Mathworks’ MATLAB system, and that our results are competitive with state-of-the-art static compilers that target C and Fortran. We also show the importance of finding the correct approach to representing the arrays in X10, and the necessity of an IntegerOkay analysis that determines which double variables can be safely represented as integers. 
    Finally, we show that our X10-based handling of the MATLAB parfor greatly outperforms the de facto MATLAB implementation.

    Authors
    Vineet Kumar   
    Date
    April 2014
    Published
    Master's Thesis, McGill University, Montreal, Canada
  7. Dynamic Compiler Optimization Techniques for MATLAB
    Abstract

    MATLAB has gained widespread acceptance among engineers and scientists. Several aspects of the language such as dynamic loading and typing, safe updates, copy semantics for arrays, and support for higher-order functions contribute to its appeal, but at the same time provide many challenges to the compiler and virtual machine. MATLAB is a dynamic language. Traditional implementations of the language use interpreters and have been found to be too slow for large computations. More recently, researchers and software developers have been developing JIT compilers for MATLAB and other dynamic languages. This thesis is about the development of new compiler analyses and transformations for a MATLAB JIT compiler, McJIT, which is based on the LLVM JIT compiler toolkit. The new contributions include a collection of novel analyses for optimizing copying of arrays, which are performed when a function is first compiled. We designed and implemented four analyses to support an efficient implementation of array copy semantics in a MATLAB JIT compiler. Experimental results show that copy optimization is essential for performance improvement in a compiler for the MATLAB language.

    We also developed a variety of new dynamic analyses and code transformations for optimizing running code on-the-fly according to the current conditions of the runtime environment. LLVM does not currently support on-the-fly code transformation. So, we first developed a new on-stack replacement approach for LLVM. This capability allows the runtime stack to be modified during the execution of a function, thus enabling a continuation of the execution at a higher optimization level. We then used the on-stack replacement implementation to support selective inlining of function calls in long-running loops. Our experimental results show that function calls in long-running loops can result in high runtime overhead, and that selective dynamic inlining can be used to drastically reduce this overhead.

    The built-in function feval is an important MATLAB feature for certain classes of numerical programs and solvers which benefit from having functions as parameters. Programmers may pass a function name or function handle to the solver and then the solver uses feval to indirectly call the function. In this thesis, we show that although feval provides an acceptable abstraction mechanism for these types of applications, there are significant performance overheads for function calls via feval, in both MATLAB interpreters and JITs. The thesis then proposes, implements and compares two on-the-fly mechanisms for specialization of feval calls. The first approach uses our on-stack replacement technology. The second approach specializes calls of functions with feval using a combination of runtime input argument types and values. Experimental results on seven numerical solvers show that the techniques provide good performance improvements.

    The implementation of all the analyses and code transformations presented in this thesis has been done within the McLab virtual machine, McVM, and is available to the public as open source software.

    Authors
    Nurudeen Lameed   
    Date
    April 2013
    Published
    Ph.D. Thesis, McGill University, Montreal, Canada
  8. Understanding and Refactoring the MATLAB language
    Abstract

    MATLAB is a very popular dynamic “scripting” language for numerical computations used by scientists, engineers and students world-wide. MATLAB programs are often developed incrementally using a mixture of MATLAB scripts and functions and frequently build upon existing code which may use outdated features. This results in programs that could benefit from refactoring, especially if the code will be reused and/or distributed. Despite the need for refactoring, there appear to be no MATLAB refactoring tools available. Correct refactoring of MATLAB is quite challenging because of its non-standard rules for binding identifiers. Even simple refactorings are non-trivial. Compiler writers and software engineers are generally not familiar with MATLAB and how it is used, so the problem has been left untouched so far. This thesis has two main contributions. The first is McBench, a tool that helps compiler writers understand the language better. In order to have a systematic approach to the problem, we developed this tool to give us some insight about how programmers use MATLAB. The second contribution is a suite of semantics-preserving refactorings for MATLAB functions and scripts including: function and script inlining, converting scripts to functions, extracting new functions, and converting dynamic feval calls to static function calls. These refactorings have been implemented using the McLAB compiler framework, and an evaluation is given on a large set of MATLAB programs which demonstrates the effectiveness of our approach.

    Authors
    Soroush Radpour   
    Date
    August 2012
    Published
    Master's Thesis, McGill University, Montreal, Canada
  9. Taming MATLAB
    Abstract

    This thesis presents an extensible object-oriented toolkit to help facilitate the generation of static programs from dynamic MATLAB programs. Our open source toolkit, called the MATLAB Tamer, targets a large subset of MATLAB. Given information about the entry point of the program, the MATLAB Tamer builds a complete callgraph, transforms every function into a reduced intermediate representation, and provides typing information to aid the generation of static code.

    Authors
    Anton Dubrau   
    Date
    April 2012
    Published
    Master's Thesis, McGill University, Montreal, Canada
  10. McSAF: An Extensible Static Analysis Framework for the MATLAB Language
    Abstract

    MATLAB is a popular language for scientific and numerical programming. Despite its popularity, there are few active projects providing open tools for MATLAB-related compiler research. This thesis provides the McLAB Static Analysis Framework, McSAF, the goal of which is to simplify the development of new compiler tools for MATLAB.

    Authors
    Jesse Doherty   
    Date
    August 2011
    Published
    Master's Thesis, McGill University, Montreal, Canada
  11. McFLAT: A Profile-based Framework for MATLAB Loop Analysis and Transformations
    Abstract

    This thesis presents a new framework, McFLAT, which uses profile-based training runs to determine likely loop-bounds ranges for which specialized versions of the loops may be generated. The main idea is to collect information about observed loop bounds and hot loops using training data which is then used to heuristically decide upon which loops and which ranges are worth specializing using a variety of loop transformations.

    Authors
    Amina Aslam   
    Date
    August 2010
    Published
    Master's Thesis, McGill University, Montreal, Canada
  12. AspectMatlab: An Aspect-Oriented Scientific Programming Language
    Abstract

    This is the first thesis introducing AspectMatlab.

    Authors
    Toheed Aslam   
    Date
    February 2010
    Published
    Master's Thesis, McGill University, Montreal, Canada
  13. McFOR: A MATLAB to FORTRAN 95 Compiler
    Abstract

    The high-level array programming language MATLAB is widely used for prototyping algorithms and applications of scientific computations. However, its dynamically typed nature, which means that MATLAB programs are usually executed via an interpreter, leads to poor performance. An alternative approach would be converting MATLAB programs to equivalent Fortran 95 programs. The resulting programs could be compiled using existing high-performance Fortran compilers and thus could provide better performance. This thesis introduces McFOR, a MATLAB to FORTRAN 95 Compiler.

    Authors
    Jun Li   
    Date
    August 2009
    Published
    Master's Thesis, McGill University, Montreal, Canada
  14. McVM: an Optimizing Virtual Machine for the MATLAB Programming Language
    Abstract

    In recent years, there has been an increase in the popularity of dynamic languages such as Python, Ruby, PHP, JavaScript and MATLAB. Programmers appreciate the productivity gains and ease of use associated with such languages. However, most of them still run in virtual machines which provide no Just-In-Time (JIT) compilation support, and thus perform relatively poorly when compared to their statically compiled counterparts. While the reference MATLAB implementation does include a built-in compiler, this implementation is not open sourced and little is known about its internal workings. The McVM project has focused on the design and implementation of an optimizing virtual machine for a subset of the MATLAB programming language.

    Authors
    Maxime Chevalier-Boisvert   
    Date
    August 2009
    Published
    Master's Thesis, McGill University, Montreal, Canada
  15. The MetaLexer Lexer Specification Language
    Abstract

    Compiler toolkits make it possible to rapidly develop compilers and translators for new programming languages. Recently, toolkit writers have focused on supporting extensible languages and systems that mix the syntaxes of multiple programming languages. However, this work has not been extended down to the lexical analysis level. As a result, users of these toolkits have to rely on ad-hoc solutions when they extend or mix syntaxes. This thesis presents MetaLexer, a new lexical specification language that remedies this deficiency.

    Authors
    Andrew Michael Casey   
    Date
    June 2009
    Published
    Master's Thesis, McGill University, Montreal, Canada

Technical Reports

  1. Sparse matrices on the web -- Characterizing the performance and optimal format selection of sparse matrix-vector multiplication in JavaScript
    Abstract

    JavaScript is the most widely used language for web programming, and is now becoming increasingly popular for high performance computing, data-intensive applications, and deep learning. Sparse matrix-vector multiplication (SpMV) is an important kernel that is considered critical for the performance of those applications. In SpMV, the optimal selection of storage format is one of the key aspects of developing effective applications. This paper describes the distinctive nature of the performance and choice of optimal sparse matrix storage format for sequential SpMV in JavaScript as compared to native languages like C. Based on exhaustive experiments with 2000 real-life sparse matrices, we explored three main research questions. First, we examined the difference in performance between native C and JavaScript for the two major browsers, Firefox and Chrome. We observed that the best performing browser demonstrated a slowdown of only 1.2x to 3.9x, depending on the choice of sparse storage format. Second, we explored the performance of single-precision versus double-precision SpMV. In contrast to C, in JavaScript, we found that double-precision is more efficient than single-precision. Finally, we examined the choice of optimal storage format. To do this in a rigorous manner we introduced the notion of x%-affinity which allows us to identify those formats that are at least x% better than all other formats. Somewhat surprisingly, the best format choices are very different for C as compared to JavaScript, and even quite different between the two browsers.

    Authors
    Prabhjot Sandhu    David Herrera    Laurie Hendren   
    Date
    January 2018
    Published
    McGill University, Montreal, Canada
  2. HorseIR: Fusing Array Programming and Database Query Processing
    Abstract

    While traditional relational database management systems (RDBMS) seem to be a natural choice for data storage and analysis in the area of Data Science, current systems still fall short in two aspects. First, with increasingly cheap main memory, many workloads now fit into main memory while RDBMS have been optimized for I/O. Second, while current systems support advanced analytics beyond SQL queries through user-defined functions (UDF) written in procedural languages, integration into the SQL engine mostly follows a black-box approach limiting the opportunities for a holistic optimization. In this paper, we propose HorseIR, an array-based intermediate representation that allows for a unified representation of UDFs and SQL execution plans optimized with traditional RDBMS optimization techniques. HorseIR has a high-level design, supports rich types and data structures, including homogeneous vectors and heterogeneous lists. We identify suitable optimizations for generating efficient code from HorseIR, taking memory and CPU aspects into account. We compare HorseIR with the MonetDB RDBMS, by testing both standard SQL queries and queries with UDFs, and show how our holistic approach and compiler optimizations benefit the runtime of complex queries.

    Authors
    Hanfeng Chen    Joseph Vinish D'silva    Hongji Chen    Bettina Kemme    Laurie Hendren   
    Date
    January 2018
    Published
    McGill University, Montreal, Canada
  3. WebAssembly and JavaScript Challenge: Numerical program performance using modern browser technologies and devices
    Abstract

    Recent advances in execution environments for JavaScript and WebAssembly that run on a broad range of devices, from workstations to IoT devices, provide new opportunities for portable and web-based numerical computing. The aim of this paper is to evaluate the current state of the art through a comprehensive experiment using the Ostrich benchmark suite, a collection of numerical programs representing the numerical dwarf categories. Five research questions evaluate the improvement of JavaScript-based browser engines, the relative performance of JavaScript and WebAssembly, the relative performance of portable versus vendor-specific browsers, the relative performance of server-side versus client-side JavaScript/WebAssembly, and an overall comparison to find the best performing browser/language and the best performing device.

    Authors
    David Herrera    Hanfeng Chen    Erick Lavoie    Laurie Hendren   
    Date
    January 2018
    Published
    McGill University, Montreal, Canada
  4. A Formalization for Specifying and Implementing Correct Pull-Stream Modules
    Abstract

    Pull-stream is a JavaScript demand-driven functional design pattern based on callback functions that enables the creation and easy composition of independent modules that are used to create streaming applications. It is used in popular open source projects and the community around it has created over a hundred compatible modules. While the description of the pull-stream design pattern may seem simple, it does exhibit complicated termination cases. Despite the popularity and large uptake of the pull-stream design pattern, there was no existing formal specification that could help programmers reason about the correctness of their implementations. Thus, the main contribution of this paper is to provide a formalization for specifying and implementing correct pull-stream modules based on the following: (1) we show the pull-stream design pattern is a form of declarative concurrent programming; (2) we present an event-based protocol language that supports our formalization, independently of JavaScript; (3) we provide the first precise and explicit definition of the expected sequences of events that happen at the interface of two modules, which we call the pull-stream protocol; (4) we specify reference modules that exhibit the full range of behaviors of the pull-stream protocol; (5) we validate our definitions against the community expectations by testing the existing core pull-stream modules against them and identify unspecified behaviors in existing modules. Our approach helps to better understand the pull-stream protocol, to ensure interoperability of community modules, and to concisely and precisely specify new pull-stream abstractions in papers and documentation.

    Authors
    Erick Lavoie    Laurie Hendren   
    Date
    January 2018
    Published
    McGill University, Montreal, Canada
  5. Halophile: Comparing PNaCl to Other Web Technologies
    Abstract

    Most modern web applications are written in JavaScript. However, the demand for web applications that require more numerically-intensive calculations, such as 3D gaming or photo-editing, has increased. This has also increased the demand for code that runs at near-native speeds. PNaCl is a toolchain that allows native C/C++ code to be run in the browser. This paper provides a comparison of the performance of PNaCl to native code and JavaScript. Using a benchmark suite that covers a representative set of numerical computations, it is shown that, on average, the performance of PNaCl is within 9% of native C code.

    Authors
    Lei Lopez   
    Date
    April 2015
    Published
    McGill University, Montreal, Canada
  6. McTutorial: A Structured Approach to Teaching MATLAB
    Abstract

    Learning how to program has become an increasingly important skill for non-programmers in the tech industry and for researchers outside of computer science. Newer programming languages such as MATLAB have grown into industrial-strength languages, and many industries and academic fields outside of math and computer science have found uses for them. Thus, it is essential for many more people to learn MATLAB. However, many people learn it without fundamental knowledge of the programming concepts that are rooted in computer science. McTutorial aims to fill this gap.

    Authors
    Lei Lopez   
    Date
    August 2014
    Published
    McGill University, Montreal, Canada
  7. Using JavaScript and WebCL for Numerical Computations: A Comparative Study of Native and Web Technologies
    Abstract

    From its modest beginnings as a tool to validate forms, JavaScript is now an industrial-strength language used to power online applications such as spreadsheets, IDEs, image editors and even 3D games. Since all modern web browsers support JavaScript, it provides a medium that is both easy to distribute for developers and easy to access for users. This paper provides empirical data to answer the question: Is JavaScript suitable for numerical computations? By measuring and comparing the runtime performance of benchmarks representative of a wide variety of scientific applications, we show that sequential JavaScript is within a factor of 2 of native code. Parallel code using WebCL shows speed improvements of up to 2.28 over JavaScript for the majority of the benchmarks.

    Authors
    Faiz Khan    Vincent Foley-Bourgon    Sujay Kathrotia    Erick Lavoie    Laurie Hendren   
    Date
    June 2014
    Published
    McGill University, Montreal, Canada
  8. MiX10: Compiling MATLAB for High Performance Computing via X10
    Abstract

    Matlab is a popular dynamic array-based language commonly used by students, scientists and engineers who appreciate the interactive development style, the rich set of array operators, the extensive builtin library, and the fact that they do not have to declare static types. Even though these users like to program in Matlab, their computations are often very compute-intensive and are better suited for emerging high performance computing systems. This paper reports on MiX10, a source-to-source compiler that automatically translates Matlab programs to X10, a language designed for “Performance and Productivity at Scale”; thus, helping scientific programmers make better use of high performance computing systems. There is a large semantic gap between the array-based dynamically-typed nature of Matlab and the object-oriented, statically-typed, and high-level array abstractions of X10. This paper addresses the major challenges that must be overcome to produce sequential X10 code that is competitive with state-of-the-art static compilers for Matlab which target more conventional imperative languages such as C and Fortran. Given that efficient basis, the paper then provides a translation for the Matlab parfor construct that leverages the powerful concurrency constructs in X10. The MiX10 compiler has been implemented using the McLab compiler tools, is open source, and is available both for compiler researchers and end-user Matlab programmers. We have used the implementation to perform many empirical measurements on a set of 17 Matlab benchmarks. We show that our best MiX10-generated code is significantly faster than the de facto Mathworks’ Matlab system, and that our results are competitive with state-of-the-art static compilers that target C and Fortran. We also show the importance of finding the correct approach to representing the arrays in X10, and the necessity of an IntegerOkay analysis that determines which double variables can be safely represented as integers. Finally, we show that our X10-based handling of the Matlab parfor greatly outperforms the de facto Matlab implementation.

    Authors
    Vineet Kumar    Laurie Hendren   
    Date
    March 2014
    Published
    Sable Technical Report (2014-1), McGill University, Montreal, Canada
  9. Velociraptor: A compiler toolkit for numerical programs targeting CPUs and GPUs
    Abstract

    Developing compilers that allow scientific programmers to use multicores and GPUs is of increasing interest; however, building such compilers requires considerable effort. We present Velociraptor: a portable compiler toolkit that can be used to easily build compilers for numerical programs targeting multicores and GPUs. Velociraptor provides a new high-level IR called VRIR which has been specifically designed for numeric computations, with rich support for arrays, plus support for high-level parallel and accelerator constructs. A compiler developer uses Velociraptor by generating VRIR for key parts of an input program. Velociraptor does the rest of the work by optimizing the VRIR code, and generating LLVM for CPUs and OpenCL for GPUs. Velociraptor also provides a smart runtime system to manage GPU resources and task dispatch. To demonstrate Velociraptor in action, we present two case studies: a proof-of-concept Python compiler targeting CPUs and GPUs, and a GPU extension for a MATLAB JIT.

    Authors
    Rahul Garg    Laurie Hendren   
    Date
    November 2013
    Published
    Sable Technical Report (2013-5), McGill University, Montreal, Canada
  10. Mc2For: A tool for automatically transforming MATLAB to Fortran 95
    Abstract

    MATLAB is a dynamic numerical scripting language widely used by scientists, engineers and students. While MATLAB’s high-level syntax and dynamic types make it ideal for prototyping, programmers often prefer using high-performance static programming languages such as Fortran for their final distributable code. Rather than requiring programmers to rewrite their code by hand, our solution is to provide a tool that automatically translates the original MATLAB program to produce an equivalent Fortran program. There are several important challenges for automatically translating MATLAB to Fortran, such as correctly estimating the static type characteristics of all the variables in a MATLAB program, mapping MATLAB built-in functions, and effectively mapping MATLAB constructs to Fortran constructs. In this paper, we introduce Mc2For, a tool which automatically translates MATLAB to Fortran. This tool consists of two major parts. The first part is an interprocedural analysis component to estimate the static type characteristics, such as array shape and the range value information, which are used to generate variable declarations in the translated Fortran program. The second part is an extensible Fortran code generation framework to automatically transform MATLAB constructs to corresponding Fortran constructs. This work has been implemented within the McLab framework, and we demonstrate the performance of the translated Fortran code for a collection of MATLAB benchmark programs.

    Authors
    Xu Li    Laurie Hendren   
    Date
    October 2013
    Published
    Sable Technical Report (2013-04), McGill University, Montreal, Canada
  11. MiX10: Compiling MATLAB for High Performance Computing via X10
    Abstract

    MATLAB is a popular dynamic array-based language commonly used by students, scientists and engineers, who appreciate the interactive development style, the rich set of array operators, the extensive builtin library, and the fact that they do not have to declare static types. Even though these users like to program in MATLAB, their computations are often very compute-intensive and are better suited for the emerging high performance computing systems. Our solution is MiX10, a source to source compiler that automatically translates MATLAB programs to X10, a language designed for “Performance and Productivity at Scale”; thus, helping scientific programmers make better use of high performance computing systems. This paper addresses two major challenges in compiling MATLAB to X10: (1) efficiently transforming dynamically-typed MATLAB arrays to the best high-level, statically-typed array representation in X10; and (2) effectively exposing concurrency in MATLAB and generating efficient concurrent code in X10. We have implemented the techniques presented in this paper and provide an empirical study on a set of benchmarks, examining both the efficiency of the generated sequential X10 code and speedups for the concurrent versions.

    Authors
    Vineet Kumar    Laurie Hendren   
    Date
    October 2013
    Published
    Sable Technical Report (2013-03), McGill University, Montreal, Canada
  12. First steps to compiling MATLAB to X10
    Abstract

    MATLAB is a popular dynamic array-based language commonly used by students, scientists and engineers, who appreciate the interactive development style, the rich set of array operators, the extensive builtin library, and the fact that they do not have to declare static types. Even though these users like to program in MATLAB, their computations are often very compute-intensive and are potentially very good applications for high-performance languages such as X10. To provide a bridge between MATLAB and X10, we are developing MiX10, a source-to-source compiler that translates MATLAB to X10. This paper provides an overview of the initial design of the MiX10 compiler, presents a template-based specialization approach to compiling the builtin MATLAB operators, and provides translation rules for the key sequential MATLAB constructs with a focus on those which are challenging to convert to semantically-equivalent X10. An initial core compiler has been implemented, and preliminary results are provided.

    Authors
    Vineet Kumar    Laurie Hendren   
    Date
    May 2013
    Published
    Sable Technical Report (2013-02), McGill University, Montreal, Canada
  13. Optimizing MATLAB feval with Dynamic Techniques
    Abstract

    MATLAB is a popular dynamically-typed array-based language. The built-in function feval is an important MATLAB feature for certain classes of numerical programs and solvers which benefit from having functions as parameters. Programmers may pass a function name or function handle to the solver and then the solver uses feval to indirectly call the function. In this paper, we show that although feval provides an acceptable abstraction mechanism for these types of applications, there are significant performance overheads for function calls via feval, in both MATLAB interpreters and JITs. The paper then proposes, implements and compares two on-the-fly mechanisms for specialization of feval calls. The first approach specializes calls of functions with feval using a combination of runtime input argument types and values. The second approach uses on-stack replacement technology, as supported by McVM/McOSR. Experimental results on seven numerical solvers show that the techniques provide good performance improvements.

    Authors
    Nurudeen Lameed    Laurie Hendren   
    Date
    March 2013
    Published
    Sable Technical Report (2012-06-rev1), McGill University, Montreal, Canada
  14. A compiler toolkit for array-based languages targeting CPU/GPU hybrid systems
    Abstract

    Superseded by newer report (2013-5, see above).

    Authors
    Rahul Garg    Laurie Hendren   
    Date
    November 2012
  15. A Modular Approach to On-Stack Replacement in LLVM
    Abstract

    In this report, we present a modular approach to implementing on-stack replacement that can be used by any system that targets the LLVM SSA intermediate representation, and we demonstrate the approach by using it to support dynamic inlining in McVM. McVM is a virtual machine for MATLAB which uses an LLVM-based JIT compiler. MATLAB is a popular dynamic language for scientific and engineering applications which typically manipulate large matrices and often contain long-running loops, and is thus an ideal target for dynamic JIT compilation and OSRs.

    Authors
    Nurudeen Lameed    Laurie Hendren   
    Date
    April 2012
    Published
    Sable Technical Report (2012-01-rev1), McGill University, Montreal, Canada
  16. Refactoring MATLAB
    Abstract

    This report presents the important challenges of refactoring MATLAB along with automated techniques to handle a collection of refactorings for MATLAB functions and scripts including: function and script inlining, converting scripts to functions, and converting dynamic feval calls to static function calls. The refactorings have been implemented using the McLAB compiler framework, and an evaluation is given on a large set of MATLAB benchmarks which demonstrates the effectiveness of our approach.

    Authors
    Soroush Radpour    Laurie Hendren   
    Date
    October 2011
    Published
    Sable Technical Report (2011-02), McGill University, Montreal, Canada
  17. McSAF: A Static Analysis Framework for MATLAB
    Abstract

    MATLAB is an extremely popular programming language used by scientists, engineers, researchers and students world-wide. Despite its popularity, it has received very little attention from compiler researchers. This report introduces McSAF, an open-source static analysis framework which is intended to enable more compiler research for MATLAB and extensions of MATLAB. The framework is based on an intermediate representation (IR) called McLAST, which has been designed to capture all the key features of MATLAB, while at the same time being simple for program analysis. The paper describes both the IR and the procedure for creating the IR from the higher-level AST. The analysis framework itself provides visitor-based traversals including fixed-point-based traversals to support both forwards and backwards analyses. McSAF has been implemented as part of the McLAB project, and the framework has already been used for a variety of analyses, both for MATLAB and the AspectMATLAB extension.

    Authors
    Jesse Doherty    Laurie Hendren   
    Date
    December 2011
    Published
    Sable Technical Report (2011-01), McGill University, Montreal, Canada
  18. McFLAT: A Profile-based Framework for MATLAB Loop Analysis and Transformations
    Abstract

    This technical report presents a new framework, McFLAT, which uses profile-based training runs to determine likely loop bounds ranges for which specialized versions of the loops may be generated. The main idea is to collect information about observed loop bounds and hot loops using training data which is then used to heuristically decide upon which loops and which ranges are worth specializing using a variety of loop transformations.

    Authors
    Amina Aslam    Laurie Hendren   
    Date
    July 2010
    Published
    Sable Technical Report (2010-06), McGill University, Montreal, Canada
  19. Staged Static Techniques to Efficiently Implement Array Copy Semantics in a MATLAB JIT Compiler
    Abstract

    Several aspects of the MATLAB language such as dynamic loading and typing, safe updates, and copy semantics for arrays contribute to its appeal to the scientific communities, but at the same time provide many challenges to the compiler and virtual machine. One such problem, minimizing the number of copies and copy checks for MATLAB programs, has not received much attention. The classical approach to minimizing the number of copies (i.e., reference counting) does not work in a garbage-collected virtual machine. This technical report presents a staged static analysis approach that does not require reference counts, thus enabling a garbage-collected virtual machine.

    Authors
    Nurudeen Lameed    Laurie Hendren   
    Date
    July 2010
    Published
    Sable Technical Report (2010-05), McGill University, Montreal, Canada
  20. AspectMatlab: An Aspect-Oriented Scientific Programming Language
    Abstract

    This technical report is an extended version of the AOSD 2010 paper above.

    Authors
    Toheed Aslam   
    Date
    January 2010
    Published
    Sable Technical Report (2009-03), McGill University, Montreal, Canada