McLab Publications


Lectures and Tutorials

  1. VeloCty Lecture At CASCON
    Description

    We presented an overview of VeloCty at the 13th Compiler Driven Performance workshop. The CDP workshop was held during the IBM Center for Advanced Studies Conference (CASCON). The workshop featured talks on topics such as innovative analysis, languages, compilers and optimisation techniques for parallel environments. More information can be found here.

    Authors
    Sameer Jagdale   
    Date
    November 2014
    Presented
    CASCON 2014, Markham, Canada
  2. McNumJS Lecture At CASCON
    Description

    We presented the modules and features of the McNumJS library and demonstrated the performance of the library compared to the Ostrich benchmark suite at the CASCON Compiler Driven Performance workshop.

    Authors
    Sujay Kathrotia   
    Date
    November 2014
    Presented
    CASCON 2014, Markham, Canada
  3. MiX10 Lecture At CASCON
    Description

    We had a chance to talk about MiX10 at the 12th Compiler Driven Performance workshop at CASCON 2013. Get the slides here.

    Authors
    Vineet Kumar   
    Date
    November 2013
    Presented
    CASCON 2013, Markham, Canada
  4. Leverhulme Lecture Series
    Description

    During the academic year 2010-2011, Professor Hendren was on sabbatical leave at the University of Oxford, during which time she held a Leverhulme Visiting Professor position. As part of the position she presented a series of three Leverhulme Lectures.

    Authors
    Laurie Hendren   
    Date
    June - July 2011
    Presented
    University of Oxford, Oxford, England
  5. Introduction to McLab, a compiler and VM framework for MATLAB
    Description

    The purpose of this tutorial is to introduce McLab, a new publicly-available toolkit for analyzing and executing MATLAB programs. The tutorial starts by introducing the MATLAB language and the particular challenges the language presents for compiler developers. It then presents the McLab front-end, which supports standard MATLAB, prebuilt extensions such as AspectMatlab, and the creation of new language extensions. The middle part of the tutorial focuses on the IRs and analysis framework along with some example analyses. The final part of the tutorial presents the back-ends, with a particular emphasis on the McVM JIT.

    Authors
    Laurie Hendren    Rahul Garg    Nurudeen Lameed   
    Date
    June 2011
    Presented
    PLDI, San Jose, California

Papers

  1. AspectMatlab++: Annotations, Types, and Aspects for Scientists
    Abstract

    In this paper we present extensions to an aspect oriented compiler developed for MATLAB. These extensions are intended to support important functionality for scientists, and include pattern matching on annotations, and types of variables, as well as new manners of exposing context. We provide use-cases of these features in the form of several general-use aspects which focus on solving issues that arise from use of dynamically-typed languages. We also detail performance enhancements to the AspectMatlab compiler which result in an order of magnitude in performance gains.

    Authors
    Andrew Bodzay    Laurie Hendren   
    Date
    March 2015
    Published
    Modularity 2015, Fort Collins, CO, USA
  2. Mc2For: A tool for automatically translating MATLAB to FORTRAN 95
    Abstract

    MATLAB is a dynamic numerical scripting language widely used by scientists, engineers and students. While MATLAB’s high-level syntax and dynamic types make it ideal for prototyping, programmers often prefer using high-performance static languages such as FORTRAN for their final distributable code. Rather than rewriting the code by hand, our solution is to provide a tool that automatically translates the original MATLAB program to an equivalent FORTRAN program. There are several important challenges for automatically translating MATLAB to FORTRAN, such as correctly estimating the static type characteristics of all the variables in a MATLAB program, mapping MATLAB built-in functions, and effectively mapping MATLAB constructs to equivalent FORTRAN constructs.

    In this paper, we introduce Mc2For, a tool which automatically translates MATLAB to FORTRAN. This tool consists of two major parts. The first part is an interprocedural analysis component to estimate the static type characteristics, such as the shape of arrays and the range of scalars, which are used to generate variable declarations and to remove unnecessary array bounds checking in the translated FORTRAN program. The second part is an extensible FORTRAN code generation framework automatically transforming MATLAB constructs to FORTRAN. This work has been implemented within the McLab framework, and we demonstrate the performance of the translated FORTRAN code on a collection of MATLAB benchmarks.

    Authors
    Xu Li    Laurie Hendren   
    Date
    February 3rd, 2014
    Published
    WCRE 2014, Antwerp, Belgium
  3. Optimizing MATLAB Feval with Dynamic Techniques
    Abstract

    MATLAB is a popular dynamic array-based language used by engineers, scientists and students worldwide. The built-in function feval is an important MATLAB feature for certain classes of numerical programs and solvers which benefit from having functions as parameters. Programmers may pass a function name or function handle to the solver and then the solver uses feval to indirectly call the function. In this paper, we show that there are significant performance overheads for function calls via feval, in both MATLAB interpreters and JITs. The paper then proposes, implements and compares two on-the-fly mechanisms for specialization of feval calls. The first approach uses on-stack replacement technology, as supported by McVM/McOSR. The second approach specializes calls of functions with feval using a combination of runtime input argument types and values. Experimental results on seven numerical solvers show that the techniques provide good performance improvements.

    Authors
    Nurudeen Lameed    Laurie Hendren   
    Date
    October 2013
    Published
    DLS 2013, Indianapolis, USA
  4. Refactoring MATLAB
    Abstract

    This paper presents the important challenges of refactoring MATLAB along with automated techniques to handle a collection of refactorings for MATLAB functions and scripts including: converting scripts to functions, extracting functions, and converting dynamic function calls to static ones. The refactorings have been implemented using the McLab compiler framework, and an evaluation is given on a large set of MATLAB benchmarks which demonstrates the effectiveness of our approach.

    Authors
    Soroush Radpour    Laurie Hendren   
    Date
    March 2013
    Published
    CC 2013, Rome, Italy
  5. A Modular Approach to On-Stack Replacement in LLVM
    Abstract

    This paper presents a modular approach to implementing OSR for the LLVM compiler infrastructure. This is an important step forward because LLVM is gaining popular support, and adding the OSR capability allows compiler developers to develop new dynamic techniques. In particular, it will enable more sophisticated LLVM-based JIT compiler approaches. Indeed, other compiler/VM developers can use our approach because it is a clean modular addition to the standard LLVM distribution. Further, our approach is defined completely at the LLVM-IR level and thus does not require any modifications to the target code generation.

    Authors
    Nurudeen Lameed    Laurie Hendren   
    Date
    March 2013
    Published
    VEE 2013, Houston, Texas, USA
  6. Taming MATLAB
    Abstract

    MATLAB is a dynamic scientific language used by scientists, engineers and students worldwide. Although MATLAB is very suitable for rapid prototyping and development, MATLAB users often want to convert their final MATLAB programs to a static language such as FORTRAN. This paper presents an extensible object-oriented toolkit for supporting the generation of static programs from dynamic MATLAB programs. Our open source toolkit, called the MATLAB Tamer, identifies a large tame subset of MATLAB, supports the generation of a specialized Tame IR for that subset, provides a principled approach to handling the large number of builtin MATLAB functions, and supports an extensible interprocedural value analysis for estimating MATLAB types and call graphs.

    Authors
    Anton Dubrau    Laurie Hendren   
    Date
    October 2012
    Published
    OOPSLA 2012, Tucson, Arizona, USA
  7. Kind Analysis for MATLAB
    Abstract

    A fundamental problem in MATLAB is determining the kind of an identifier. Does an identifier refer to a variable, a named function or a prefix? Although this is a trivial problem for most programming languages, it was not clear how to do this properly in MATLAB. Furthermore, there was no simple explanation of kind analysis suitable for MATLAB programmers, nor a publicly-available implementation suitable for compiler researchers. This paper explains the required background of MATLAB, clarifies the kind assignment problem, and proposes some general guidelines for developing good kind analyses. Based on these foundations we present our design and implementation of a variety of kind analyses, including an approach that matches the intended behaviour of modern MATLAB 7 and two potentially better alternatives.

    Authors
    Jesse Doherty    Soroush Radpour    Laurie Hendren   
    Date
    October 2012
    Published
    OOPSLA 2012, Tucson, Arizona, USA
  8. MetaLexer: A Modular Lexical Specification Language
    Abstract

    Compiler toolkits make it possible to rapidly develop compilers and translators for new programming languages. Although there exist elegant toolkits for modular and extensible parsers, compiler developers must often resort to ad-hoc solutions when extending or composing lexers. This paper presents MetaLexer, a new modular lexical specification language and associated tool.

    Slides are available here.

    Authors
    Andrew Casey    Laurie Hendren   
    Date
    October 2011
    Published
    AOSD 2011, Pernambuco, Brazil
  9. Typing Aspects for MATLAB
    Abstract

    This paper introduces the idea of adding typing aspects to MATLAB programs. A typing aspect can be used to: (1) capture the run-time types of variables, and (2) check run-time types against either a declared type or a previously captured run-time type. Typing aspects can be deployed at three different levels: they can be used (1) solely as documentation, (2) to log type errors, or (3) to catch type errors at run-time.

    Slides are available here.

    Authors
    Laurie Hendren   
    Date
    October 2011
    Published
    DSAL 2011, Pernambuco, Brazil
  10. Staged Static Techniques to Efficiently Implement Array Copy Semantics in a MATLAB JIT Compiler
    Abstract

    Several aspects of the MATLAB language such as dynamic loading and typing, safe updates, and copy semantics for arrays contribute to its appeal to the scientific communities, but at the same time provide many challenges to the compiler and virtual machine. One such problem, minimizing the number of copies and copy checks for MATLAB programs, has not received much attention. The classical approach to minimizing the number of copies (i.e., reference counting) does not work in a garbage-collected virtual machine. This paper presents a staged static analysis approach that does not require reference counts, thus enabling a garbage-collected virtual machine.

    Authors
    Nurudeen Lameed    Laurie Hendren   
    Date
    March 2011
    Published
    Compiler Construction (CC) 2011, Saarbrücken, Germany
  11. McFLAT: A Profile-based Framework for MATLAB Loop Analysis and Transformations
    Abstract

    This paper presents a new framework, McFLAT, which uses profile-based training runs to determine likely loop-bounds ranges for which specialized versions of the loops may be generated. The main idea is to collect information about observed loop bounds and hot loops using training data which is then used to heuristically decide upon which loops and which ranges are worth specializing using a variety of loop transformations.

    Authors
    Amina Aslam    Laurie Hendren   
    Date
    October 2010
    Published
    LCPC 2010, Houston, Texas, USA
  12. McLab: An extensible compiler toolkit for MATLAB and related languages
    Abstract

    MATLAB is a popular language for scientific computation. Effectively compiling MATLAB presents many challenges due to the dynamic nature of the language. We present McLab, an extensible compiler toolkit for MATLAB and related languages. McLab aims to provide high performance execution of MATLAB on modern architectures while bringing modern programming concepts such as aspect-oriented programming and other extensions to MATLAB. McLab consists of several components. The first component is an extensible frontend to parse and analyze MATLAB as well as extensions to MATLAB. The second component, called McFor, is a compiler to translate a static subset of MATLAB to FORTRAN. The third component, McVM, is a virtual machine including a JIT compiler to execute MATLAB code. Finally, we also provide language extensions such as AspectMatlab. We present the current state of the implementation of McLab and describe ongoing work and future directions of the project.

    Authors
    Andrew Casey    Jun Li    Jesse Doherty    Maxime Chevalier-Boisvert    Toheed Aslam    Anton Dubrau    Nurudeen Lameed    Amina Aslam    Rahul Garg    Soroush Radpour    Olivier Savary Belanger    Laurie Hendren    Clark Verbrugge   
    Date
    May 2010
    Published
    C3S2E '10, Montreal, Canada
  13. Optimizing Matlab through Just-In-Time Specialization
    Abstract

    Scientists are increasingly using dynamic programming languages like Matlab for prototyping and implementation. Effectively compiling Matlab raises many challenges due to the dynamic and complex nature of Matlab types. This paper presents a new JIT-based approach which specializes and optimizes functions on-the-fly based on the current types of function arguments.

    Authors
    Maxime Chevalier-Boisvert    Laurie Hendren    Clark Verbrugge   
    Date
    March 2010
    Published
    Compiler Construction (CC) 2010, Paphos, Cyprus
  14. AspectMatlab: An Aspect-Oriented Scientific Programming Language
    Abstract

    This paper introduces a new aspect-oriented programming language, AspectMatlab. AspectMatlab introduces key aspect-oriented features in a way that is both accessible to scientists and where the aspect-oriented features concentrate on array accesses and loops, the core computation elements in scientific programs. The paper reports on the language design of AspectMatlab, the amc compiler implementation and related optimizations, and also provides an overview of use cases that are specific to scientific programming.

    Authors
    Toheed Aslam    Jesse Doherty    Anton Dubrau    Laurie Hendren   
    Date
    March 2010
    Published
    AOSD 2010, Rennes and Saint-Malo, France

Theses

  1. McIDE: A MATLAB IDE powered by dynamic analysis
    Abstract

    MATLAB is a popular dynamic scientific programming language. The typical MATLAB user is not a software professional; it is chiefly used among scientists, engineers, and students, and enjoys wide adoption in large part because of its high level syntax and wide array of libraries for many problem domains in the sciences. The inexperience of many MATLAB programmers, coupled with the ill-specified and often counterintuitive semantics of the language, leads to MATLAB code in the wild that is difficult to understand and maintain. In this thesis, we present McIDE, an integrated development environment for MATLAB programming. McIDE provides tools to help MATLAB programmers write better programs, among them automated refactorings and code navigation features like “jump to definition”. It is also opinionated about MATLAB code, and tries to recognize common anti-patterns and either warn about or eliminate them. McIDE is built up of several largely independent components wired together by a thin graphical interface. Some of these components are pre-existing, such as a MATLAB parser provided by the McLab compiler toolkit, and others are contributions of this thesis, such as a dynamic call graph collection mechanism for MATLAB code, and a layout-preserving code transformation engine. A theme of McIDE’s implementation is reliance on runtime information, since purely static information is often insufficient if we wish to support the development of arbitrary MATLAB code, including its more dynamic features.

    Authors
    Ismail Badawi   
    Date
    March 2016
    Published
    Master's Thesis, McGill University, Montreal, Canada
  2. VeloCty: A Static Optimising Compiler for MATLAB and NumPy
    Abstract

    High-level scientific languages such as MATLAB and Python’s NumPy library are gaining popularity among scientists and mathematicians. These languages provide many features, such as dynamic typing and high-level scientific functions, which allow easy prototyping. However, these features also inhibit performance of the code. We present VeloCty, an optimizing static compiler for MATLAB and Python as a solution to the problem of enhancing performance of programs written in these languages. In most programs, a large portion of the time is spent executing a small part of the code. Moreover, these sections can often be compiled ahead of time and improved performance can be achieved by optimizing only these ‘hot’ sections of the code. VeloCty takes as input functions written in MATLAB and Python specified by the user and generates an equivalent C++ version. VeloCty also generates glue code to interface with MATLAB and Python. The generated code can then be compiled and packaged as a shared library that can be linked to any program written in MATLAB and Python. We also implemented optimisations to eliminate array bounds checks, reuse previously allocated memory during array operations and support parallel execution using OpenMP. VeloCty uses the Velociraptor toolkit. We implemented a C++ backend for the Velociraptor intermediate representation, VRIR, and language-specific runtimes for MATLAB and Python. We have also implemented a MATLAB VRIR generator using the McLab toolkit. VeloCty was evaluated using 17 MATLAB benchmarks and 9 Python benchmarks. The MATLAB benchmark versions compiled using VeloCty with all optimisations enabled were between 1.3 and 458 times faster than the MathWorks’ MATLAB2014b interpreter and JIT compiler. Similarly, Python benchmark versions were between 44.11 and 1681 times faster than the CPython interpreter.

    Authors
    Sameer Jagdale   
    Date
    April 2015
    Published
    Master's Thesis, McGill University, Montreal, Canada
  3. McNumJS: A JavaScript Library for Numerical Computations
    Abstract

    There has been a huge development in the web community recently, with an increasing focus on the performance of JavaScript. The development of state-of-the-art JavaScript engines and JavaScript technologies has improved the performance of JavaScript considerably and made it competitive with other dynamic languages. The major advantage of JavaScript applications is that they can run on any device that supports web browsers and distribution of these applications is very easy. This thesis reports on McNumJS, an easy-to-use and high-performance JavaScript library for numerical computations. This library is helpful to JavaScript developers for developing numerical applications and compiler writers who want to compile scientific languages like MATLAB or R to JavaScript.

    There has been a surge of technologies like typed arrays, web workers and asm.js, developed to improve the performance of JavaScript. We analyze these technologies and report their suitability for numerical applications. We have also compiled a detailed study on asm.js and performed different experiments to find the parts of asm.js that we can use in regular development of JavaScript applications.

    There are two main design goals behind the development of McNumJS: i) making it easy to use, and ii) providing high performance. We achieved the ease-of-use goal by making an API similar to NumPy, a popular Python library for scientific computing. To make McNumJS high-performance, we used JavaScript typed arrays and type coercing rules defined by asm.js. We report the speedups we get by using McNumJS compared to other JavaScript libraries and JavaScript with regular arrays. We report the performance difference between McNumJS and native C. These experiments show that the performance of the McNumJS library is competitive with native C and outperforms other JavaScript libraries for numerical computations.

    Authors
    Sujay Kathrotia   
    Date
    April 2015
    Published
    Master's Thesis, McGill University, Montreal, Canada
  4. AspectMatlab++: Developing an Aspect-Oriented Language for Scientists
    Abstract

    MATLAB is a popular dynamic array-based language commonly used within the scientific community. MATLAB’s widespread use can be attributed to its large library of built-in functions, and its high-level syntax, which requires no type declarations, making it ideal for fast prototyping. This thesis presents extensions to AspectMatlab, an aspect oriented compiler developed for MATLAB. AspectMatlab was created with the intent of bringing aspect oriented programming to MATLAB, and targeted features such as array accesses and loops, which are the core computations in scientific programs. This thesis presents AspectMatlab++. AspectMatlab++ extends AspectMatlab by focusing on a different set of challenges, seeking to make aspect-oriented programming easier to use and providing mechanisms to handle a variety of the problems that occur in a dynamically typed language. To this end, we introduce pattern matching on annotations and types of variables, as well as new manners of exposing context. We also provide several use-cases of these features in the form of general-use aspects which focus on solving issues that arise from use of dynamically-typed languages. These include aspects which perform type and unit checking, profiling aspects, as well as aspects which perform basic loop optimizations. This thesis also details several performance enhancements to the AspectMatlab compiler, which result in a speed improvement of about 10 times.

    Authors
    Andrew Bodzay   
    Date
    December 2015
    Published
    Master's Thesis, McGill University, Montreal, Canada
  5. Mc2For: A MATLAB to Fortran 95 Compiler
    Abstract

    MATLAB is a dynamic numerical scripting language widely used by scientists, engineers and students. While MATLAB’s high-level syntax and dynamic types make it ideal for fast prototyping, programmers often prefer using high-performance static languages such as FORTRAN for their final distribution. Rather than rewriting the code by hand, our solution is to provide a source-to-source compiler that translates the original MATLAB program to an equivalent FORTRAN program.

    In this thesis, we introduce Mc2For, a source-to-source compiler which transforms MATLAB to FORTRAN and handles several important challenges during the transformation, such as efficiently estimating the static type characteristics of all the variables in a given MATLAB program, mapping numerous MATLAB built-in functions to FORTRAN, and correctly supporting some MATLAB dynamic features in the generated FORTRAN code.

    This compiler consists of two major parts. The first part is an interprocedural analysis component to estimate the static type characteristics, such as the shapes of the arrays and the ranges of the scalars, which are used to generate variable declarations and to remove unnecessary array bounds checking in the translated FORTRAN program. The second part is an extensible FORTRAN code generation framework automatically transforming MATLAB constructs to equivalent FORTRAN constructs.

    This work has been implemented within the McLab framework, and we evaluated the performance of the Mc2For compiler on a collection of 20 MATLAB benchmarks. For most of the benchmarks, the generated FORTRAN program runs 1.2 to 337 times faster than the original MATLAB program, and in terms of physical lines of code, typically grows only by a factor of around 2. These experimental results show that the code generated by Mc2For performs better on average, at the cost of only a modest increase in code size.

    Authors
    Xu Li   
    Date
    April 2014
    Published
    Master's Thesis, McGill University, Montreal, Canada
  6. MiX10: Compiling MATLAB to X10 for high performance
    Abstract

    MATLAB is a popular dynamic array-based language commonly used by students, scientists and engineers who appreciate the interactive development style, the rich set of array operators, the extensive builtin library, and the fact that they do not have to declare static types. Even though these users like to program in MATLAB, their computations are often very compute-intensive and are better suited for emerging high performance computing systems. This thesis reports on MiX10, a source-to-source compiler that automatically translates MATLAB programs to X10, a language designed for “Performance and Productivity at Scale”; thus, helping scientific programmers make better use of high performance computing systems. There is a large semantic gap between the array-based dynamically-typed nature of MATLAB and the object-oriented, statically-typed, and high-level array abstractions of X10. This thesis addresses the major challenges that must be overcome to produce sequential X10 code that is competitive with state-of-the-art static compilers for MATLAB which target more conventional imperative languages such as C and Fortran. Given that efficient basis, the thesis then provides a translation for the MATLAB parfor construct that leverages the powerful concurrency constructs in X10. The MiX10 compiler has been implemented using the McLab compiler tools, is open source, and is available both for compiler researchers and end-user MATLAB programmers. We have used the implementation to perform many empirical measurements on a set of 17 MATLAB benchmarks. We show that our best MiX10-generated code is significantly faster than the de facto MathWorks’ MATLAB system, and that our results are competitive with state-of-the-art static compilers that target C and Fortran. We also show the importance of finding the correct approach to representing the arrays in X10, and the necessity of an IntegerOkay analysis that determines which double variables can be safely represented as integers. Finally, we show that our X10-based handling of the MATLAB parfor greatly outperforms the de facto MATLAB implementation.

    Authors
    Vineet Kumar   
    Date
    April 2014
    Published
    Master's Thesis, McGill University, Montreal, Canada
  7. Dynamic Compiler Optimization Techniques for MATLAB
    Abstract

    MATLAB has gained widespread acceptance among engineers and scientists. Several aspects of the language such as dynamic loading and typing, safe updates, copy semantics for arrays, and support for higher-order functions contribute to its appeal, but at the same time provide many challenges to the compiler and virtual machine. MATLAB is a dynamic language. Traditional implementations of the language use interpreters and have been found to be too slow for large computations. More recently, researchers and software developers have been developing JIT compilers for MATLAB and other dynamic languages. This thesis is about the development of new compiler analyses and transformations for a MATLAB JIT compiler, McJIT, which is based on the LLVM JIT compiler toolkit. The new contributions include a collection of novel analyses for optimizing copying of arrays, which are performed when a function is first compiled. We designed and implemented four analyses to support an efficient implementation of array copy semantics in a MATLAB JIT compiler. Experimental results show that copy optimization is essential for performance improvement in a compiler for the MATLAB language.

    We also developed a variety of new dynamic analyses and code transformations for optimizing running code on-the-fly according to the current conditions of the runtime environment. LLVM does not currently support on-the-fly code transformation. So, we first developed a new on-stack replacement approach for LLVM. This capability allows the runtime stack to be modified during the execution of a function, thus enabling a continuation of the execution at a higher optimization level. We then used the on-stack replacement implementation to support selective inlining of function calls in long-running loops. Our experimental results show that function calls in long-running loops can result in high runtime overhead, and that selective dynamic inlining can be used to drastically reduce this overhead.

    The built-in function feval is an important MATLAB feature for certain classes of numerical programs and solvers which benefit from having functions as parameters. Programmers may pass a function name or function handle to the solver and then the solver uses feval to indirectly call the function. In this thesis, we show that although feval provides an acceptable abstraction mechanism for these types of applications, there are significant performance overheads for function calls via feval, in both MATLAB interpreters and JITs. The thesis then proposes, implements and compares two on-the-fly mechanisms for specialization of feval calls. The first approach uses our on-stack replacement technology. The second approach specializes calls of functions with feval using a combination of runtime input argument types and values. Experimental results on seven numerical solvers show that the techniques provide good performance improvements.

    The implementation of all the analyses and code transformations presented in this thesis has been done within the McLab virtual machine, McVM, and is available to the public as open source software.

    Authors
    Nurudeen Lameed   
    Date
    April 2013
    Published
    Ph.D. Thesis, McGill University, Montreal, Canada
  8. Understanding and Refactoring the MATLAB language
    Abstract

    MATLAB is a very popular dynamic “scripting” language for numerical computations used by scientists, engineers and students worldwide. MATLAB programs are often developed incrementally using a mixture of MATLAB scripts and functions and frequently build upon existing code which may use outdated features. This results in programs that could benefit from refactoring, especially if the code will be reused and/or distributed. Despite the need for refactoring there appear to be no MATLAB refactoring tools available. Correct refactoring of MATLAB is quite challenging because of its non-standard rules for binding identifiers. Even simple refactorings are non-trivial. Compiler writers and software engineers are generally not familiar with MATLAB and how it is used so the problem has been left untouched so far. This thesis has two main contributions. The first is McBench, a tool that helps compiler writers understand the language better. In order to have a systematic approach to the problem, we developed this tool to give us some insight about how programmers use MATLAB. The second contribution is a suite of semantic-preserving refactorings for MATLAB functions and scripts including: function and script inlining, converting scripts to functions, extracting new functions, and converting dynamic feval calls to static function calls. These refactorings have been implemented using the McLab compiler framework, and an evaluation is given on a large set of MATLAB programs which demonstrates the effectiveness of our approach.

    Authors
    Soroush Radpour   
    Date
    August 2012
    Published
    Master's Thesis, McGill University, Montreal, Canada
  9. Taming MATLAB
    Abstract

    This thesis presents an extensible object-oriented toolkit to help facilitate the generation of static programs from dynamic MATLAB programs. Our open source toolkit, called the MATLAB Tamer, targets a large subset of MATLAB. Given information about the entry point of the program, the MATLAB Tamer builds a complete callgraph, transforms every function into a reduced intermediate representation, and provides typing information to aid the generation of static code.

    Authors
    Anton Dubrau   
    Date
    April 2012
    Published
    Master's Thesis, McGill University, Montreal, Canada
  10. McSAF: An Extensible Static Analysis Framework for the MATLAB Language
    Abstract

    MATLAB is a popular language for scientific and numerical programming. Despite its popularity, there are few active projects providing open tools for MATLAB-related compiler research. This thesis provides the McLAB Static Analysis Framework, McSAF, the goal of which is to simplify the development of new compiler tools for MATLAB.

    Authors
    Jesse Doherty   
    Date
    August 2011
    Published
    Master's Thesis, McGill University, Montreal, Canada
  11. McFLAT: A Profile-based Framework for MATLAB Loop Analysis and Transformations
    Abstract

    This thesis presents a new framework, McFLAT, which uses profile-based training runs to determine likely loop-bounds ranges for which specialized versions of the loops may be generated. The main idea is to collect information about observed loop bounds and hot loops using training data which is then used to heuristically decide upon which loops and which ranges are worth specializing using a variety of loop transformations.

    Authors
    Amina Aslam   
    Date
    August 2010
    Published
    Master's Thesis, McGill University, Montreal, Canada
  12. AspectMatlab: An Aspect-Oriented Scientific Programming Language
    Abstract

    This is the first thesis introducing AspectMatlab.

    Authors
    Toheed Aslam   
    Date
    February 2010
    Published
    Master's Thesis, McGill University, Montreal, Canada
  13. McFOR: A MATLAB to FORTRAN 95 Compiler
    Abstract

    The high-level array programming language MATLAB is widely used for prototyping algorithms and applications of scientific computations. However, its dynamically typed nature, which means that MATLAB programs are usually executed via an interpreter, leads to poor performance. An alternative approach would be converting MATLAB programs to equivalent Fortran 95 programs. The resulting programs could be compiled using existing high-performance Fortran compilers and thus could provide better performance. This thesis introduces McFOR, a MATLAB to FORTRAN 95 Compiler.

    Authors
    Jun Li   
    Date
    August 2009
    Published
    Master's Thesis, McGill University, Montreal, Canada
  14. McVM: an Optimizing Virtual Machine for the MATLAB Programming Language
    Abstract

    In recent years, there has been an increase in the popularity of dynamic languages such as Python, Ruby, PHP, JavaScript and MATLAB. Programmers appreciate the productivity gains and ease of use associated with such languages. However, most of them still run in virtual machines which provide no Just-In-Time (JIT) compilation support, and thus perform relatively poorly when compared to their statically compiled counterparts. While the reference MATLAB implementation does include a built-in compiler, this implementation is not open sourced and little is known about its internal workings. The McVM project has focused on the design and implementation of an optimizing virtual machine for a subset of the MATLAB programming language.

    Authors
    Maxime Chevalier-Boisvert   
    Date
    August 2009
    Published
    Master's Thesis, McGill University, Montreal, Canada
  15. The MetaLexer Lexer Specification Language
    Abstract

    Compiler toolkits make it possible to rapidly develop compilers and translators for new programming languages. Recently, toolkit writers have focused on supporting extensible languages and systems that mix the syntaxes of multiple programming languages. However, this work has not been extended down to the lexical analysis level. As a result, users of these toolkits have to rely on ad-hoc solutions when they extend or mix syntaxes. This thesis presents MetaLexer, a new lexical specification language that remedies this deficiency.

    Authors
    Andrew Michael Casey   
    Date
    June 2009
    Published
    Master's Thesis, McGill University, Montreal, Canada

Technical Reports

  1. Halophile: Comparing PNaCl to Other Web Technologies
    Abstract

    Most modern web applications are written in JavaScript. However, the demand for web applications that require more numerically-intensive calculations, such as 3D gaming or photo-editing, has increased. This has also increased the demand for code that runs near native speeds. PNaCl is a toolchain that allows native C/C++ code to be run in the browser. This paper provides a comparison of the performance of PNaCl to native code and JavaScript. Using a benchmark suite that covers a representative set of numerical computations, it is shown that, on average, the performance of PNaCl is within 9% of native C code.

    Authors
    Lei Lopez   
    Date
    April 2015
    Published
    McGill University, Montreal, Canada
  2. McTutorial: A Structured Approach to Teaching MATLAB
    Abstract

    Learning how to program has become an increasingly important skill for non-programmers in the tech industry and for researchers outside of computer science. Newer programming languages such as MATLAB have grown into industrial-strength languages, and many industries and academic fields outside of math and computer science have found uses for them. Thus, it is essential for many more people to learn MATLAB. However, many people learn it without fundamental knowledge of the programming concepts that are rooted in computer science. McTutorial aims to fill this gap.

    Authors
    Lei Lopez   
    Date
    August 2014
    Published
    McGill University, Montreal, Canada
  3. Using JavaScript and WebCL for Numerical Computations: A Comparative Study of Native and Web Technologies
    Abstract

    From its modest beginnings as a tool to validate forms, JavaScript is now an industrial-strength language used to power online applications such as spreadsheets, IDEs, image editors and even 3D games. Since all modern web browsers support JavaScript, it provides a medium that is both easy to distribute for developers and easy to access for users. This paper provides empirical data to answer the question: Is JavaScript suitable for numerical computations? By measuring and comparing the runtime performance of benchmarks representative of a wide variety of scientific applications, we show that sequential JavaScript is within a factor of 2 of native code. Parallel code using WebCL shows speed improvements of up to 2.28 over JavaScript for the majority of the benchmarks.

    Authors
    Faiz Khan    Vincent Foley-Bourgon    Sujay Kathrotia    Erick Lavoie    Laurie Hendren   
    Date
    June 2014
    Published
    McGill University, Montreal, Canada
  4. MiX10: Compiling MATLAB for High Performance Computing via X10
    Abstract

    Matlab is a popular dynamic array-based language commonly used by students, scientists and engineers who appreciate the interactive development style, the rich set of array operators, the extensive builtin library, and the fact that they do not have to declare static types. Even though these users like to program in Matlab, their computations are often very compute-intensive and are better suited for emerging high performance computing systems. This paper reports on MiX10, a source-to-source compiler that automatically translates Matlab programs to X10, a language designed for “Performance and Productivity at Scale”; thus, helping scientific programmers make better use of high performance computing systems. There is a large semantic gap between the array-based dynamically-typed nature of Matlab and the object-oriented, statically-typed, and high-level array abstractions of X10. This paper addresses the major challenges that must be overcome to produce sequential X10 code that is competitive with state-of-the-art static compilers for Matlab which target more conventional imperative languages such as C and Fortran. Given that efficient basis, the paper then provides a translation for the Matlab parfor construct that leverages the powerful concurrency constructs in X10. The MiX10 compiler has been implemented using the McLab compiler tools, is open source, and is available both for compiler researchers and end-user Matlab programmers. We have used the implementation to perform many empirical measurements on a set of 17 Matlab benchmarks. We show that our best MiX10-generated code is significantly faster than the de facto Mathworks’ Matlab system, and that our results are competitive with state-of-the-art static compilers that target C and Fortran. We also show the importance of finding the correct approach to representing the arrays in X10, and the necessity of an IntegerOkay analysis that determines which double variables can be safely represented as integers. Finally, we show that our X10-based handling of the Matlab parfor greatly outperforms the de facto Matlab implementation.

    Authors
    Vineet Kumar    Laurie Hendren   
    Date
    March 2014
    Published
    Sable Technical Report (2014-1), McGill University, Montreal, Canada
  5. Velociraptor: A compiler toolkit for numerical programs targeting CPUs and GPUs
    Abstract

    Developing compilers that allow scientific programmers to use multicores and GPUs is of increasing interest; however, building such compilers requires considerable effort. We present Velociraptor: a portable compiler toolkit that can be used to easily build compilers for numerical programs targeting multicores and GPUs. Velociraptor provides a new high-level IR called VRIR which has been specifically designed for numeric computations, with rich support for arrays, plus support for high-level parallel and accelerator constructs. A compiler developer uses Velociraptor by generating VRIR for key parts of an input program. Velociraptor does the rest of the work by optimizing the VRIR code, and generating LLVM for CPUs and OpenCL for GPUs. Velociraptor also provides a smart runtime system to manage GPU resources and task dispatch. To demonstrate Velociraptor in action, we present two case studies: a proof-of-concept Python compiler targeting CPUs and GPUs, and a GPU extension for a MATLAB JIT.

    Authors
    Rahul Garg    Laurie Hendren   
    Date
    November 2013
    Published
    Sable Technical Report (2013-5), McGill University, Montreal, Canada
  6. Mc2For: A tool for automatically transforming MATLAB to Fortran 95
    Abstract

    MATLAB is a dynamic numerical scripting language widely used by scientists, engineers and students. While MATLAB’s high-level syntax and dynamic types make it ideal for prototyping, programmers often prefer using high-performance static programming languages such as Fortran for their final distributable code. Rather than requiring programmers to rewrite their code by hand, our solution is to provide a tool that automatically translates the original MATLAB program to produce an equivalent Fortran program. There are several important challenges for automatically translating MATLAB to Fortran, such as correctly estimating the static type characteristics of all the variables in a MATLAB program, mapping MATLAB built-in functions, and effectively mapping MATLAB constructs to Fortran constructs. In this paper, we introduce Mc2For, a tool which automatically translates MATLAB to Fortran. This tool consists of two major parts. The first part is an interprocedural analysis component to estimate the static type characteristics, such as array shape and the range value information, which are used to generate variable declarations in the translated Fortran program. The second part is an extensible Fortran code generation framework to automatically transform MATLAB constructs to corresponding Fortran constructs. This work has been implemented within the McLab framework, and we demonstrate the performance of the translated Fortran code for a collection of MATLAB benchmark programs.

    Authors
    Xu Li    Laurie Hendren   
    Date
    October 2013
    Published
    Sable Technical Report (2013-04), McGill University, Montreal, Canada
  7. MiX10: Compiling MATLAB for High Performance Computing via X10
    Abstract

    MATLAB is a popular dynamic array-based language commonly used by students, scientists and engineers, who appreciate the interactive development style, the rich set of array operators, the extensive builtin library, and the fact that they do not have to declare static types. Even though these users like to program in MATLAB, their computations are often very compute-intensive and are better suited for the emerging high performance computing systems. Our solution is MiX10, a source-to-source compiler that automatically translates MATLAB programs to X10, a language designed for “Performance and Productivity at Scale”; thus, helping scientific programmers make better use of high performance computing systems. This paper addresses two major challenges in compiling MATLAB to X10: (1) efficiently transforming dynamically-typed MATLAB arrays to the best high-level, statically-typed array representation in X10; and (2) effectively exposing concurrency in MATLAB and generating efficient concurrent code in X10. We have implemented the techniques presented in this paper and provide an empirical study on a set of benchmarks, examining both the efficiency of the generated sequential X10 code and speedups for the concurrent versions.

    Authors
    Vineet Kumar    Laurie Hendren   
    Date
    October 2013
    Published
    Sable Technical Report (2013-03), McGill University, Montreal, Canada
  8. First steps to compiling MATLAB to X10
    Abstract

    MATLAB is a popular dynamic array-based language commonly used by students, scientists and engineers, who appreciate the interactive development style, the rich set of array operators, the extensive builtin library, and the fact that they do not have to declare static types. Even though these users like to program in MATLAB, their computations are often very compute-intensive and are potentially very good applications for high-performance languages such as X10. To provide a bridge between MATLAB and X10, we are developing MiX10, a source-to-source compiler that translates MATLAB to X10. This paper provides an overview of the initial design of the MiX10 compiler, presents a template-based specialization approach to compiling the builtin MATLAB operators, and provides translation rules for the key sequential MATLAB constructs with a focus on those which are challenging to convert to semantically-equivalent X10. An initial core compiler has been implemented, and preliminary results are provided.

    Authors
    Vineet Kumar    Laurie Hendren   
    Date
    May 2013
    Published
    Sable Technical Report (2013-02), McGill University, Montreal, Canada
  9. Optimizing MATLAB feval with Dynamic Techniques
    Abstract

    MATLAB is a popular dynamically-typed array-based language. The built-in function feval is an important MATLAB feature for certain classes of numerical programs and solvers which benefit from having functions as parameters. Programmers may pass a function name or function handle to the solver and then the solver uses feval to indirectly call the function. In this paper, we show that although feval provides an acceptable abstraction mechanism for these types of applications, there are significant performance overheads for function calls via feval, in both MATLAB interpreters and JITs. The paper then proposes, implements and compares two on-the-fly mechanisms for specialization of feval calls. The first approach specializes calls of functions with feval using a combination of runtime input argument types and values. The second approach uses on-stack replacement technology, as supported by McVM/McOSR. Experimental results on seven numerical solvers show that the techniques provide good performance improvements.

    Authors
    Nurudeen Lameed    Laurie Hendren   
    Date
    March 2013
    Published
    Sable Technical Report (2012-06-rev1), McGill University, Montreal, Canada
  10. A compiler toolkit for array-based languages targeting CPU/GPU hybrid systems
    Abstract

    Superseded by a newer report (2013-5, see above).

    Authors
    Rahul Garg    Laurie Hendren   
    Date
    November 2012
    Published
    ,
  11. A Modular Approach to On-Stack Replacement in LLVM
    Abstract

    In this report, we present a modular approach to implementing on-stack replacement that can be used by any system that targets the LLVM SSA intermediate representation, and we demonstrate the approach by using it to support dynamic inlining in McVM. McVM is a virtual machine for MATLAB which uses an LLVM-based JIT compiler. MATLAB is a popular dynamic language for scientific and engineering applications which typically manipulate large matrices and often contain long-running loops, and is thus an ideal target for dynamic JIT compilation and OSRs.

    Authors
    Nurudeen Lameed    Laurie Hendren   
    Date
    April 2012
    Published
    Sable Technical Report (2012-01-rev1), McGill University, Montreal, Canada
  12. Refactoring MATLAB
    Abstract

    This report presents the important challenges of refactoring MATLAB along with automated techniques to handle a collection of refactorings for MATLAB functions and scripts including: function and script inlining, converting scripts to functions, and converting dynamic feval calls to static function calls. The refactorings have been implemented using the McLab compiler framework, and an evaluation is given on a large set of MATLAB benchmarks which demonstrates the effectiveness of our approach.

    Authors
    Soroush Radpour    Laurie Hendren   
    Date
    October 2011
    Published
    Sable Technical Report (2011-02), McGill University, Montreal, Canada
  13. McSAF: A Static Analysis Framework for MATLAB
    Abstract

    MATLAB is an extremely popular programming language used by scientists, engineers, researchers and students world-wide. Despite its popularity, it has received very little attention from compiler researchers. This report introduces McSAF, an open-source static analysis framework which is intended to enable more compiler research for MATLAB and extensions of MATLAB. The framework is based on an intermediate representation (IR) called McLAST, which has been designed to capture all the key features of MATLAB, while at the same time being simple for program analysis. The report describes both the IR and the procedure for creating the IR from the higher-level AST. The analysis framework itself provides visitor-based traversals including fixed-point-based traversals to support both forwards and backwards analyses. McSAF has been implemented as part of the McLAB project, and the framework has already been used for a variety of analyses, both for MATLAB and the AspectMATLAB extension.

    Authors
    Jesse Doherty    Laurie Hendren   
    Date
    December 2011
    Published
    Sable Technical Report (2011-01), McGill University, Montreal, Canada
  14. McFLAT: A Profile-based Framework for MATLAB Loop Analysis and Transformations
    Abstract

    This technical report presents a new framework, McFLAT, which uses profile-based training runs to determine likely loop bounds ranges for which specialized versions of the loops may be generated. The main idea is to collect information about observed loop bounds and hot loops using training data which is then used to heuristically decide upon which loops and which ranges are worth specializing using a variety of loop transformations.

    Authors
    Amina Aslam    Laurie Hendren   
    Date
    July 2010
    Published
    Sable Technical Report (2010-06), McGill University, Montreal, Canada
  15. Staged Static Techniques to Efficiently Implement Array Copy Semantics in a MATLAB JIT Compiler
    Abstract

    Several aspects of the MATLAB language such as dynamic loading and typing, safe updates, and copy semantics for arrays contribute to its appeal to the scientific communities, but at the same time provide many challenges to the compiler and virtual machine. One such problem, minimizing the number of copies and copy checks for MATLAB programs, has not received much attention. The classical approach to minimizing the number of copies (i.e., reference counting) does not work in a garbage-collected virtual machine. This technical report presents a staged static analysis approach that does not require reference counts, thus enabling a garbage-collected virtual machine.

    Authors
    Nurudeen Lameed    Laurie Hendren   
    Date
    July 2010
    Published
    Sable Technical Report (2010-05), McGill University, Montreal, Canada
  16. AspectMatlab: An Aspect-Oriented Scientific Programming Language
    Abstract

    This technical report is an extended version of the AOSD 2010 paper above.

    Authors
    Toheed Aslam   
    Date
    January 2010
    Published
    Sable Technical Report (2009-03), McGill University, Montreal, Canada