Compiler toolkits make it possible to rapidly develop compilers and translators for new programming languages. Recently, toolkit writers have focused on supporting extensible languages and systems that mix the syntaxes of multiple programming languages. However, this work has not been extended down to the lexical analysis level. As a result, users of these toolkits have to rely on ad hoc solutions when they extend or mix syntaxes. This thesis presents MetaLexer, a new lexical specification language that remedies this deficiency.

MetaLexer has three key features: it abstracts lexical state transitions out of semantic actions, makes modules extensible by introducing multiple inheritance, and provides cross-platform support for a variety of programming languages and compiler front-end toolchains.
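To make the first feature concrete, the following is a minimal Python sketch of the general idea of keeping lexical state transitions out of the rules themselves. It is not MetaLexer syntax and does not reflect how MetaLexer is implemented; the names `RULES`, `TRANSITIONS`, and `tokenize`, the token kinds, and the two states are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical rule tables: each lexical state maps to (regex, token kind) pairs.
# No rule mentions any other state, so a rule table can be reused or extended
# without touching the transition logic.
RULES = {
    "code": [
        (re.compile(r"[A-Za-z_]\w*"), "IDENT"),
        (re.compile(r"\d+"), "NUMBER"),
        (re.compile(r"/\*"), "COMMENT_START"),
        (re.compile(r"\s+"), None),  # None: consume the text but emit no token
    ],
    "comment": [
        (re.compile(r"\*/"), "COMMENT_END"),
        (re.compile(r"(?:[^*]|\*(?!/))+"), "COMMENT_TEXT"),
    ],
}

# State transitions are declared separately from the rules:
# (current state, token kind) -> next state.
TRANSITIONS = {
    ("code", "COMMENT_START"): "comment",
    ("comment", "COMMENT_END"): "code",
}


def tokenize(text, state="code"):
    """Return (kind, lexeme) pairs; the first matching rule wins (a simplification)."""
    tokens = []
    pos = 0
    while pos < len(text):
        for pattern, kind in RULES[state]:
            match = pattern.match(text, pos)
            if match:
                break
        else:
            raise ValueError(
                f"unexpected character {text[pos]!r} at {pos} in state {state!r}"
            )
        if kind is not None:
            tokens.append((kind, match.group()))
        # The only place where the lexical state changes.
        state = TRANSITIONS.get((state, kind), state)
        pos = match.end()
    return tokens
```

In this sketch, adding a new embedded sub-language would mean adding one rule table and a pair of entries in `TRANSITIONS`, rather than editing action code scattered across existing rules.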

In addition to designing this new language, we have constructed a number of practical tools. The most important are a pair of translators that map MetaLexer to the popular JFlex lexical specification language and vice versa.

We have exercised MetaLexer by using it to create lexers for three real programming languages: AspectJ (and two extensions), a large subset of Matlab, and MetaLexer itself. The new specifications are easier to read and require much less action code than the originals.


MetaLexer is covered by a (modified) BSD license.

Current Files

My thesis serves as the de facto manual for MetaLexer.

Developers can download the source for MetaLexer.
Others will probably prefer a binary distribution:


For full functionality, you should also download JFlex, which processes the JFlex specifications that MetaLexer generates.

JFlex-to-MetaLexer translator

The JFlex-to-MetaLexer translator is available as a separate download.

Case Studies

MetaLexer has been used to create lexers for three practical programming languages: AspectJ (and two extensions), a large subset of Matlab, and MetaLexer itself.

Old Files

Developer Manual (Deprecated)

User Manual (Deprecated)