
Re: SableCC sources doc?



Hi Robert.

Robert Feldt wrote:
> Can you point me to any documentation
> for the sources? 

There is no dedicated document on this, but the variable and class
names are fairly explicit.

The algorithms implemented to compute the lexer/parser tables are in
the "dragon book" (Compilers: Principles, Techniques, and Tools by
Aho, Sethi, and Ullman, ISBN 0-201-10088-6).

> If not, can you give some overview of the sources? Which
> classes have been bootstrapped/autogenerated and which ones have been
> handcoded? Are there any "clusters" of related classes?

GENERATED CLASSES
=================
Classes in the org.sablecc.sablecc.[parser/lexer/node/analysis] packages
were mostly generated using SableCC 1.0, but I have since made many hand
modifications to them.  SableCC 1.0 itself was mostly hand written.  Some
of its parsing tables were generated with CUP, but then hand modified.
So no, there isn't a completely automatic bootstrapping process.  But
you can always check the parsing/lexing tables by hand.

Also, there was a JDK 1.2 issue.  You have to uncomment some source
code to generate a JDK 1.1 SableCC, then use serialization to save the
parsing tables (for the parser.dat file).  It's a mess, I know.  But JDK
1.2 has a limit on method size, and the static initializer of the parser
array is too big for it!
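
To illustrate the serialization step, here is a minimal sketch of saving
a table to a file and loading it back at startup, so that the class no
longer needs a huge static initializer.  This is not the actual SableCC
code: the class name TableSerializationSketch, the methods saveTables and
loadTables, and the int[][][] table shape are all assumptions made for
the example, and it uses modern Java syntax (try-with-resources) rather
than JDK 1.1 code.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Arrays;

// Hypothetical sketch: names and table shape are assumptions, not SableCC's.
public class TableSerializationSketch {

    // Write the table to disk once (e.g. at build time) with Java serialization.
    static void saveTables(int[][][] tables, File file) throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(tables);
        }
    }

    // Read the table back at startup instead of building it in a static initializer.
    static int[][][] loadTables(File file)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new FileInputStream(file))) {
            return (int[][][]) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Tiny stand-in for a real parsing table.
        int[][][] tables = { { {0, 1}, {2, 3} }, { {4} } };

        File file = File.createTempFile("parser", ".dat");
        file.deleteOnExit();

        saveTables(tables, file);
        int[][][] restored = loadTables(file);

        // Reports whether the round trip preserved every entry.
        System.out.println(Arrays.deepEquals(tables, restored));
    }
}
```

The point of the design is that the table data lives in a data file
rather than in compiled bytecode, so its size is no longer constrained
by the per-method size limit.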

I plan to solve this bootstrapping problem in SableCC 3.x (if somebody
agrees to finance it at the end of my Ph.D. studies.  Any sponsor, out
there?).  I would make sure that:
1- SableCC 3.0 can bootstrap itself.
2- SableCC 3.0 documents its parsing tables, so that manual verification
would be easy.

MANUALLY WRITTEN CLASSES
========================

All classes in the org.sablecc.sablecc package are hand written.

Most of the classes have pretty explicit names:

- Gen*.java: generate the appropriate files [alternative classes(Alt),
Parser.java (Parser), ...]
- Grammar: main parsing table computation.
- LR* classes: used for computing the LR0/1 sets of the parser...
- DFA/NFA: for computing the lexer
- MacroExpander: Very useful!  Allows the use of template macros for
code generation.
- ResolveIds: Symbol tables for SableCC, and semantic verifications.

The main difficulty you'll have in reading the AST traversal stuff is
that SableCC 1.0 ASTs didn't have a nice naming scheme.  Child nodes
were numbered, e.g. node.getNode1(), getNode2(), etc.
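
To show why numbered children make traversal code harder to read, here
is a small sketch contrasting the two styles.  Everything in it is
hypothetical: the NumberedAssign and NamedAssign classes, their fields,
and the getIdentifier()/getExpression() accessors are invented for
illustration and are not classes from the SableCC sources.  Only the
getNode1()/getNode2() naming pattern comes from the description above.

```java
// Hypothetical sketch: these node classes are invented for illustration.
public class NumberedChildrenSketch {

    // Numbered style: the accessor name says nothing about the child's role.
    static class NumberedAssign {
        private final String node1; // happens to be the identifier
        private final String node2; // happens to be the expression

        NumberedAssign(String node1, String node2) {
            this.node1 = node1;
            this.node2 = node2;
        }

        String getNode1() { return node1; }
        String getNode2() { return node2; }
    }

    // Named style: the accessor name documents the child's role.
    static class NamedAssign {
        private final String identifier;
        private final String expression;

        NamedAssign(String identifier, String expression) {
            this.identifier = identifier;
            this.expression = expression;
        }

        String getIdentifier() { return identifier; }
        String getExpression() { return expression; }
    }

    // The reader must know from the grammar that child 1 is the left-hand side.
    static String describe(NumberedAssign node) {
        return node.getNode1() + " = " + node.getNode2();
    }

    // The same traversal, readable without consulting the grammar.
    static String describe(NamedAssign node) {
        return node.getIdentifier() + " = " + node.getExpression();
    }

    public static void main(String[] args) {
        System.out.println(describe(new NumberedAssign("x", "1 + 2")));
        System.out.println(describe(new NamedAssign("x", "1 + 2")));
    }
}
```

When reading code in the numbered style, it helps to keep the grammar
open alongside it, since the child positions follow the order of the
elements in each production.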

I would discourage any major grammar enhancement to SableCC 2.x.  Any
major vocabulary enhancement should be delayed until SableCC 3.0, as it
will be able to bootstrap itself, allowing easy grammar extensions.

I hope this helps.

By the way, can you tell us a little about your plans?

Etienne
-- 
----------------------------------------------------------------------
Etienne M. Gagnon, M.Sc.                     e-mail: egagnon@j-meg.com
Author of SableCC:                             http://www.sablecc.org/
and SableVM:                                   http://www.sablevm.org/