[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: SableCC hangs when 'compiling' a grammar

Mariusz Nowostawski wrote:
> There are two cases now as far as I can see it:
> 2. It is perfectly normal to have such a huge number of states to be
>    optimized, and it is SableCC fault not to cope with it. Some memory
>    usage optimization is needed.

This is somewhat the case.  There are 2 things:
1- Unlike most lexer generators, SableCC can deal with Unicode (16 bit)
characters.  This, by itself, causes some additional memory usage that
is difficult to avoid.
2- SableCC could (probably) do a better job at reducing state numbers by
doing things a little differently.  But this requires major refactoring
of lexer generation code.  I was saving that work until my Ph.D. is

The idea is that SableCC computes a single global "unoptimized" NFA,
tranforms it into a single (gigantic) "unoptimized" DFA, then applies
the well known DFA minimization algorithm.  Instead, SableCC should
probably tranform each little NFA into its equivalent DFA, optimized it,
then use the little optimized DFAs as building blocks for constructing
larger NFAs, and so on.

There might also be some tricks to optimize the DFA/NFA representation,
but I am reluctant to go to an "all array" representation.  I like using
OO to represent things.  Much easier to debug/maintain, even though it
might be slower.  [And it gives some job to VM designers for optimizing
their vms;-)]

I definitely have no time to attack this problem before I'm done with my
current diploma.

P.S. The Scriptonite project is also on hold until then. If you need an
ECMAScript parser, there are other free ones.  The goals of the
Scriptonite project was more of an practical exercise to show how to
build an interpreter with SableCC.
Etienne M. Gagnon, M.Sc.                     e-mail: egagnon@j-meg.com
Author of SableCC:                             http://www.sablecc.org/
and SableVM:                                   http://www.sablevm.org/