
Re: epsilon tokens



Hi Istvan (that's your first name, right?),

VEROK Istvan wrote:
I'm a novice SableCC user and have run into the following tidbit:
I need to occasionally recognize epsilons (zero-length tokens)
in a grammar (yes, there IS a reason for that).  The problem
can be extracted to and illustrated by the following toy grammar:

==== eps.sablecc ====

Package eps;

States
 initial, rest;

Tokens
 {initial -> rest} eps = ;
 {rest} char = [0x0000 .. 0xffff];

Productions
 whole = eps char*;

==== ends here ====

Zero-length tokens were never meant to be recognized by the lexer. The reason is that the lexer operates independently of the parser, so it has no way of knowing when an empty token is actually wanted. Think about it: if zero-length tokens were allowed, nothing would stop the lexer from returning the same zero-length token over and over without ever advancing in the input (an infinite loop).
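To make the infinite-loop argument concrete, here is a minimal Java sketch of a toy longest-match lexer. It is hypothetical: the class ToyLexer and its regex-based token table are illustrations, not SableCC's generated code. The point is the guard that skips zero-length matches. Accepting one would leave the input position unchanged, and the loop would hand back the same empty token forever.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy lexer (not SableCC's generated Lexer): longest match wins,
// ties keep the earlier-declared token, and zero-length matches are refused.
class ToyLexer {
    private final String[] names;
    private final Pattern[] patterns;

    ToyLexer(String[] names, String[] regexes) {
        this.names = names;
        this.patterns = new Pattern[regexes.length];
        for (int i = 0; i < regexes.length; i++) {
            patterns[i] = Pattern.compile(regexes[i]);
        }
    }

    List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int bestLen = 0;          // only matches longer than 0 can win
            String bestName = null;
            for (int i = 0; i < patterns.length; i++) {
                Matcher m = patterns[i].matcher(input);
                m.region(pos, input.length());
                if (m.lookingAt()) {
                    int len = m.end() - pos;
                    // A zero-length match would not advance pos; accepting it
                    // would make this loop return the same empty token forever.
                    if (len > bestLen) {
                        bestLen = len;
                        bestName = names[i];
                    }
                }
            }
            if (bestName == null) {
                throw new IllegalStateException("no token matches at position " + pos);
            }
            tokens.add(bestName);
            pos += bestLen;
        }
        return tokens;
    }
}
```

With a token table that contains only the empty eps, like state "initial" in the grammar above, the sketch rejects the first character instead of looping: `new ToyLexer(new String[]{"eps"}, new String[]{""}).tokenize("e")` throws an IllegalStateException.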

Now, SableCC does allow a grammar designer to declare zero-length tokens. These tokens are NEVER instantiated by the generated lexer, but they can be very useful in "customized" lexers (lexers you extend yourself, which can create such tokens explicitly).

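So, here is what is happening in your case:
In state "initial", the lexer recognizes NO tokens: the only token declared for that state is the zero-length eps, and no tokens of length one or more were specified. So when it sees the first input character, "e", the lexer complains that it doesn't match anything. Since eps is never recognized, the transition to state "rest" never happens either.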

A solution:
===========
Package eps;


Tokens
char = [0x0000 .. 0xffff];

Productions
whole = eps char*;
eps = ;


Explanation: zero-length tokens are not recognized by the lexer, but zero-length productions ARE recognized by the parser, so eps still shows up in the parse tree without consuming any input. :-))
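For comparison, a parser can handle an empty production safely because it only applies it where the grammar places it. The Java sketch below is a hypothetical hand-written recursive-descent recognizer for the solution grammar; ToyParser and its token-name strings are assumptions. SableCC itself generates a table-driven LALR(1) parser, so this only illustrates the idea.

```java
import java.util.List;

// Hypothetical recognizer for:
//   whole = eps char*;
//   eps   = ;
class ToyParser {
    private final List<String> tokens;
    private int pos = 0;

    ToyParser(List<String> tokens) {
        this.tokens = tokens;
    }

    // whole = eps char*
    boolean parseWhole() {
        parseEps();  // called exactly once, at the start of whole
        while (pos < tokens.size() && tokens.get(pos).equals("char")) {
            pos++;
        }
        // Accept only if every token was consumed.
        return pos == tokens.size();
    }

    // eps = ;  matches the empty sequence: consumes no tokens, always succeeds.
    void parseEps() {
    }
}
```

Because parseEps consumes nothing and the grammar invokes it only once, there is no loop to get stuck in, and an empty token list is accepted as well.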

Have fun!

Etienne
--
Etienne M. Gagnon http://www.info.uqam.ca/~egagnon/
SableVM: http://www.sablevm.org/
SableCC: http://www.sablecc.org/