[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Wish List: Prefix List for Token Definitions



Khun Yee Fung wrote:
> 1.      Is there a general method of avoiding what I have done?
> 2.      Is the prefix idea useful in other situations?

Have you tried defining:

Tokens
  star = '*';

Productions
...
reg_exp = 
  {...} exp [kleene]:star |
  {...} reg_exp [mult]:star exp |
  ...;

In other words, you let the parser decide whether you have a kleene
operator or a multiplication operator.  This is the cleanest approach. 
Letting the lexer decide between the two is a quite risky business.  It
might hide ambiguities in your grammar, and resolve them arbitrarily, to
the surprise of the programmers that will use the lenguage.

Let me write a minimal grammar that does this.  This grammar has
characters and parentheses, as well as concatenation, multiplication and
kleen operators.

Tokens
  star = '*';
  char = ['a'..'z'];
  l_par = '(';
  r_par = ')';

Productions
  reg_exp =
    {kleene} kleene_exp |
    {mult}   reg_exp [mult]:star kleene_exp;

  kleene_exp =
    concat_exp [kleene]:star?;

  concat_exp =
    base_exp+; 

  base_exp =
    {reg} l_par reg_exp r_par |
    {char} char;


Indeed!  There's a shift reduce conflict on star.  Why, because the
parser has difficulty deciding whether the shift or reduce when it sees
a "star" in an input that looks like:

INPUT: abcd*...
POSSIBILITIES (for "..."):
  (1) abcd*abcd  => reduce "abcd", because "*" is a multiplication
  (2) abcd*[EOF] => shift "*", because "*" is a kleene operator.

Solution: 

The usual trick is to delay "reductions" as far as possible.  Let's do
it.

Productions
  reg_exp =
    {concat} concat_exp [kleene]:star? |
    {mult}   concat_exp [kleene]:star? [mult]:star reg_exp;

  concat_exp =
    base_exp+; 

  base_exp =
    {reg} l_par reg_exp r_par |
    {char} char;

Another view of the problem would be to say that an LALR(2) parser would
look ahead past the "*" and see if there's an [EOF] or not, and thus the
conflict would disappear.  Parsing/language theory tells us that every
LR(2) grammar has an equivalent LR(1) grammar.  This result does not
hold for LALR, in theory, but, in practice, normally you can get around
this kind of conflicts by using usual conflict resolution mechanisms.  I
have already written something about these mechanisms (including
"delaying reductions") in a message on the mailing-list (look in the
archive).

I hope this helps you.  

Etienne
-- 
----------------------------------------------------------------------
Etienne M. Gagnon, M.Sc.                     e-mail: egagnon@j-meg.com
Author of SableCC:                 http://www.sable.mcgill.ca/sablecc/
----------------------------------------------------------------------