[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

*To*: Khun Yee Fung <kyeefung@extend.com>*Subject*: Re: Wish List: Prefix List for Token Definitions*From*: "Etienne M. Gagnon" <egagnon@j-meg.com>*Date*: Sun, 14 Nov 1999 17:26:55 -0500*CC*: sablecc-list@sable.mcgill.ca*Organization*: J-Meg inc.*References*: <E09B8717558DD211BF0300609793FBB00160E580@MAILSERVER>*Sender*: owner-sablecc-list@sable.mcgill.ca

Khun Yee Fung wrote: > 1. Is there a general method of avoiding what I have done? > 2. Is the prefix idea useful in other situations? Have you tried defining: Tokens star = '*'; Productions ... reg_exp = {...} exp [kleene]:star | {...} reg_exp [mult]:star exp | ...; In other words, you let the parser decide whether you have a kleene operator or a multiplication operator. This is the cleanest approach. Letting the lexer decide between the two is a quite risky business. It might hide ambiguities in your grammar, and resolve them arbitrarily, to the surprise of the programmers that will use the lenguage. Let me write a minimal grammar that does this. This grammar has characters and parentheses, as well as concatenation, multiplication and kleen operators. Tokens star = '*'; char = ['a'..'z']; l_par = '('; r_par = ')'; Productions reg_exp = {kleene} kleene_exp | {mult} reg_exp [mult]:star kleene_exp; kleene_exp = concat_exp [kleene]:star?; concat_exp = base_exp+; base_exp = {reg} l_par reg_exp r_par | {char} char; Indeed! There's a shift reduce conflict on star. Why, because the parser has difficulty deciding whether the shift or reduce when it sees a "star" in an input that looks like: INPUT: abcd*... POSSIBILITIES (for "..."): (1) abcd*abcd => reduce "abcd", because "*" is a multiplication (2) abcd*[EOF] => shift "*", because "*" is a kleene operator. Solution: The usual trick is to delay "reductions" as far as possible. Let's do it. Productions reg_exp = {concat} concat_exp [kleene]:star? | {mult} concat_exp [kleene]:star? [mult]:star reg_exp; concat_exp = base_exp+; base_exp = {reg} l_par reg_exp r_par | {char} char; Another view of the problem would be to say that an LALR(2) parser would look ahead past the "*" and see if there's an [EOF] or not, and thus the conflict would disappear. Parsing/language theory tells us that every LR(2) grammar has an equivalent LR(1) grammar. This result does not hold for LALR, in theory, but, in practice, normally you can get around this kind of conflicts by using usual conflict resolution mechanisms. I have already written something about these mechanisms (including "delaying reductions") in a message on the mailing-list (look in the archive). I hope this helps you. Etienne -- ---------------------------------------------------------------------- Etienne M. Gagnon, M.Sc. e-mail: egagnon@j-meg.com Author of SableCC: http://www.sable.mcgill.ca/sablecc/ ----------------------------------------------------------------------

**References**:**Wish List: Prefix List for Token Definitions***From:*Khun Yee Fung <kyeefung@extend.com>

- Prev by Date:
**Wish List: Prefix List for Token Definitions** - Next by Date:
**Re: code generator generators** - Prev by thread:
**Wish List: Prefix List for Token Definitions** - Next by thread:
**Re: code generator generators** - Index(es):