[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: grammar questions



Hi Indrek,

Indrek Mandre wrote:
> At first I'd like to thank you for such a great parser generator.
> I hope you'll continue to improve it. Great work.

Thanks.

> Anyways here are couple of my thoughts.
> I run a Linux system with SableCC 1.16.1 on Java v1.3(SUN).

You probably meant 2.16.1.

> My biggest problem with sablecc nowadays is that it is extreamly slow
> on generating the parser. 

I know.  This is one of the many things on the TODO list.  But the
priority will be on giving more meanignful messages, then improving
speed.  [Funding needed in a few months from now...;-)]

> I'd like to remember with a good word the operator precedence and
> associativity settings of bison. 

I do not like the way it is specified in bison; it is ambiguous, as you
give a precedence to a token independently from its context.  There will
be an unambiguous way for specifying operator precedence in SableCC 3.x.

Understanding the bison ambiguity:  Imagine a grammar with different
operator precedence in different sections.

e.g.:  (HTML like language but with operators)

some text and left associative expression 3 + 4 + 5 and tags <TAG with
different operator associativity, let say right associative 6 + 7 +8>...

This is not expressible with bison.  Worse, when you give an
associativity and precedence to an token, it is valid for all conflicts
involving this token, regardless of its parsing context.  So, unless you
really know what you are doing (and wha6t bison is doing!!!), it can be
quite dangerous to use the precedence mechanism.

Just think of the "*" operator in C/C++.  It can be used as "multiply"
in arithmetic expressions, and as a "dereference" operator in pointer
expressions.  How do you know the associativity/precedence of "multiply"
isn't hiding problems in the grammar involving "dereference"?

> The other thing about LALR grammars is the dangling if/else problem.

This one is even more tricky to understand and specify without
ambiguity.  SableCC 3.x will have some mechanism for it, but it will
only deal with simple cases (unambiguously).

> Anyways to the point now - real life problems I don't understand
> well:
> 
> Package test;
> 
> Tokens
> 
>   var = (['A'..'Z'] | ['a'..'z'])+;
>   lp = '(';
>   rp = ')';
>   eq = '=';
> 
> Productions
> 
> expr =
>       {assign}        lp var rp eq expr
>     | {other}         expr2
> ;
> 
> expr2 =
>       {par}           lp expr rp
>     | {var}           var
> ;
> 
> I'd like to that this grammar accepted things like: (a) = b; In reality
> I would have 10 layers between expr and expr2. How to solve this?
> For now I decided to use 'expr2 eq expr' and perform the lvalue checking
> later.


If you read my past message about LALR(1) conflicts [browse the
archive], I said that "reducing" is bad, as it forces you to make a
decision.

In this cases, the problem is that when the parser sees:

"(" "a" lookahead ")"

it has to decide whether to reduce "a" to expr2{par} , or shift it in
expr{assign}.

You could probably delay the decision, and simply rewrite expr{assign}
as:

{assign}        lp expr rp eq expr

then weed invalid assignments in a post-parsing stage [this will detect
more parsing errors than "expr2 eq expr", as "a = b" will be rejected]. 
I haven't tried to lauche SableCC to check if any conflict remains, but
I hope you get the idea.  A LALR^1(k) parser would accept your initial
grammar as-is, as LALR(2) would check the character following the ")"
and could decide to reduce if it sees "EOF" and shift if it sees "eq". 
[Planned for SableCC 3.x...]

> My second problem is greater and more difficult. I solved the dangling
> if/else problem as it was done in java1.1 and thought that's it.
> But there's more and i couldn't manage to make it work.
> 
> The language has besides the 'if' and 'else' terminals also an 'elseif'
> statement. So statements like that are valid:
> 
> if (a) x(); elseif (b) y(); elseif (c) z(); else w();
> 
> Now I got this working, correct I hope. The next thing with this
> language is support for second style if statements that use colon:

If I understand well, the ":" always follows the ")" of an "if" or
"elseif" or follows an "else" .

> Both kind of 'if' statements are allowed and by idea you could mix them.
> How to do this?

I see the following quick solution.

if_stmt =
  {if}            if lpar expr rpar colon? stmt
| {else}          if lpar expr rpar colon? noif_stmt else colon? stmt
| {elseif}        if lpar expr rpar colon? noif_stmt elseif_stmt
;

elseif_stmt =
  {if}            elseif lpar expr rpar colon? stmt
| {else}          elseif lpar expr rpar colon? noif_stmt else colon?
stmt
| {elseif}        elseif lpar expr rpar colon? noif_stmt elseif_stmt
;

etc.

In other words, you use introduce an optinal colon in your elseif
solution.  Again, I have not tried it.  Please report your findings back
to us.

> I feel a radically new approach in other dimension is needed to solve
> this last problem.

This shouldn't be more complicated than adding "colon?" in the
appropriate places.  I hope so...

Have fun!

Etienne
-- 
----------------------------------------------------------------------
Etienne M. Gagnon, M.Sc.                     e-mail: egagnon@j-meg.com
Author of SableCC:                             http://www.sablecc.org/
and SableVM:                                   http://www.sablevm.org/