
grammar questions




Hi SableCC people!

First, I'd like to thank you for such a great parser generator.
I hope you'll continue to improve it. Great work.

I'm trying to write a grammar for the PHP language and have run into
a couple of problems. I have used the bison parser generator a bit
before, so LALR grammars don't feel very foreign to me, though I've
discovered there's still a lot I'm missing.

Anyway, here are a couple of my thoughts.
I'm running SableCC 1.16.1 on Java 1.3 (Sun) on Linux.

My biggest problem with SableCC right now is that it is extremely slow
at generating the parser. One generate-and-compile cycle takes a few
minutes, so I can make myself a cup of coffee and discover minutes
later that it has popped up a little shift/reduce conflict in my
grammar. That is a bit frustrating when I remember the speed of bison.
For example, I timed the java1.1 parser generation: 5m35.778s.

I know the emphasis in SableCC is on doing the real work later, in
tree transformations and tree-based code generation, but it's still
too slow. I hope you have plans to speed it up somehow.

I'd like to put in a good word for bison's operator precedence and
associativity declarations. Most computer languages out there depend
heavily on operators, and this little feature could save people from
layering operators by hand. Layering does work, but the declarations
would be much more straightforward and would save a lot of later
processing. Still, I'm no parser specialist, so I guess you had a
good reason for leaving that kind of functionality out.
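For comparison, here is a small sketch of the grouping rules such declarations express. Note that bison itself uses %left/%right to resolve shift/reduce conflicts in its LALR tables; this sketch instead uses precedence climbing, a different technique, in hand-written Java independent of SableCC. The operator table and the class name PrecClimb are hypothetical; operands are single tokens and only binary operators are handled. The result is fully parenthesized so the grouping is visible.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one precedence/associativity table replaces one
// grammar production per precedence level. Operands are single tokens.
final class PrecClimb {
    // Higher number binds tighter.
    private static final Map<String, Integer> PREC =
            Map.of("=", 1, "+", 2, "-", 2, "*", 3, "/", 3, "^", 4);

    // '=' and '^' group to the right; everything else groups to the left.
    private static boolean rightAssoc(String op) {
        return op.equals("=") || op.equals("^");
    }

    private final List<String> toks;
    private int pos = 0;

    private PrecClimb(List<String> toks) { this.toks = toks; }

    // Parses operand (op operand)* where every op has precedence >= minPrec.
    private String parse(int minPrec) {
        String lhs = toks.get(pos++);
        while (pos < toks.size()) {
            String op = toks.get(pos);
            int p = PREC.getOrDefault(op, -1);
            if (p < minPrec) break;
            pos++;
            // Left-assoc: the right operand may only contain tighter operators.
            // Right-assoc: it may contain the same operator again.
            String rhs = parse(rightAssoc(op) ? p : p + 1);
            lhs = "(" + lhs + " " + op + " " + rhs + ")";
        }
        return lhs;
    }

    static String parseAll(List<String> toks) {
        return new PrecClimb(toks).parse(0);
    }
}
```

For example, a + b * c - d groups as ((a + (b * c)) - d) and a ^ b ^ c as (a ^ (b ^ c)). In a layered grammar each of those precedence levels would be a separate production; here each is one table entry.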

Still, I have to say I like the "cleanness" of SableCC grammars. On
the other hand, I curse having to write more than I did with bison
parsers, but it's in the interest of fewer errors, and I have to say
I haven't hit a single one I couldn't solve in 5 seconds.
I haven't reached the deeper processing and code generation stages
yet, though; I hope I won't be disappointed there. Storing no data at
nodes is an interesting approach, but I feel it is not justified in
all cases.

The other thing about LALR grammars is the dangling if/else problem.
This was the most difficult thing to solve, and I leaned heavily on
the example Java grammar.

Anyway, on to the point: here are some real-life problems I don't
understand well:

Package test;

Tokens

  var = (['A'..'Z'] | ['a'..'z'])+;
  lp = '(';
  rp = ')';
  eq = '=';

Productions

expr =
      {assign}        lp var rp eq expr
    | {other}         expr2
;

expr2 =
      {par}           lp expr rp
    | {var}           var
;

I'd like this grammar to accept things like: (a) = b; In reality
I would have 10 layers between expr and expr2. How can I solve this?
For now I've decided to use 'expr2 eq expr' and do the lvalue checking
later.
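A minimal sketch of the "check the lvalue later" approach, assuming a simplified hand-written AST rather than the node classes SableCC actually generates; Node, Var, Paren, Call and LvalueCheck are hypothetical names. The grammar accepts any expr2 on the left of '=', and a later pass strips redundant parentheses and rejects anything that is not a plain variable. In a real SableCC setup the same check would presumably live in a tree walker, for example a subclass of the generated DepthFirstAdapter.

```java
// Hypothetical, simplified AST; SableCC's generated node classes look different.
abstract class Node {}

final class Var extends Node {
    final String name;
    Var(String name) { this.name = name; }
}

final class Paren extends Node {
    final Node inner;
    Paren(Node inner) { this.inner = inner; }
}

// Stands in for any other expression form from the "10 layers", e.g. f().
final class Call extends Node {
    final String fn;
    Call(String fn) { this.fn = fn; }
}

final class LvalueCheck {
    // Unwraps any number of parentheses: ((a)) is treated like a.
    static Node stripParens(Node n) {
        while (n instanceof Paren) {
            n = ((Paren) n).inner;
        }
        return n;
    }

    // The left side of '=' is valid only if it is a variable once
    // the parentheses are removed.
    static boolean isLvalue(Node lhs) {
        return stripParens(lhs) instanceof Var;
    }

    // Returns the name of the assigned variable, or throws for a bad target.
    static String assignedVariable(Node lhs) {
        Node n = stripParens(lhs);
        if (!(n instanceof Var)) {
            throw new IllegalArgumentException("left side of '=' is not assignable");
        }
        return ((Var) n).name;
    }
}
```

With this split, (a) = b and ((b)) = c pass, while something like f() = b is rejected after parsing rather than by the grammar, so the grammar only needs the single 'expr2 eq expr' alternative.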

My second problem is bigger and more difficult. I solved the dangling
if/else problem the way it is done in the java1.1 grammar and thought
that was it. But there's more, and I couldn't manage to make it work.

Besides the 'if' and 'else' terminals, the language also has an
'elseif' keyword, so statements like this are valid:

if (a) x(); elseif (b) y(); elseif (c) z(); else w();
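One way to check an elseif chain for correctness: 'elseif' behaves like 'else if', so the statement above should mean the same as nested if/else statements. A minimal sketch of folding a chain into that nested form, assuming hypothetical classes Branch and IfChain (not SableCC-generated) and treating conditions and bodies as plain strings:

```java
import java.util.List;

// Hypothetical: one "if"/"elseif" arm; condition and body are plain strings here.
final class Branch {
    final String cond;
    final String body;
    Branch(String cond, String body) { this.cond = cond; this.body = body; }
}

final class IfChain {
    // Folds the arms plus an optional final else (null means none) into
    // nested if/else, so each elseif becomes an "else if" one level deeper.
    static String toNested(List<Branch> arms, String elseBody) {
        return fold(arms, 0, elseBody);
    }

    private static String fold(List<Branch> arms, int i, String elseBody) {
        Branch b = arms.get(i);
        String s = "if (" + b.cond + ") " + b.body;
        if (i + 1 < arms.size()) {
            // The remaining elseif arms become the else branch of this if.
            return s + " else { " + fold(arms, i + 1, elseBody) + " }";
        }
        return elseBody == null ? s : s + " else " + elseBody;
    }
}
```

For the example above this yields if (a) x(); else { if (b) y(); else { if (c) z(); else w(); } }, which makes it easy to see which 'if' each branch attached to.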

I got this working (correctly, I hope). The next thing is that the
language also supports a second style of if statement that uses colons:

if (a):
x();
elseif (b):
y();
elseif (c):
z();
else:
w();
endif;

Both kinds of 'if' statements are allowed, and in principle you can mix
them. How do I do this? I managed the first style, but adding the
second never worked. Here's how far I got:


Package test;

Tokens
  if = 'if';
  else = 'else';
  elseif = 'elseif';
  endif = 'endif';
  lpar = '(';
  rpar = ')';
  identifier = ['0'..'9']+;
  colon = ':';

Productions

  program = stmt+;

  stmt =
      {if}            if_stmt
    | {html_if}       html_if_stmt
    | {expr}          expr
  ;

  noif_stmt =
      {expr}          expr
    | {html_if}       html_if_stmt
    | {noif_if}       noif_if_stmt
  ;

  if_stmt =
      {if}            if lpar expr rpar stmt
    | {else}          if lpar expr rpar noif_stmt else stmt
    | {elseif}        if lpar expr rpar noif_stmt elseif_stmt
  ;

  elseif_stmt =
      {if}            elseif lpar expr rpar stmt
    | {else}          elseif lpar expr rpar noif_stmt else stmt
    | {elseif}        elseif lpar expr rpar noif_stmt elseif_stmt
  ;

  noif_if_stmt =
      {else}          if lpar expr rpar [s1]:noif_stmt else [s2]:noif_stmt
    | {elseif}        if lpar expr rpar [s1]:noif_stmt noif_elseif_stmt
  ;

  noif_elseif_stmt =
      {else}          elseif lpar expr rpar [s1]:noif_stmt else [s2]:noif_stmt
    | {elseif}        elseif lpar expr rpar noif_stmt noif_elseif_stmt
  ;

  html_if_stmt =
      {html}          if lpar expr rpar colon stmt* endif
    | {html_else}     if lpar expr rpar colon [s1]:stmt* else [c2]:colon stmt* endif
  ;

  expr =
      {id}            identifier
  ;


I feel a radically new approach, from another dimension, is needed to
solve this last problem.
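A likely source of the conflict, offered as a hedged reading rather than a diagnosis: inside a colon block such as 'if (a): if (b) x(); else: y(); endif;', a parser that has just read 'x();' sees 'else' and must decide whether it belongs to the inner short if or to the outer colon-form if, but the ':' that settles it is one token further on. With 'elseif (expr):' the colon only appears after a whole expression, so with a grammar shaped like the one above one token of lookahead is not enough. One possible workaround for the 'else' case is to have the lexer return 'else' followed by ':' as a single token; 'elseif' is harder. The sketch below sidesteps the LALR(1) limit with a hand-written recursive-descent parser in Java (not SableCC) that scans ahead past the balanced parentheses. MixedIf and its methods are hypothetical, calls such as x() are simplified to x;, conditions are single identifiers, and the disambiguation rule (a short-form if only claims an else/elseif that is not in colon form) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical recursive-descent parser (hand-written, not SableCC output):
//   stmt := ID ';'
//         | 'if' cond stmt ('elseif' cond stmt)* ('else' stmt)?
//         | 'if' cond ':' stmt* ('elseif' cond ':' stmt*)* ('else' ':' stmt*)? 'endif' ';'
//   cond := '(' ID ')'
// Assumed rule: a short-form if only claims an else/elseif that is not in colon form.
final class MixedIf {
    private final List<String> t;
    private int pos = 0;

    private MixedIf(List<String> tokens) { this.t = tokens; }

    // Tokenizes (padding ( ) : ; with spaces), parses every statement and
    // renders each one as an S-expression.
    static String parseProgram(String src) {
        List<String> toks = new ArrayList<>();
        for (String s : src.replaceAll("([():;])", " $1 ").trim().split("\\s+")) {
            if (!s.isEmpty()) toks.add(s);
        }
        MixedIf p = new MixedIf(toks);
        List<String> stmts = new ArrayList<>();
        while (p.pos < toks.size()) stmts.add(p.stmt());
        return String.join(" ", stmts);
    }

    private String peek(int k) { return pos + k < t.size() ? t.get(pos + k) : "<eof>"; }

    private void expect(String s) {
        if (!peek(0).equals(s)) throw new IllegalStateException("expected '" + s + "', got '" + peek(0) + "'");
        pos++;
    }

    // From the '(' at index p, find the matching ')' and report whether ':'
    // follows: the unbounded lookahead that an LALR(1) table does not have.
    private boolean colonAfterCond(int p) {
        int depth = 0;
        for (int i = p; i < t.size(); i++) {
            if (t.get(i).equals("(")) depth++;
            else if (t.get(i).equals(")") && --depth == 0) return i + 1 < t.size() && t.get(i + 1).equals(":");
        }
        return false;
    }

    private String cond() {
        expect("(");
        String id = t.get(pos++);
        expect(")");
        return id;
    }

    // Statements up to the elseif/else/endif of the enclosing colon-form if.
    private String block() {
        StringBuilder sb = new StringBuilder();
        while (pos < t.size() && !peek(0).equals("elseif") && !peek(0).equals("else") && !peek(0).equals("endif")) {
            sb.append(' ').append(stmt());
        }
        return sb.toString();
    }

    private String stmt() {
        if (!peek(0).equals("if")) { // expression statement: ID ';'
            String id = t.get(pos++);
            expect(";");
            return id;
        }
        expect("if");
        StringBuilder sb = new StringBuilder("(if ").append(cond());
        if (peek(0).equals(":")) { // colon form, closed by endif
            expect(":");
            sb.append(block());
            while (peek(0).equals("elseif")) {
                pos++;
                sb.append(" (elseif ").append(cond());
                expect(":");
                sb.append(block()).append(')');
            }
            if (peek(0).equals("else")) {
                pos++;
                expect(":");
                sb.append(" (else").append(block()).append(')');
            }
            expect("endif");
            expect(";");
        } else { // short form: else/elseif bind to the nearest if unless in colon form
            sb.append(' ').append(stmt());
            while (peek(0).equals("elseif") && !colonAfterCond(pos + 1)) {
                pos++;
                sb.append(" (elseif ").append(cond()).append(' ').append(stmt()).append(')');
            }
            if (peek(0).equals("else") && !peek(1).equals(":")) {
                pos++;
                sb.append(" (else ").append(stmt()).append(')');
            }
        }
        return sb.append(')').toString();
    }
}
```

With this rule, 'if (a): if (b) x; else: y; endif;' parses as (if a (if b x) (else y)), attaching the 'else:' to the outer colon-form if, while a plain 'else y;' still binds to the nearest short if: 'if (a): if (b) x; else y; endif;' gives (if a (if b x (else y))). Whether this matches PHP's own behaviour for mixed styles is an assumption to check against the reference implementation.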

I hope you can bring light to my darkness of ignorance.

Best regards,
Indrek Mandre