
Re: grammar questions



Indrek Mandre wrote:
> But I have an idea how to solve it by using the lexer.
> Lexer could count parenthesis after if/elseif and look if there's a colon
> after
> the last ')'. Not very easy to do but still seems easier than messing with
> the parser rules and post-processing. 

Right.  But you have to be careful with spurious parentheses in
"strings )" and /* comments ( */.

So, ideally, you should short-circuit the lexer on scanning an "elseif"
(and an "else"), then in the filter method, call the lexer repeatedly to
get the next tokens, saving them in a linked list and counting
parenthesis tokens, and continue until you get one non-ignored token
past the "balanced" closing parenthesis.

Then you return either a TColonElseif or a TNoColonElseif (or a
TColonElse/TNoColonElse for an "else").  You then intercept subsequent
lexer calls to return the saved tokens (no need to scan them over and
over).

Something like:

Tokens
  else = "else";
  elseif = "elseif";
  ...
  colon_elseif =;
  no_colon_elseif =;
  colon_else =;
  no_colon_else =;

Productions

  stmt = 
    ... |
    if exp stmt_noif elseif_stmt_noif* elseif_stmt |
    ...

  elseif_stmt =
    no_colon_elseif exp stmt;
...

and in CustomLexer (extends Lexer), you must override the peek(), next()
and filter() methods.  I'll let you figure out the details.
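
For illustration only, the buffering and replay logic can be sketched
independently of the classes SableCC generates.  Everything here is
hypothetical: tokens are plain strings, LookaheadLexer stands in for the
custom lexer, and the names "colon_elseif"/"no_colon_elseif" mirror the
empty tokens in the grammar above.  It handles only "elseif" and is not
recursive.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a custom lexer; tokens are plain strings. */
final class LookaheadLexer {
    private final Iterator<String> raw;                      // underlying token stream
    private final Deque<String> saved = new ArrayDeque<>();  // tokens scanned ahead

    LookaheadLexer(List<String> tokens) {
        this.raw = tokens.iterator();
    }

    /** In this sketch, blanks and comments count as ignored tokens. */
    private static boolean isIgnored(String t) {
        return t.trim().isEmpty() || t.startsWith("/*");
    }

    /** Returns the next token, or null at end of input. */
    String next() {
        if (!saved.isEmpty()) {
            return saved.removeFirst();   // replay saved tokens, no rescanning
        }
        if (!raw.hasNext()) {
            return null;
        }
        String t = raw.next();
        return t.equals("elseif") ? classifyElseif() : t;
    }

    /**
     * Scans ahead to the first non-ignored token past the balanced ")",
     * saving every token read so that next() can hand it out later.
     * Strings and comments arrive as single tokens, so a ")" inside
     * them is never counted.
     */
    private String classifyElseif() {
        int depth = 0;
        boolean closed = false;
        boolean colon = false;
        while (raw.hasNext()) {
            String t = raw.next();
            saved.addLast(t);
            if (isIgnored(t)) {
                continue;
            }
            if (closed) {                 // first real token after the ")"
                colon = t.equals(":");
                break;
            }
            if (t.equals("(")) {
                depth++;
            } else if (t.equals(")") && --depth == 0) {
                closed = true;
            }
        }
        return colon ? "colon_elseif" : "no_colon_elseif";
    }
}
```

In a real CustomLexer the same queue would sit behind the overridden
methods, and the ":" stays in the stream for the parser to consume.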

> It won't be as fast probably tho.

If you do as suggested above, it shouldn't be slow.  I just hope you
don't need recursive behavior (in other words, it shouldn't be possible
in your grammar to have an embedded "if" statement in an expression). 
If it's possible, then you will have to add recursive behavior to the
above algorithm.

> But I dunno if it is possible using sableCC. 

Yes it is.  Why not?  There's the "filter" method for doing this.  I
just don't encourage people to go that route, as it is hazardous.

> Override the Lexer class
> and read stuff from pushbackreader and push it later back in?

In this case, using a linked list of tokens is highly preferable to
using the PushbackReader.  Usually, you use the PushbackReader when the
tokens you are reading are not real tokens, and the text you push back
might get scanned differently.  This is not the case here.
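
For comparison, a minimal sketch of what pushing text back looks like
with java.io.PushbackReader.  The class and method names (PushbackDemo,
readTwice) are made up for this example.  The characters go back into
the reader as raw text and have to be scanned again, which is the cost
the token list avoids.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class PushbackDemo {
    /** Reads up to n characters, pushes them back, then reads them again. */
    static String readTwice(String input, int n) {
        try {
            // The pushback buffer must be large enough for everything unread.
            PushbackReader in = new PushbackReader(new StringReader(input), n);
            char[] first = new char[n];
            int got = in.read(first, 0, n);
            if (got <= 0) {
                return "";
            }
            in.unread(first, 0, got);     // raw characters go back, not tokens
            char[] again = new char[got];
            int got2 = in.read(again, 0, got);
            return new String(again, 0, got2);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```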


I just have a question for you:  Are you sure of your interpretation of
PHP's specification?  (I have not read it!)  Personally, I would have
thought the following statement possible:

if (a)
  x = 3;
elseif (b):
  x = 4;
  y = 5;
endif

In other words, you could mix the colon and no-colon modes in the
different parts of an "if" statement.  The only requirement would be
that a ":" imposes a closing token for the related part, e.g.

if(a) :
  x = 3;
  y = 4;
elseif (b)
  y = 5;

would also be valid, as there's no need for an "endif" following an
"elseif exp stmt".

If you apply the "closest matching" rule to the language above, then
you can get an LALR(1) grammar for it.

Etienne
-- 
----------------------------------------------------------------------
Etienne M. Gagnon, M.Sc.                     e-mail: egagnon@j-meg.com
Author of SableCC:                             http://www.sablecc.org/
and SableVM:                                   http://www.sablevm.org/