[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Pasrser and EOF



Etienne Gagnon writes:
> Mainly, the problem is what is the following:
> 
> LR parse stack: pair
> No Lookahead information
> =>
> Parser doesn't know if it should:
> (1) reduce: pair -> start
> (2) shift the next token. (It could be a "foo").
> 
> Now, if it looks ahead, the parser might see a "foo" and decide to
> continue parsing. But this means that the following three inputs will
> have inconsistent results:
> 
> foo bar foo eof -> No AST. Error.
> foo bar bar eof -> One AST (foo bar). 2 errors: (bar), (eof).
> foo bar eof -> One AST. No error (?).
> 
> If you want a more consistent result like:
> 
> foo bar foo eof -> One AST (foo bar). 1 (or 2) error: (foo eof).
> foo bar bar eof -> One AST (foo bar). 1 (or 2?) error: (bar eof).
> foo bar eof -> One AST. No error (?).
> 
> then you need backtracking. i.e.: restore the parse stack to the state
> it was before reducing the second foo, then resume parsing by doing a
> reduce action instead of a shift.
> 
> Hendrick seemed to be aware of this problem, and he proposed a very
> elegant solution. His solution has the advantage that the parser will
> NEVER look ahead beyond the last token of a string of the recognized
> language; which is exactly what we want.

Yes, that makes it very clear. Practially speaking, the languages
that I'm interested in "adding" to the parser's capability are
handled by Hendrick's idea (restriction: no lookahead needed to
reduce start symbol), so this kind of change to SableCC makes me
happy. So consider this discussion academic from now on :-)

While my idea is less restrictive, it's more "dangerous" because
of the potential for surprises, e.g., "foo bar foo eof" -> no AST.

On the other hand, *if* you knew what you were doing and could
accept that "foo bar foo eof" will generate an error, then you might
prefer the less restrictive approach.

As an example, in a "command line" based language, you might actually
*want* "foo bar foo eof" to generate an error and *no* AST, because the
command "foo bar" has extra garbage at the end of it.. so the input line,
taken as a whole, is not a valid command.

So to me, the "unexpectedness" of this behavior is really not so
surprising. But maybe that's just because I'm coming from the angle
of trying to solve a specific problem, namely, parsing a command line
based protocol, that expects the unexpected..

Either way, I'm happy :-)

-Archie

___________________________________________________________________________
Archie Cobbs   *   Whistle Communications, Inc.  *   http://www.whistle.com