[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Pasrser and EOF

Etienne Gagnon writes:
> The idea is to reject any language that would require backtracking, and
> accept all others. A language that has a start production ending with
> eof does not require backtracking. But other languages fall into that
> category. We want to add these languages to the set of languages parsed
> by SableCC. But SableCC will reject any other language. This seems
> intuitive enough, because a language that requires backtracking has,
> like you called it, an ambiguous end of input; which we want to avoid.
> The problem, now, is to give meaningful error messages. But that's my
> problem;-)

> > It seems to me like either (a) the language inherently *requires* the
> > parser to look beyond the last token, or else (b) the language does
> > not require this -- and a normal parser won't do it.
> Yes. We want to reject (a) and accept (b) languages. I DO NOT WANT TO
> IMPLEMENT BACKTRACKING! It is expesive in memory space requirement
> and/or computation time.
> I want SableCC to remain deterministic! Decisions should be made in O(1)
> and parsing should be O(n) where (n = AST size).

First of all, I'm a little ignorant of the precise meaning of the
word "backtracking" and hence when it is required by the language.

What I think you mean is shown by this example:

  Start =

  pair =
    foo bar;

Input: "foo bar foo bar foo blah";

In this case, the valid part of the input is "foo bar foo bar", but
a normal parser will *shift* (not just peek at) the last "foo" token
before it can accept the input.

In this case, "backtracking" would be required to unshift that last
foo token, then reduce and accept. Unshifting is not something typical
parsers do.

However, I wouldn't ask for the parser to do this. In my opinion,
it would suffice for the parser to generate a parse error upon seeing
the "blah" token.

On the other hand, I *would* expect the parser to successfully
accept this input: "foo bar foo bar blah"... because it only has
to peek at the "blah" .. the "blah" token never actually gets shifted.

In other words, the parser only has to be able to accept the input
when no backtracking, or unshifting, would be required.

I claim this kind of slightly weaker parser is exactly what you
get if you take SableCC and simply remove the implicit initial rule
that adds "eof" to the end of the grammar... and IMHO it is a
worthwhile action because it only adds to the set of languages that
the parser can parse, without doing any work!


P.S. The email from this list has "Reply-To: sablecc-list@CS.McGill.CA"
in the header, but this is not a valid address (mail bounces).

Archie Cobbs   *   Whistle Communications, Inc.  *   http://www.whistle.com