[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: help with lexer
Etienne GAGNON writes:
> Archie Cobbs wrote:
> > However, one thing has always bothered me though about all parsers:
> > why is EOF considered a special token?!? Why can't it just be
> > another token like any other? Why does a parser always assume
> > that the very last token will be EOF?
>
> It has to do with the way a parser makes decisions, be it LL or LR.
>
> If you had a grammar like this:
>
> start = right_recursive eof;
> right_recursive = id right_recursive | id;
>
> If you remove the 'eof', how does the parser know if it should parse the
> following into two ASTs or one AST? (or, in the general case, if it
> should return an error?)
>
> input: id id
>
> There is the possibility of saying: parse the longest string.
I have two possible responses.. :-)
#1 Right.. you would need "parse the longest string" as an implicit
rule. Note: this rule is the norm for lexical analysis.
Your example (without the eof) is equivalent to:
start = right_recursive;
right_recursive = id+;
From a lexical point of view, everybody agrees what should happen...
you shift & reduce id's until there aren't any more. One could claim that
from a parsing point of view, the same logic and reasoning applies.
That is, the same logic that says "always shift as much as you can
before reducing".
#2 On the other hand, it would make more sense to generate a shift/reduce
conflict error upon compiling the grammar, because without the eof
it's ambiguous. But that's the language designer's problem, not the
compiler tool's problem!
In my example, a newline always ends a command so the language is
not ambiguous. I just want to use "eol" instead of "eof", but the
tool is not giving me that choice.
I think I believe in #2 rather than #1.
-Archie
___________________________________________________________________________
Archie Cobbs * Whistle Communications, Inc. * http://www.whistle.com