[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: help with lexer

Etienne GAGNON writes:
> Archie Cobbs wrote:
> > However, one thing has always bothered me though about all parsers:
> > why is EOF considered a special token?!? Why can't it just be
> > another token like any other? Why does a parser always assume
> > that the very last token will be EOF?
> It has to do with the way a parser makes decisions, be it LL or LR.
> If you had a grammar like this:
> start = right_recursive eof; 
> right_recursive = id right_recursive | id;
> If you remove the 'eof', how does the parser know if it should parse the
> following into two ASTs or one AST? (or, in the general case, if it
> should return an error?)
> input: id id
> There is the possibility of saying: parse the longest string.

I have two possible responses.. :-)

 #1 Right.. you would need "parse the longest string" as an implicit
    rule. Note: this rule is the norm for lexical analysis.

    Your example (without the eof) is equivalent to:

      start = right_recursive; 
      right_recursive = id+;

    From a lexical point of view, everybody agrees what should happen...
    you shift & reduce id's until there aren't any more. One could claim that
    from a parsing point of view, the same logic and reasoning applies.
    That is, the same logic that says "always shift as much as you can
    before reducing".

 #2 On the other hand, it would make more sense to generate a shift/reduce
    conflict error upon compiling the grammar, because without the eof
    it's ambiguous. But that's the language designer's problem, not the
    compiler tool's problem!

    In my example, a newline always ends a command so the language is
    not ambiguous. I just want to use "eol" instead of "eof", but the
    tool is not giving me that choice.

I think I believe in #2 rather than #1.


Archie Cobbs   *   Whistle Communications, Inc.  *   http://www.whistle.com