[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Fw: Pasrser and EOF

From: Archie Cobbs <archie@whistle.com>
To: gagnon@CS.McGill.CA <gagnon@CS.McGill.CA>
Date: Tuesday, July 21, 1998 8:32 PM
Subject: Re: Pasrser and EOF

Etienne GAGNON writes:
> It seems that implementing a parser without some kind of end-of-input
> character is not really feasible without changing the "expected"
> behavior of the parser. This is because some backtracking would be
> needed, and I am not sure I want to deal with this kind of headache.
> On the other hand, the "end-of-input" token could be user specified.

Having the token user-specified would solve my need.

You're probably getting sick of talking about this, but it's an
interesting topic to me... :-) Tell me if you want me to be quiet :-)

I'm curious: what would the "unexpected" behavior be? Are you
saying that backtracking would always be needed, or would it only
be needed to make the behavior less "unexpected"?

If you look at the actual parser code generated, there's nothing
in there that treats the EOF token specially. There's no code
anywhere that says if (token == EOF) ... . However, two things
are generally special about EOF in the *language* :

#1 EOF doesn't appear in the normal input anywhere, and
    thus it will always force either a reduction or an error.

#2 EOF is the only token that causes an ACCEPT action; this is
    simply the result of #1 and the implicit first rule:

      start = 
grammar eof;

Any token which doesn't appear in the input would satisfy rule #1.

In fact these two "functions" of EOF are independent of each other.

I wouldn't claim that most languages remain non-ambiguous when
you leave off the terminating eof .. clearly not ("C" is an easy
counter example because you can define an arbitrary number of functions
in a file).

I do think that the "one shot" mentality, ie, the assumption that
the parser must read ALL of the input before it accepts, is broken.

Any non-ambiguous LR language that has the property of allowing an
arbitrary number of "things" at the end of the "sentence" must have
some "terminating token" which will cause the final reduction and
ACCEPT action. This is clear.

But this token doesn't necessarily have to not appear anywhere else
in the input, e.g.:

  start =
    foo bar* foo;

This language is not ambiguous, and doesn't need the implicit EOF

IMHO, on input "foo bar bar bar foo <anything>.." the parser should
read the first five tokens, then ACCEPT, leaving the rest of the
input unread. There's nothing ambiguous about it, if you take the
view that it's OK not to read the whole input when ACCEPTing is the
only non-error choice.

I guess I need an example of the following:

  - Non-ambiguous LR language NOT having an EOF rule.
  - Parser can't be generated using the "normal" algorithm.

but I can't think of one.


Archie Cobbs   *   Whistle Communications, Inc.  *   http://www.whistle.com