
Error recovery

Subject: Error recovery
From: Dan Sandberg <dan@siliconpoetry.com>
Date: Fri, 01 Jun 2001 14:08:19 -0700
CC: sablecc-list@sable.mcgill.ca
Organization: Silicon Poetry, Inc.
References: <Pine.GSO.4.32.0105251638070.14712-100000@recife>
Reply-To: sandberg@juniper.net
Sender: owner-sablecc-list@sable.mcgill.ca

Hallelujah!  I have finally obtained permission from Juniper Networks,
Inc., to contribute the changes I have made to SableCC.

I have sent the changes to Mariusz for eventual integration.

Below I will explain the basics of the error recovery I implemented, and
talk about what needs to be done.

You can register an error handler with the parser, which will trigger a
callback when a parsing exception occurs.

This callback has a parameter of type ErrorRecoverer, which allows you
to modify the state of the lexer / parser in a way that I believe does
not break encapsulation.

The ErrorRecoverer interface contains the following methods:

    void processToken( Token p_token )
        throws IOException, LexerException, ParserException;
    Token unprocessLastToken()
        throws IOException, LexerException, ParserException;

    int readChar() throws IOException;
    int readCharBackwards() throws IOException;

    int getLexerState();
    void setLexerState( int p_state );

unprocessLastToken makes the parser 'go in reverse'.  It returns the
parser to the exact state it was in before it encountered the most
recently processed token.  You can repeat this step multiple times to go
back to any point you want.  For example, you can keep unprocessing
tokens until you reach a semicolon.  You can then make the parser
process tokens ( as if they were coming from the lexer ).  This lets you
put the parser in any state you want before continuing the parse.  You
can also modify the lexer state, either by setting it directly with
setLexerState or by using readChar and readCharBackwards to read forward
and backwards in the lexer input stream.
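
As a rough illustration of the semicolon example above, here is a toy
sketch in Java.  It is not the contributed code.  TokenRecoverer is a
stand-in for the two token-level methods of ErrorRecoverer: it uses
plain strings instead of the SableCC Token class, drops the checked
exceptions, and assumes that unprocessLastToken returns null when there
is nothing left to undo.  StackRecoverer and SemicolonRecovery are
hypothetical names used only for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Stand-in for the token-level half of ErrorRecoverer (assumption:
// strings instead of Token, no checked exceptions, null when empty).
interface TokenRecoverer {
    void processToken(String token);
    String unprocessLastToken();
}

// Toy recoverer backed by a stack of already-processed tokens (hypothetical).
class StackRecoverer implements TokenRecoverer {
    final Deque<String> processed = new ArrayDeque<>();

    public void processToken(String token) {
        processed.push(token);
    }

    public String unprocessLastToken() {
        return processed.poll(); // null if nothing has been processed
    }
}

class SemicolonRecovery {
    // Unprocess tokens until a semicolon is reached, then feed the
    // semicolon back in so parsing can resume right after it.
    // Returns the number of tokens that were discarded.
    static int rewindToSemicolon(TokenRecoverer r) {
        int undone = 0;
        String t;
        while ((t = r.unprocessLastToken()) != null) {
            if (t.equals(";")) {
                r.processToken(t);
                break;
            }
            undone++;
        }
        return undone;
    }
}
```

A real handler would presumably call something like rewindToSemicolon
from the error callback and then either feed replacement tokens through
processToken or let the lexer continue.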

The lexer now uses a SableCCPushbackReader, instead of a
PushbackReader.  The SableCCPushbackReader has the following benefits:

1 - No fixed buffer size for characters being 'unread'
2 - The ability to go 'backwards' in an input stream.  Most readers are
uni-directional, meaning that once you have read input, it is lost.  The
SableCCPushbackReader remembers what it has read, meaning you can always
go backwards and later re-read what you have already read.  Obviously,
if you are parsing a 1 MB file, this means you will need two copies of
the file in memory.
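
To make the 'remembers what it has read' idea concrete, here is a
minimal sketch in Java.  It is not the actual SableCCPushbackReader;
RewindableReader and its fields are hypothetical names, and the only
assumption carried over from above is a pair of methods named readChar
and readCharBackwards that return -1 when there is nothing to read in
that direction.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical reader that keeps every character it has read, so the
// caller can step backwards and later re-read the same input.
class RewindableReader {
    private final Reader in;
    private final StringBuilder history = new StringBuilder(); // all chars read so far
    private int pos = 0; // index in history of the next char to return

    RewindableReader(Reader in) {
        this.in = in;
    }

    // Read forward: replay from history after stepping back,
    // otherwise pull a new char from the underlying reader.
    int readChar() throws IOException {
        if (pos < history.length()) {
            return history.charAt(pos++);
        }
        int c = in.read();
        if (c != -1) {
            history.append((char) c);
            pos++;
        }
        return c;
    }

    // Step back one char and return it, or -1 at the start of input.
    int readCharBackwards() {
        if (pos == 0) {
            return -1;
        }
        return history.charAt(--pos);
    }
}
```

Because the history grows with the input, there is no fixed limit on how
far you can go back, at the cost of holding everything read so far in
memory.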

This system works fine; I have been using it for a few weeks now.  It
does use an absolutely horrible hack, however.  To let the parser
'unprocess' previous tokens, the parser now keeps a list of every
shift / reduce that is done, so that it may be undone.  I
think this can be dealt with by generating a table like the existing
shift/reduce action tables, except that this table would apply in
reverse.  For example, if on encountering the token "try" you shift from
state 11 to state 12, then this table would tell you that if you are in
state 12 and you encounter "try", you should shift to state 11.  In
other words, the parser should be fully reversible.
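
For readers who want to picture the history-list approach, here is a
small sketch in Java of an LR state stack with an undo log.  It
illustrates the general idea only and is not the code sent to Mariusz;
UndoableStateStack and its methods are hypothetical names.  Each shift
or reduce records the states it popped, so undoing it means removing the
state it pushed and restoring the popped ones.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical LR state stack that journals every shift / reduce.
class UndoableStateStack {
    private final Deque<Integer> states = new ArrayDeque<>();
    // One entry per action: the states that action popped (top first).
    private final Deque<List<Integer>> journal = new ArrayDeque<>();

    UndoableStateStack(int startState) {
        states.push(startState);
    }

    int current() {
        return states.peek();
    }

    void shift(int newState) {
        journal.push(new ArrayList<>()); // a shift pops nothing
        states.push(newState);
    }

    void reduce(int popCount, int gotoState) {
        List<Integer> popped = new ArrayList<>();
        for (int i = 0; i < popCount; i++) {
            popped.add(states.pop());
        }
        journal.push(popped);
        states.push(gotoState);
    }

    // Undo the most recent shift or reduce; false if nothing to undo.
    boolean undo() {
        List<Integer> popped = journal.poll();
        if (popped == null) {
            return false;
        }
        states.pop(); // remove the state the action pushed
        for (int i = popped.size() - 1; i >= 0; i--) {
            states.push(popped.get(i)); // restore deepest state first
        }
        return true;
    }
}
```

With the numbers from the example above, starting in state 11 and
calling shift(12) moves to state 12, and undo() moves back to state 11.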

I know none of this is very clear, so please send your questions and
comments in.

-Dan

Follow-Ups:
- Re: Error recovery
  - From: Mariusz Nowostawski <mariusz@marni.otago.ac.nz>

References:
- Error recovery
  - From: Mauro Florencio Vieira <mfv@cin.ufpe.br>
