
Re: Error recovery



Hi Dan,


I have modified some of the SableCC source tree to reflect your error
recovery changes. We need a single source base we can comment on and
discuss, so I have decided that SF CVS is probably the best place to put
it.  So, guys, have a look, and speak up.


Changes:

* name conflicts were already fixed some time ago by Etienne, so the
  tree compiles just fine with jikes (1.14 is being used on my end)

* cached Exception performance fix, and some other minor performance fixes
  were applied earlier

+ SableCC.java now generates the Error Recovery classes

+ alternatives.txt and utils.txt now have the removeChild() calls
  commented out

+ GenLexer.java, GenParser.java, parser.txt and lexer.txt - minor and
  major changes, following Dan's suggestions almost exactly.

+ errorrecovery.txt added

- the "reversible parser hack". Well, Dan, will you work on that?
  I agree that an appropriate generated table would be a much better
  solution, but I do not think I am knowledgeable enough to tackle it. And
  I am pretty much "booked out" at the moment, so, any takers?
  Others? Any suggestions, improvements, comments?



Dan,

please fetch the latest CVS source tree, build the sablecc.jar and try
it out with your code that uses the error recovery features.
If there is something wrong, simply let me know, or fix it against the
current CVS and I will patch it. I have generated some parsers using the
new sablecc and it all seems to be in sync; however, I do not use any of
the error recovery stuff at the moment, so I have not tested it. Maybe a
small example for a Java grammar or something would be nice - we could add
it to the grammars subproject.

Aha, and tell Etienne one more time all the copyright details so he will
acknowledge it appropriately, right Etienne?  ;o)


best regards
Mariusz



On Fri, 1 Jun 2001, Dan Sandberg wrote:

> Hallelujah!  I have finally obtained permission from Juniper Networks,
> Inc., to contribute the changes I have made to SableCC.
>
> I have sent the changes to Mariusz for eventual integration.
>
> Below I will explain the basics of the error recovery I implemented, and
> talk about what needs to be done.
>
> You can register an error handler with the parser which will trigger a
> callback when a parsing exception occurs.
>
> This callback has a parameter of type ErrorRecoverer, which allows you
> to modify the state of the lexer / parser in a way which I believe does
> not break encapsulation.
>
> The ErrorRecoverer interface contains the following methods:
>
>     void processToken( Token p_token )
>         throws IOException, LexerException, ParserException;
>     Token unprocessLastToken()
>         throws IOException, LexerException, ParserException;
>
>     int readChar() throws IOException;
>     int readCharBackwards() throws IOException;
>
>     int getLexerState();
>     void setLexerState( int p_state );
>
> unprocessLastToken makes the parser 'go in reverse'.  It tells it to go
> back to the exact state it was in before it encountered the most
> recently processed token.  You can repeat this step multiple times
> to go back to any point you want.  For example, you can keep
> unprocessing tokens until you reach a semicolon.  You can then make the
> parser process tokens ( as if they were coming from the lexer ).  This
> lets you put the parser in any state you want before continuing the
> parse.  You can also modify the lexer state, by directly setting the
> lexer state, or by using readChar and readCharBackwards to read forward
> and backwards in the lexer input stream.
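
The "unprocess until you reach a semicolon" strategy described above could
look roughly like the sketch below. It is only an illustration under stated
assumptions: Token and ErrorRecoverer here are reduced stand-ins (no checked
exceptions, no lexer methods), RecordingRecoverer and rewindToSemicolon are
hypothetical names, and it assumes unprocessLastToken() returns null once
there is nothing left to undo, which the description does not specify.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Stand-in for a generated token class; only the text matters here.
class Token {
    final String text;
    Token(String text) { this.text = text; }
}

// Reduced stand-in for the ErrorRecoverer interface quoted above:
// checked exceptions and the lexer-related methods are omitted.
interface ErrorRecoverer {
    void processToken(Token token);
    Token unprocessLastToken();
}

// Toy recoverer that only records processed tokens, so the sketch can run
// without a generated parser. Hypothetical, for illustration only.
class RecordingRecoverer implements ErrorRecoverer {
    final Deque<Token> processed = new ArrayDeque<>();
    public void processToken(Token token) { processed.push(token); }
    // Assumption: returns null when there is nothing left to undo.
    public Token unprocessLastToken() { return processed.poll(); }
}

class RecoveryStrategies {
    // Undo tokens until a ";" has been undone, then feed the ";" back in,
    // so the parser ends up just after the last complete statement.
    // Returns the number of tokens discarded (not counting the ";").
    static int rewindToSemicolon(ErrorRecoverer recoverer) {
        int discarded = 0;
        Token t;
        while ((t = recoverer.unprocessLastToken()) != null) {
            if (";".equals(t.text)) {
                recoverer.processToken(t);
                return discarded;
            }
            discarded++;
        }
        return discarded;
    }
}
```

Inside a real error handler one would presumably call something like
rewindToSemicolon() on the ErrorRecoverer passed to the callback, and then
skip lexer input up to the next statement before resuming the parse.
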
>
> The lexer now uses a SableCCPushbackReader, instead of a
> PushbackReader.  The SableCCPushbackReader has the following benefits:
>
> 1 - No fixed buffer size for characters being 'unread'
> 2 - The ability to go 'backwards' in an input stream.  Most readers are
> uni-directional, meaning that once you have read input, it is lost.  The
> SableCCPushbackReader remembers what it has read, meaning you can always
> go backwards and later re-read what you have already read.  Obviously if
> you are parsing a 1 meg file, this means you will need two copies of the
> file in memory.
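
To make the "remembers what it has read" idea concrete, here is a minimal
sketch of a reader that keeps its whole history so it can step backwards.
This is not the actual SableCCPushbackReader source; the class name
RememberingReader is hypothetical, and the semantics of readCharBackwards()
(step back one character and return it, or -1 at the start of input) are an
assumption.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical illustration, not the actual SableCCPushbackReader source.
class RememberingReader {
    private final Reader in;
    // Everything read from the underlying reader so far.
    private final StringBuilder history = new StringBuilder();
    // Index into history of the next character to return.
    private int pos = 0;

    RememberingReader(Reader in) { this.in = in; }

    // Returns the next character, or -1 at end of input.
    int readChar() throws IOException {
        if (pos < history.length()) {
            return history.charAt(pos++); // re-read something already seen
        }
        int c = in.read();
        if (c != -1) {
            history.append((char) c);
            pos++;
        }
        return c;
    }

    // Assumed semantics: steps back one character and returns it,
    // or returns -1 when already at the start of the input.
    int readCharBackwards() {
        if (pos == 0) return -1;
        return history.charAt(--pos);
    }
}
```

Because the history buffer simply grows, there is no fixed pushback limit,
at the cost of holding everything read so far in memory.
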
>
> This system works fine; I have been using it for a few weeks now.  It
> does use an absolutely horrible hack, however.  In order to let the
> parser be able to 'unprocess' previous tokens, the parser now keeps a
> list of every shift / reduce that is done, so that it may be undone.  I
> think this can be dealt with by generating a table like the existing
> shift/reduce action tables, except that this table would apply in
> reverse.  For example, if on encountering the token "try" you shift from
> state 11 to state 12, then this table would tell you that if you are in
> state 12 and you encounter "try", you should shift to state 11.  In
> other words, the parser should be fully reversible.
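
For discussion, the "keep a list of every shift / reduce so it can be undone"
approach might be sketched as below. This is a toy model under assumptions,
not the code in CVS: UndoableStateStack is a hypothetical name, only state
numbers are tracked (a real parser stack also carries nodes), and each
journal entry stores the states that the action removed from the stack.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration of an undoable LR state stack; not SableCC code.
class UndoableStateStack {
    private final Deque<Integer> states = new ArrayDeque<>();
    // One entry per action: the states that the action removed from the stack.
    // Every action also pushes exactly one state, so undoing an action means
    // popping one state and restoring the removed ones.
    private final Deque<List<Integer>> journal = new ArrayDeque<>();

    UndoableStateStack(int initialState) { states.push(initialState); }

    int current() { return states.peek(); }

    // Shift: push the new state; nothing is removed.
    void shift(int newState) {
        journal.push(new ArrayList<>());
        states.push(newState);
    }

    // Reduce: pop one state per right-hand-side symbol, then push the goto state.
    void reduce(int rhsLength, int gotoState) {
        List<Integer> removed = new ArrayList<>();
        for (int i = 0; i < rhsLength; i++) removed.add(states.pop());
        journal.push(removed);
        states.push(gotoState);
    }

    // Undo the most recent action; returns false if there is nothing to undo.
    boolean undo() {
        List<Integer> removed = journal.poll();
        if (removed == null) return false;
        states.pop();
        // removed is in pop order (top first), so push back in reverse order.
        for (int i = removed.size() - 1; i >= 0; i--) states.push(removed.get(i));
        return true;
    }
}
```

With this model, shifting from state 11 to state 12 on "try" and then calling
undo() brings the stack back to state 11. Undoing a whole token would
presumably mean undoing any reduces it triggered plus its one shift.
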
>
> I know none of this is very clear, so please send your questions and
> comments in.