
Re: SableCC event driven?



On Tue, Feb 03, 2004 at 12:57:15AM +0100, Jochen Wiedmann wrote:
> Indrek Mandre wrote:
> 
> >Good work. I also looked into it and turned that patch into something 
> >clearer
> >for comparison. See attachment. Should be mostly the same thing.
> >I ignored some spacing to get a clearer diff.
> 
> Just one hint: Rather than invoking new Parser(null) I'd prefer a second
> constructor without arguments. This is a matter of taste, though.

I've been looking into this as part of the error repair mechanism I've been
working on. I got stuck for a while and tried using an event driven model
instead. It turns out to be a good match, and it is indeed a simple matter
to create a backwards compatible parse() method.

The way I've modelled this so far is to have an AbstractParser class with
package level constructors, that handles parsing and error recovery. This
has two subclasses with public constructors. The first is Parser, with the
usual public parse() method. The other is EventDrivenParser, with build()
and add(Token) methods. All three of these methods are implemented in terms
of package level methods provided by the AbstractParser superclass, and
indicate all errors, including parse errors, by throwing appropriate
exceptions. The state of the parser can then be examined after catching
these exceptions in order to find out whether the error is recoverable. The
Parser class also gains a getLexer() method to allow the state of the lexer
to be examined when an exception is thrown by the parse() method. All
exceptions thrown by the filter() method in any of the parser or lexer
classes are automatically recoverable. 

The build() method finishes off the parsing once a sequence of tokens
terminated by a single EOF token has been added using the add(Token)
method. The add(Token) method only adds the token if there are no parse
errors caused by previous tokens added to the parser, and no EOF token has
already been added. 
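
A minimal sketch of this layout follows. It is not the actual patch: Token,
EOF, Start, Lexer and ParserException are cut-down stand-ins for the
generated classes, the shift(), finish() and hasFailed() names are invented
for illustration, the "grammar" simply rejects a token whose text is "?",
and throwing IllegalStateException when a token is refused is an assumption
about how that refusal would be signalled. In real code each public class
would live in its own file.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Cut-down stand-ins for the generated classes (hypothetical, for illustration).
class Token { final String text; Token(String text) { this.text = text; } }
class EOF extends Token { EOF() { super(""); } }
class Start { final List<Token> tokens; Start(List<Token> tokens) { this.tokens = tokens; } }
class ParserException extends Exception { ParserException(String m) { super(m); } }

// Stand-in lexer: hands out a fixed list of tokens, then EOF.
class Lexer {
    private final Iterator<Token> source;
    Lexer(List<Token> tokens) { source = tokens.iterator(); }
    Token next() { return source.hasNext() ? source.next() : new EOF(); }
}

abstract class AbstractParser {
    private final List<Token> shifted = new ArrayList<>();
    private boolean failed;
    private boolean sawEof;

    AbstractParser() { }   // package level: only this package can subclass

    // Package-level primitive (invented name): push one token through.
    void shift(Token token) throws ParserException {
        if (failed || sawEof) {
            // Assumption: refuse tokens after a parse error or after EOF.
            throw new IllegalStateException("no more tokens can be added");
        }
        if (token instanceof EOF) { sawEof = true; return; }
        if (token.text.equals("?")) {      // toy grammar: "?" is a parse error
            failed = true;
            throw new ParserException("unexpected token: " + token.text);
        }
        shifted.add(token);
    }

    // Package-level primitive (invented name): produce the tree after EOF.
    Start finish() throws ParserException {
        if (failed) throw new ParserException("cannot build after a parse error");
        if (!sawEof) throw new ParserException("no EOF token has been added");
        return new Start(shifted);
    }

    // Lets callers examine the parser state after catching an exception.
    public boolean hasFailed() { return failed; }
}

// Backwards compatible: pulls tokens from a lexer, as before.
class Parser extends AbstractParser {
    private final Lexer lexer;
    public Parser(Lexer lexer) { this.lexer = lexer; }
    public Lexer getLexer() { return lexer; }

    public Start parse() throws ParserException {
        Token token;
        do {
            token = lexer.next();
            shift(token);
        } while (!(token instanceof EOF));
        return finish();
    }
}

// Event driven: the caller pushes tokens in, then asks for the tree.
class EventDrivenParser extends AbstractParser {
    public EventDrivenParser() { }
    public void add(Token token) throws ParserException { shift(token); }
    public Start build() throws ParserException { return finish(); }
}
```

Because both public classes only forward to the package-level methods, the
parsing and error recovery code stays in one place.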

I'm not sure that separate build() and add(Token) methods are ideal. Maybe
it would be better for the event driven parser class to have a single
parse(Token) method that returns either null or a non-null Start node. This
is the approach taken by Jochen. It is slightly harder to make it mesh
nicely with Burke-Fisher
error repair, since the parser needs to keep a queue of tokens that it can
modify in order to propose alternative repairs during error recovery. All
the queued tokens get flushed during calls to the build() method. The
parse(Token) method also makes it hard to figure out the right thing to
do when a parse error is caused while flushing the queue of tokens in
response to adding an EOF token. 
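
The token queue mentioned above could look roughly like this. It is only a
sketch of the bookkeeping, not of Burke-Fisher repair itself: the class name
DeferredTokens, the window size k and the use of plain strings in place of
Token objects are all assumptions made for illustration. Tokens are held
back until k newer ones have arrived, so a repair routine still has a window
of uncommitted tokens that it can modify, and flush() stands for what would
happen when the remaining tokens are pushed through at the end.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper: holds back the most recent k tokens so that error
// repair can still edit them before they are committed to the parser.
class DeferredTokens {
    private final int k;
    private final Deque<String> window = new ArrayDeque<>();

    DeferredTokens(int k) {
        if (k < 0) throw new IllegalArgumentException("k must not be negative");
        this.k = k;
    }

    // Queue a token. Returns the oldest token if it has now left the
    // window and may be committed, or null if nothing is ready yet.
    String offer(String token) {
        window.addLast(token);
        return window.size() > k ? window.removeFirst() : null;
    }

    // Replace the oldest deferred token, as a repair might propose.
    // Throws NoSuchElementException if nothing is deferred.
    void replaceOldest(String replacement) {
        window.removeFirst();
        window.addFirst(replacement);
    }

    // Commit everything still deferred, oldest first.
    List<String> flush() {
        List<String> rest = new ArrayList<>(window);
        window.clear();
        return rest;
    }
}
```

The awkward case described above is the one where a parse error turns up
while the tokens returned by flush() are being committed.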

The package level AbstractParser constructors mean that all parser
subclasses must be a subclass of either the Parser or EventDrivenParser
classes. I'm not so sure that is ideal. Maybe AbstractParser should have
protected constructors, with the event driven parsing methods used to
implement a single Parser subclass.

The event driven model would make the recent requests for Python grammars
more feasible, since it could be embedded in a custom file reader that
figures out the number of NEWLINE, INDENT and DEDENT tokens that need to be
added in response to different kinds of whitespace.
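
As a rough illustration, such a reader could track indentation with a stack
of column widths and work out which layout tokens to add before the ordinary
tokens of each line. The sketch below assumes a hypothetical IndentTracker
class, counts only leading spaces (no tabs), and returns token names as
strings rather than real Token objects. It ignores blank lines, comments and
line continuations, and leaves it to the caller to add a NEWLINE after the
ordinary tokens of each line.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper for a custom file reader feeding an event driven parser.
class IndentTracker {
    // Widths of the currently open blocks; the bottom entry is always 0.
    private final Deque<Integer> levels = new ArrayDeque<>();

    IndentTracker() { levels.push(0); }

    // Layout tokens to add before the ordinary tokens of this line.
    List<String> layoutTokensFor(String line) {
        int width = 0;
        while (width < line.length() && line.charAt(width) == ' ') {
            width++;
        }
        List<String> out = new ArrayList<>();
        if (width > levels.peek()) {
            // Deeper than the current block: open one new block.
            levels.push(width);
            out.add("INDENT");
        } else {
            // Shallower: close blocks until the widths match again.
            while (width < levels.peek()) {
                levels.pop();
                out.add("DEDENT");
            }
            if (width != levels.peek()) {
                throw new IllegalArgumentException("inconsistent dedent");
            }
        }
        return out;
    }

    // Layout tokens to add once the input is exhausted, before EOF.
    List<String> layoutTokensAtEnd() {
        List<String> out = new ArrayList<>();
        while (levels.peek() > 0) {
            levels.pop();
            out.add("DEDENT");
        }
        return out;
    }
}
```

A single line can close several blocks at once, which is why the reader
needs to be able to add more than one token per line of input.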

I haven't had much time to work on this recently, but some free time is
coming up soon, and I'd very much welcome the ideas of others before
finalising the API. I shall get some rudimentary documentation together
while I finish it off. 

-- 
Jon Shapcott <eden@xibalba.demon.co.uk>
"This is the Space Age, and we are Here To Go" - William S. Burroughs