
Re: SableCC Thoughts Part II

Dan Sandberg wrote:
> Sorry for the rush, but I am working on a project for a client and so
> need to get things done quickly.  I will go ahead and code the error
> recovery so it does what I need, and I'll just hope that whatever is
> later decided on will be a superset of the functionality I implement.

That's OK.  I didn't realize your rush.  I think that your code would be
a nice addition to SableCC.  How familiar are you with SourceForge
(SF)?  If you want, I can add you as a developer of the project, and
grant you CVS write access.  I think Mariusz offered to help you.  He
could possibly take care of adding your code to the CVS repository, if it
is difficult for you to get a functional SF setup. (Thanks, Mariusz!).

> [See my thoughts below]
> I think proper error recovery sometimes depends on semantics, rather
> than syntax, and so it would be dangerous for the parser to make
> assumptions that it knows how to handle errors.  By user intervention,
> you mean in customized error handlers written by the programmer, right?
> (Not the end-user).  If so, then I agree.

OK. This is what I wanted.  A debate on the question.

What I was suggesting was to separate error recovery into two parts:
1- Syntax error recovery: automatic, based on the grammar structure. 
Yields an AST with [error] nodes.  Requires special visitor support.
2- Semantic error recovery: handled by the compiler programmer, mostly
after parsing, while visiting the AST.  Possibility of doing part of it
at parse time, within semantic predicates.

> I don't think the parser itself should ever try to fix parse errors (
> unless we add directives in the grammar about how to do this ).

It depends on what you mean by "fixing".  If you mean recovering from
errors, and completing the compilation process normally, then I fully
agree.  On the other hand, I think that it would be nice to have SableCC
generated parsers being able to automatically "recover" from a syntax
error to continue building the AST and potentially catch other errors. 
The advantage of this is that it would also allow the compiler
programmer to visit this AST and output both syntax and semantic error
messages, in order of appearance in the original file.
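
To make the "in order of appearance" point concrete, here is a minimal Java sketch.  It is only an illustration under assumptions: the classes Node, ErrorNode, Declaration, Use and Reporter are hypothetical, hand-written stand-ins, not what SableCC generates today, and the "undeclared name" check is just an example of a semantic error.  Because a single pass walks the nodes in source order, syntax and semantic messages come out interleaved as they appear in the file.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical AST shapes; a real SableCC AST and visitor would differ.
abstract class Node {
    final int line;
    Node(int line) { this.line = line; }
    abstract void apply(Reporter r);
}

// Placeholder a recovering parser could leave where input was unparsable.
class ErrorNode extends Node {
    final String skipped;
    ErrorNode(int line, String skipped) { super(line); this.skipped = skipped; }
    void apply(Reporter r) { r.caseError(this); }
}

class Declaration extends Node {
    final String name;
    Declaration(int line, String name) { super(line); this.name = name; }
    void apply(Reporter r) { r.caseDeclaration(this); }
}

class Use extends Node {
    final String name;
    Use(int line, String name) { super(line); this.name = name; }
    void apply(Reporter r) { r.caseUse(this); }
}

// One visitor reports both kinds of errors during a single walk.
class Reporter {
    final List<String> messages = new ArrayList<>();
    private final Set<String> declared = new HashSet<>();

    void caseError(ErrorNode n) {
        messages.add("line " + n.line + ": syntax error near '" + n.skipped + "'");
    }

    void caseDeclaration(Declaration n) {
        declared.add(n.name);
    }

    void caseUse(Use n) {
        if (!declared.contains(n.name)) {
            messages.add("line " + n.line + ": undeclared name '" + n.name + "'");
        }
    }

    // Visits the nodes in source order and returns the collected messages.
    static List<String> report(List<Node> program) {
        Reporter r = new Reporter();
        for (Node n : program) {
            n.apply(r);
        }
        return r.messages;
    }
}
```

Calling Reporter.report(...) on a node list that mixes valid nodes and ErrorNode placeholders returns one list of messages, with no sorting step needed afterwards.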

I am far from convinced that the average compiler programmer can do a
good job at "syntax error recovery".  "Semantic error recovery" on the
other hand is better handled by a programmer.  As I said, sometimes the
programmer knows about a common mistake for users of a particular
language.  I imagine that such precise cases could be described in the
grammar itself.

Mainly, I think that flexibility in the syntax, or even recovery of known
mistakes could simply be encoded in the grammar itself.

For example, if you know that a missing "(" in a C "for" statement is
common, you could define the following grammar:

statement = ... | {for} for l_par? ... | ...

then, after parsing, you look for AForStatement nodes, and check that
"getLPar()" is not null.  If it is, you issue an error message.

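A minimal Java sketch of that post-parse check follows.  The classes are hand-written stand-ins, not actual SableCC output: Token, ForChecker, the getFor() accessor and the line field are assumptions made for illustration; only AForStatement and getLPar() come from the example above.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for a generated token class (assumed shape).
class Token {
    final String text;
    final int line;
    Token(String text, int line) { this.text = text; this.line = line; }
}

// Stand-in for the node built from the {for} alternative.
class AForStatement {
    private final Token forKeyword;
    private final Token lPar; // null when the optional l_par was absent

    AForStatement(Token forKeyword, Token lPar) {
        this.forKeyword = forKeyword;
        this.lPar = lPar;
    }

    Token getFor() { return forKeyword; }
    Token getLPar() { return lPar; }
}

class ForChecker {
    // Returns one message per "for" statement whose "(" is missing.
    static List<String> check(List<AForStatement> statements) {
        List<String> errors = new ArrayList<>();
        for (AForStatement s : statements) {
            if (s.getLPar() == null) {
                errors.add("line " + s.getFor().line + ": missing '(' after 'for'");
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        List<AForStatement> statements = new ArrayList<>();
        statements.add(new AForStatement(new Token("for", 3), new Token("(", 3)));
        statements.add(new AForStatement(new Token("for", 7), null));
        for (String e : check(statements)) {
            System.out.println(e);
        }
    }
}
```

The parse itself succeeds in both cases; the mistake is reported later, as an ordinary check on the AST.
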
Within semantic predicates, you can also do some checks.

In summary: I think that syntax error recovery (not fixing!) could be
automatic, to provide a way beyond getting a single error message, as
SableCC parsers do now.

I think that neither an "automatic tool" nor one written by a compiler
programmer should ever "fix" an error.  If there is a possibility of
"fixing" an error, this should be part of the grammar definition, not
part of the error recovery mechanism.  I think that the biggest problem,
at this point, is the lack of flexibility of SableCC's "Pure LALR(1)"
parsers to define this kind of grammar.

Maybe I'm wrong:-)

> BTW, I would guess that most people who use Compiler Compilers are using
> them to parse text, such as configurations or expressions like a==3 or
> b > 6 rather than to write compilers.  I for instance am parsing a
> (mostly, but not completely) line oriented configuration file.  So I
> know that to fix an error, I can always just read characters until an
> EOL, and continue from there.

Are you "fixing the error"?  Or maybe you meant that you can "recover"
from the error.

In this particular case, my suggested "automatic syntax error recovery"
mechanism would yield an AST with [error] nodes.  After parsing, you
could detect all "lines" which contain [error] nodes, issue an error (or
a warning?) for these lines, and use the other lines as expected.  This
offers a clear separation of the tasks: syntax -> SableCC, semantics ->
compiler programmer.
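
For the line-oriented configuration case, the post-parse step could look roughly like the Java sketch below.  Everything in it is an assumption for illustration: CfgNode and ConfigLoader are hypothetical names, each element of the input list stands for the subtree of one source line, and a valid line is assumed to have its key as first child and its value as second child.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical node type: an ordinary node or an [error] placeholder.
class CfgNode {
    final String text;
    final boolean isError;
    final List<CfgNode> children = new ArrayList<>();

    CfgNode(String text, boolean isError) {
        this.text = text;
        this.isError = isError;
    }

    CfgNode add(CfgNode child) {
        children.add(child);
        return this;
    }

    // True if this node or any descendant is an [error] node.
    boolean containsError() {
        if (isError) {
            return true;
        }
        for (CfgNode c : children) {
            if (c.containsError()) {
                return true;
            }
        }
        return false;
    }
}

class ConfigLoader {
    final Map<String, String> settings = new LinkedHashMap<>();
    final List<String> warnings = new ArrayList<>();

    // Uses every clean line and warns about every line holding an [error] node.
    void load(List<CfgNode> lines) {
        for (int i = 0; i < lines.size(); i++) {
            CfgNode line = lines.get(i);
            if (line.containsError()) {
                warnings.add("line " + (i + 1) + " ignored: syntax error");
            } else {
                settings.put(line.children.get(0).text, line.children.get(1).text);
            }
        }
    }
}
```

The loader never looks at characters or tokens: it only decides what to do with lines the parser has already marked, which is the separation of tasks described above.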

I think that anything beyond error recovery (e.g. "fixing") should only
be done with "end user" explicit consent.

Etienne M. Gagnon, M.Sc.                     e-mail: egagnon@j-meg.com
Author of SableCC:                             http://www.sablecc.org/
and SableVM:                                   http://www.sablevm.org/