[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: AST transformations

Hi Hendrick.

Hendrik Boom wrote:
> There are two kinds I am considering...
> (1) macros:
> (2) tree replacements

This is definitely more powerful than the SableCC 3 proposed scheme.  But it
also suffers from a few problems, in my opinion:

(1) It is still (or at least seems) more limited than a post-parsing AST
(2) Pushing back terminals/nonterminals on the input stream doesn't seem that
intuitive from both the compiler writer and compiler user points of view. 
e.g. You can have a perfectly "parsable" grammar, but your replacement rules
can cause parsing error!  The potential new errors could be undetected by the
compiler generator (unless you add restrictions on replacements), and thus the
end user might get a parse error on a seemingly valid program (if there is an
error in a rewriting rule).

I personally prefer to implement the simple-to-understand, deterministic,
terminating, linear-time, strongly-typed, bottom-up and non-reordering scheme
I proposed at "parse time".  This will be sufficient to simplify the AST of
most programs, potentially lowering significantly the parse-time memory
requirement for AST storage.  Then, a full blown post-parse AST rewriting
system ( la Prof. Jeff Baalen) would take care of other more complex AST
transformations.  Ideally, this rewriting system would also be strongly-typed,
thus saving much headaches to compiler writers.

I think that it is more important to provide simple to understand, and pifall
catching (e.g. strongly-typed) systems that will save amateur (and
professional) compiler writers from writing potentially incorrect
specifications, than to provide "the most powerful" system that allows you
(and almost encourages you, sometimes) to shoot yourself in the foot.  

The perfect example of this is my choice of not supporting YACC/Bison like
conflict resolution mechanisms in SableCC.  Amateur, and even professional
compiler writers often use these mechanisms, as it is usually much easier to
simply use them than rewrite the grammar.  But, I can tell you that even
experienced professors get caught introducing parsing anomalities without
being aware of it.  

It happened last year when I did rewrite a Bison grammar to SableCC.  This
grammar was written by a professor for his students course project.  Some
students wanted to use SableCC.  SableCC detected the grammar conflicts, and
forced me to understand them to be able to rewrite the grammar.  This allowed
me to identify that the grammar was inherently ambiguguous.  The ambiguity was
"resolved" arbitrarily in the Bison version (shift, I think).  The consequence
of the ambiguity was important from a "compiler user" point of view:  Assuming
the language specification, a programmer would write a valid program expecting
specific semantics.  The arbitrary resolution would cause the parser to give
an error message on this valid program.  Imagine the disarray of the poor
programmer that can't begin understand what's wrong with his program (there's
nothing wrong... It's just the parser that is not parsing the program

Some people disagree with me on this.  This is fine!  There is enough place in
our world for many parser generators.  But, I'd like to keep the "protect
people from common pitfalls" philosophy in SableCC.  I think that it is one of
its strong selling points, even though it may force you to think longer about
your problem, initially;-)

I still think that your proposed system is a nice and powerful system, but
maybe not appropriate for SableCC.

Thanks Hendrick for sharing your always excellent ideas.

Etienne M. Gagnon, M.Sc.                     e-mail: egagnon@j-meg.com
Author of SableCC:                 http://www.sable.mcgill.ca/sablecc/