[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: AST transformations

To: sablecc-list <sablecc-list@sable.mcgill.ca>
Subject: Re: AST transformations
From: Hendrik Boom <hendrik@topoi.cam.org>
Date: Mon, 28 Feb 2000 20:58:36 -0500 (EST)
In-Reply-To: <38B81D4A.A7BF8430@j-meg.com> from "Etienne M. Gagnon" at "Feb 26,2000 01:36:58 pm"
Sender: owner-sablecc-list@sable.mcgill.ca
[Charset iso-8859-1 unsupported, filtering to ASCII...]
> Hi Hendrick.
> 
> Hendrik Boom wrote:
> >...
> > There are two kinds I am considering...
> > (1) macros:
> >...
> > (2) tree replacements
> >...
> 
> This is definitely more powerful than the SableCC 3 proposed scheme.  But it
> also suffers from a few problems, in my opinion:
> 
> (1) It is still (or at least seems) more limited than a post-parsing AST
> transformer.

(1) is a sop to users who are used to macro-procesors.  If you can recognise
a macro syntactically (and you have some syntactic context available if you
need it), you can replace it by new source code, which can fail syntactically.

> (2) Pushing back terminals/nonterminals on the input stream doesn't seem that
> intuitive from both the compiler writer and compiler user points of view.

It is the method used in the so-called non-canonical SLR parsers (NSLR).
In certain inadequate states, they merge the lookahead sets (which are
modified to include nonterminals) ane use them to start parsing past the
inadequacy (without yet resolving it).  If, during parsing, further reductions
result in a nonterminal appearing (in the input!) which *does* resolve the
inadequacy, we go back and reduce or whatever.  A parser-generation-time
test exists to identify when this technique may be useful.
Those grammars that pass the test are the NSLR grammars.
The resulting parsers do not reduce their production in LR-stype canonical
order, hence the name "non-canonical).

Anyway, given this context, it does seem natural to push things into the input
stream, since the parser may already be doing it anyway.

If you extend this to having productions that can (using "semantic actions")
deynamically determine which of several alternative nonterminals
they can reduce to (i.e., multiple left-hand sides), it is possible to
embed lexical scanning within the parser and dispense with a separate
lexical scanner (though I have encountered significant unpleasant
overhead when I did this).
 
> e.g. You can have a perfectly "parsable" grammar, but your replacement rules
> can cause parsing error!  The potential new errors could be undetected by the
> compiler generator (unless you add restrictions on replacements), and thus the
> end user might get a parse error on a seemingly valid program (if there is an
> error in a rewriting rule).

Which is why I abhor the rewriting convention I label (1), although things like
it are useful to handle things like syntactically-constrained include
statements.

Which is why in method (2) I would have the parser generator parse the
replacement text at parser-generation time to ensure that the replacement
text *is* syntactically valid in the same context.  The so-called
semantic-action associated with the rule can just copy a prebuilt
piece of parse tree and fill in the blanks.
Not all rewritings can be done in this way during parse time -- the most
notable exceptions are those where the context-free part of the rule
(withiut the replacement) turn out to cause ambiguities.
But in this case you can still process these rules with a post-parse tree-rewriting system.

> 
> I personally prefer to implement the simple-to-understand, deterministic,
> terminating, linear-time, strongly-typed, bottom-up and non-reordering scheme
> I proposed at "parse time".  This will be sufficient to simplify the AST of
> most programs, potentially lowering significantly the parse-time memory
> requirement for AST storage.
	
I agree.

> Then, a full blown post-parse AST rewriting
> system (_ la Prof. Jeff Baalen) would take care of other more complex AST
> transformations.  Ideally, this rewriting system would also be strongly-typed,
> thus saving much headaches to compiler writers.

I agree.  I no longer think there *is* such a thing a untyped data -- there
are just good and bad type systems.  Or, if you prefer, useful and useless
ones.

> 
> I think that it is more important to provide simple to understand, and pifall
> catching (e.g. strongly-typed) systems that will save amateur (and
> professional) compiler writers from writing potentially incorrect
> specifications, than to provide "the most powerful" system that allows you
> (and almost encourages you, sometimes) to shoot yourself in the foot.  

I agree.

> 
> The perfect example of this is my choice of not supporting YACC/Bison like
> conflict resolution mechanisms in SableCC.  Amateur, and even professional
> compiler writers often use these mechanisms, as it is usually much easier to
> simply use them than rewrite the grammar.  But, I can tell you that even
> experienced professors get caught introducing parsing anomalities without
> being aware of it.  

I agree.  I suspect that C++ shot itself in the foot many times over
like this.

> It happened last year when I did rewrite a Bison grammar to SableCC.  This
> grammar was written by a professor for his students course project.  Some
> students wanted to use SableCC.  SableCC detected the grammar conflicts, and
> forced me to understand them to be able to rewrite the grammar.  This allowed
> me to identify that the grammar was inherently ambiguguous.  The ambiguity was
> "resolved" arbitrarily in the Bison version (shift, I think).  The consequence
> of the ambiguity was important from a "compiler user" point of view:  Assuming
> the language specification, a programmer would write a valid program expecting
> specific semantics.  The arbitrary resolution would cause the parser to give
> an error message on this valid program.  Imagine the disarray of the poor
> programmer that can't begin understand what's wrong with his program (there's
> nothing wrong... It's just the parser that is not parsing the program
> correctly!).
> 
> Some people disagree with me on this.

I don't disagree with you. With such people, though, I grudgingly concede
that there are sometimes syntactic ambiguities with no semantic consequences.

> This is fine!  There is enough place in
> our world for many parser generators.  But, I'd like to keep the "protect
> people from common pitfalls" philosophy in SableCC.  I think that it is one of
> its strong selling points, even though it may force you to think longer about
> your problem, initially;-)
> 
> I still think that your proposed system is a nice and powerful system, but
> maybe not appropriate for SableCC.
> 
> Thanks Hendrick for sharing your always excellent ideas.


> 
> Etienne
> -- 
> ----------------------------------------------------------------------
> Etienne M. Gagnon, M.Sc.                     e-mail: egagnon@j-meg.com
> Author of SableCC:                 http://www.sable.mcgill.ca/sablecc/
> ----------------------------------------------------------------------
> 
-- hendrik
References:
- Re: AST transformations
  - From: "Etienne M. Gagnon" <egagnon@j-meg.com>
Prev by Date: Re: AST transformations
Next by Date: Progress Information
Prev by thread: Re: AST transformations
Next by thread: Re: AST transformations
Index(es):
- Date
- Thread