[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: AST transformations



Hi Jeff!

OK.  I get the general idea.  I do like your idea a lot.  It seems quite
general.  

If I understand well (tell me if I off the track!), you just need to type
transformation rules then some engine applies these transformations on the AST
until no more rules match.

If I am right, I think that we need to design some refinement  before you jump
to your code editor and start coding it;-)

One problem in your example,

[initial grammar, for reference:

statement =
  ...  
  {AST: mult_assign}   id* assign num* semicolon) |
  {AST: single_assign} id num |
  {AST: block} statement*;  // just for the sake of demontration
]

> rule [nicer_assign]
> mult_assign(id*(id(?X),$Rest),assign,num*(num(?N),$Rest2))
>    =
> single_assign(id(?X),assign,num(?N)),mult_assign($Rest1,assign,$Rest2)
> 
> rule [nicer_assign_end]
> mult_assign(id*(),assign,num*()) = empty

is that you do not know for sure that the number of "id"s in the id* list is
equal to the number of "num"s in the num* list.  Another problem is that you
are making a single "assign" token the child of many alternatives, which is
not possible in SableCC ASTs.  (SableCC enforces the "tree" property of an
AST).

But, if we look at your rules with a "if(match) then transform" view, and we
simply dump the "assign" token, we might write:

// Rule that matches a MutlAssignStatement with one (or more) ids and one (or
more) num.

Rule: In statement
mult_assign((first_id,rest_id*),assign,(first_num, rest_num*),semicolon)
->
block((single_assign(first_id,first_num),
mult_assign(rest_id,assign,rest_num,semicolon)))

This means:

replace a mult_assign statement with a block statement.  The id* of
mult_assign should have at least one id, which we name "first" (using a user
provided identifier).  The same for num*.

But this is unfortunate:  we are creating nested block statements, instead of
a single one (with many single_assign sub statements)!

One solution would be to flatten nested block statements.  For this we need a
more powerful syntax to match specific production alternatives in children
nodes.

Attempt at a solution:

Rule: In statement
block(head_statement*,{block}block_stmt(stmt*),tail_statement*)
->
block(head_statement*,stmt*,tail_statement*)

The problem here is that  we are matching one of possibly many block
substatement.  This is ambiguous.  We could try to provide a syntax to specify
this unambiguously.  (The tool should also reject any ambiguous
specification).

// I used "!" to indicate "not".  I want to match the first block
substatement.
Rule: In statement
block({!block}head_statement*,{block}block_stmt(stmt*),tail_statement*)
->
block(head_statement*,stmt*,tail_statement*)


One exercise would be to also try converting an LL(1) like expression grammar
(that ignores precedence) into the right AST based on precedence, using
whatever AST transformation syntax you design.

Something like

// LL
exp =
  term exp_tail |
  l_par exp r_par;

exp_tail =
  plus exp |
  mult exp;

// AST
exp =
  exp plus exp |
  exp mutl exp;

But the AST should encode the correct precedence.  
So if you feed: 2 + 3 * 4 + 5 to the LL grammar, 
you still want the final AST to be (2 + (3 * 4)) + 5.


It is not an easy problem, but I am sure it is possible to come up with a
"general enough" AST transformation system.  If you want to go for a "rule
based" system, it should also have a predictable output.  This means that
either:
(1) the order in which rules are applied is of no importance, as the final AST
would be the same, or
(2) [this is usually the case] the order of rules is important.  In which
case, the behavior of the transformtation engine should be clearly defined so
that transformations can be expressed deterministically.

A system like (1) would be nice, but you may have to restrict kind of rules
and/or combination of rules to achieve safety.  If this is too limiting, you
will have to make sure the behavior can be clearly specified.  It should also
be easy to reason about transformations.

If you are a (graduate?) student, it might be a good idea to ask your
supervisor for references in the literature about similar systems. I am not a
specialist in this field, but I am sure we are not the first people thinking
about rule systems and tree transformation systems.  It would be a good idea
not to reinvent the wheel, but to improve it (and/or adapt it) in our current
context.

Etienne

-- 
----------------------------------------------------------------------
Etienne M. Gagnon, M.Sc.                     e-mail: egagnon@j-meg.com
Author of SableCC:                 http://www.sable.mcgill.ca/sablecc/
----------------------------------------------------------------------