[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: AST transformations



Hi Jeff.

I already have designed the grammar of SableCC 3.0 (a few months ago).  SableCC 3.0 will have a very nice mechanism to do parse-time AST transformations.  The problem is that, right now, my priority #1 is my Ph.D. thesis, which is NOT about SableCC.  So, I do not yet have a time schedule to
implement it.  This mechanism has some limitations, which could make your tool a nice complement, if it could provide the missing functionality "post-parsing".

To give you an idea, in SableCC 3 you would write the "combined" grammar of the previous message as follows:

Tokens
  plus = '+';
  mult = '*';
  num = ['0'..'9']+;

Productions

exp =
   {tmpplus} exp plus factor -> 
     plus(exp, plus, factor.exp) |
   
   {factor}  factor -> factor.exp |

   {AST: plus} [lexp]:exp plus [rexp]:exp |
   {AST: mult} [lexp]:exp mult [rexp]:exp |
   {AST: num}  num;

factor -> (exp) =
   {mult} factor mult num -> 
     (mult(factor.exp, mult, num(num))) |

   {num}  num -> (num(num));

Syntax explanation:

-> plus(exp, plus, factor.exp) 
means: transform into a PlusExp with children exp, plus, and the "exp" of a factor.

-> factor.exp
means: replace by the exp of a factor

factor -> (exp) =
means: every factor will result into an exp.

-> (num(num))
means: transform into a NumExp with child NumFactor.num.

I hope you get the idea.

This syntax is pretty powerful. In fact, you can also simplify things like comma separated id lists. e.g.:

var_decl =
  {temp} type id_list -> (type, id_list.id) |

  {AST:} type id*;

id_list -> (id*) =
  id id_list_tail* -> (id + id_list_tail.id);

id_list_tail -> (id) =
  comma id -> (id);

You will note that when a list is expected (* and/or +), you can use the "+" operator to combine single elements and lists of elements into a single list.  SableCC knows if each argument of the "+" is an element or a list from the context.  It would be possible to write:

id_list_tail -> (id*) =
  comma id -> (id);

SableCC would construct a "one element" list.  

"id id_list_tail* -> (id + id_list_tail.id)" would still work, resulting into a single "flat" list, NOT a list of lists.

A more complex example involves parsing "multiple assignments" using a grammar that enforces a matching argument count, then transform the AST into a list to list assignment.

(e.g.: a,b,c = 1,2,3;)

original grammar:

statement =
  parse_mult_assign semicolon;

parse_mult_assign =
  {single}   id assign num |
  {multiple} id [l]:comma parse_mult_assign [r]:comma num;

AST grammar:

statement =
  {mult_assign} id* assign num* semicolon;

Solution:

statement =
  {parse_mult_assign} parse_mult_assign semicolon -> 
    mult_assign(
      parse_mult_assign.id, 
      parse_mult_assign.assign, 
      parse_mult_assign.num,
      semicolon) |

  {AST: mult_assign} id* assign num* semicolon;

parse_mult_assign -> (id*, assign, num*) =
  {single}   id assign num -> (id, assign, num) |
  {multiple} id comma parse_mult_assign comma num -> 
    (id + parse_mult_assign.id, 
     parse_mult_assign.assign, 
     parse_mult_assign.num + num);

Obviously, this AST transformation mechanism has limitations:
(1) it only works "bottom-up"
(2) it does not allow reordering if nodes (also enforced by SableCC in case a programmer tries to express reordering).

The advantage is that it can be implemented transparently at "parse" time, which can result in lowering memory consumption.

Your tool would be very interesting if it could express cleanly more complex "post-parse" AST transformations.  One would think of transforming the last example AST into:

statement =
  {nicer_mult_assign} single_assign+;

single_assign =
  id num;  // SableCC ASTs enforce the "tree" property
           // which means we either create new "assign"
           // tokens, or we save memory by not requiring
           // this spurious token.

If you think carefully about it, now tokens are in a different order than they originally were e.g.:

a,b,c = 1,2,3  => 
[a b c] = [1 2 3]  =>
[a 1] [b 2] [c 3]

This kind of transformation is not expressible using a SableCC 3 grammar.  But this would be a very nice to express into a post-parse AST tranformation tool.

What do you think?

WARNING FOR SABLECC-LIST SUBSCRIBERS:  Development of SableCC 3 is ON HOLD until the end of my Ph.D. studies.  Please don't ask me "when will SableCC 3 be released?".  I am only discussing design here.

Etienne
-- 
----------------------------------------------------------------------
Etienne M. Gagnon, M.Sc.                     e-mail: egagnon@j-meg.com
Author of SableCC:                 http://www.sable.mcgill.ca/sablecc/
----------------------------------------------------------------------