[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Questions on grammar and inheritance



Hi Andrew,

> Andrew Cooke <andrew@intertrader.com> wrote:
> First, in writing an interpreter (for the anonymous language
> in Dijkstra's "A Discipline of Programming"), I have a grammar
> where the following structure is common:
>
>  expression_list =
>    {head} expression |
>    {tail} expression comma expression_list ;
>
> i.e. I have syntactic lists which have internal separators,
> but no terminal symbol.  Another example, as code, could be:
>
>  begin
>    statement1 ;
>    statement2
>  end
>
> (note statement2 has no terminating ";").
>
> The grammar pattern above is fine if I am traversing the tree
> "in the normal way", but sometimes I need to run through the
> list of nodes in the list directly.  This produces ugly code
> like:
>
>  PExpressionList list = node.getExpressionList( ) ;
>  while ( ! ( list instanceof AHeadExpressionList ) ) {
>    name = ((ATailExpressionList)list).getExpression( ).getText( ) ;
>    list = ((ATailExpressionList)list).getExpressionList( ) ;
>    { do something with name }
>  }
>  name = ((AHeadExpressionList)list).getExpression( ).getText( ) ;
>  { do the same something with name }
>
> because both head and tail contain an identifier.  This is
> clearly poor code.   But how do I remove that final ugly
> assignment to name?  Using "+" in the grammar would produces a
> nice linked list, but seems to force a final terminator.


This is the reason for the existence of "ignored productions".

You could rewrite your grammar as:

expression_list =
  {temp} expression expression_list_tail* |
  ( expression* );

expression_list_tail =
  comma expression;

Then do a first pass over the AST to change every instance of
ATempExpressionList to AEpressionList.

> Second (I can see this is a long email - but I'm in no hurry
> so if anyone would like to reply when they have more time I
> would still be grateful) it would be elegant to store
> intemediate results, which are associated with particular nodes,
> in the nodes themselves.

>
> Compare this approach with that used in the minibasic example
> where a separate hash array stores intermediate results.  In
> minibasic results are stored by calling setIntValue rather
> that something like node.setTempValue( ... ).  This makes it
> difficult to split the job of interpretation between two separate
> tree walkers, for example.
>
> However, without multiple inheritance, it's hard to see how
> to do this without altering sablecc-generated code.  Any ideas?


There are many reasons to keep the data separated from the nodes. For one
thing, you do not want to go and modify existing node types every time you
add or remove an analysis on the AST.

Secondly, you might only require some data for a short period of time. If
the data is stored in a hashtable, you can get rid of it by simply dropping
the reference to the hastable (the garbage collector will safely do the
rest). There will be no need to visit every node to set some pointer to
null.


> Third, Dijkstra, in his wisdom, decided that the following would
> be acceptable:
>
>  x, y, z = 1, 2, 3 ;
>  x, y = y, x ;
>
> Where print x, y, z would, after executing both lines, give 2, 1, 3.
>
> As he points out in his book, you can easily generate a parser
> that checks the two lists balance (see chunk below), but it
> associates, in this example, x with 3, y with 2 and z with 1.
> In other words, it gets the order reversed.
>
> My simplest solution is to have the parser accept unequal lists
> and then have a run-time error.  Alternatively, I could use a
> grammar like
>
>  assign =
>    var := arg |
>    var , assign , arg ;
>
> And chop the tree around before interpretation to fix the problem
> described above.  But tree surgery is a bit intimidating - so is
> there a simpler solution?


I prefer your "simplest solution". The shape of the AST will be easier to
work with, in that case.

> OK, that's all.  Any comments appreciated, or pointers to good
> discussions (I've read the dragon book, but these questions are
> more about style and efficiency than "doing it in C").


There is no single answer to your questions. I have based the design of
SableCC ASTs on past experience. I have tried to build a system that avoids
past mistakes.

You question about data stored in the nodes, for examples. We had this kind
of structure in the McCAT C compiler. This has proved a problem in the long
run. The difficulty about the design of a compiler is to anticipate the
needs of maintenance. While some structures seem obvious for a short term
strategy, they often prove wrong when comes the maintenance time.


> P.S. Wouldn't it be nice to have this list archived somewhere?


No. It's the privilege of subscribers of knowing what happens on the list...
This might change if our systems admin can find the time to install
Majordomo and some automatic archive script;-)