[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: hacking sablecc grammar

Hi Etienne,

Thanks for your reply...

> > What is the simplest way to change the sablecc grammar? I don't really
> > want to modify the lexer/nodes by hand, and I don't really want to have to
> > modify all the walker classes that generate the corresponding parsers
> > classes (the sablecc grammar in sablecc-grammars-2.0.0-src.tar.gz produces
> > different nodes).
> If you can wait just a little more, you could help me with SableCC 3
> (and contribute some input on its design).  Just let me finish my
> studies.

Sure, I'd love to help... Just let me finish my holidays ;)

> FYI: I have been hired by UQAM (Universite du Quebec a Montreal) as an
> assistant professor, and SableCC 3 is part of my research plans. :)
> > Also, Etienne, I read in one of your previous posts, that you are opposed
> > to storing attribute values in the AST (instead preferring to store them
> > in a hashtable indexed by the node). You said it was poor style to use
> > attributes in nodes. Why?
> Mainly maintenance and modularity.  If you compute some analysis
> information, and store it directly in the AST, it becomes much harder to
> remove this information from the AST later.  One reason you might want
> to do this is because you decided that this analysis isn't important to
> you anymore (e.g., you initially used your grammar to write a compiler,
> and now you only want a pretty-printer for your programs).  Why is it

This is why I decided to use SableCC... you can easily generate different
output by changing walker classes (exactly what I was after).

> harder? Because, you might want to get rid of this information
> dynamically (if this information is only useful temporarily, and takes
> much memory), in which case you have to go through the whole AST,
> emptying each node one at a time (and then, you still have to pay the
> cost of one null reference node per AST node), whereas, with a hashtable
> you simply drop the reference to it, and the garbage collector takes
> care of the rest. If you want to do a static modification, then it would
> mean modifying evey AST node type to add/remove an analysis, or (if this
> was integrated in SableCC's grammar), modify your grammar everytime you
> add/remove an analysis.

What I had in mind was to add simple attribute capabilities to allow the
definition of more flexible tree nodes. My basic (very early) idea is to
allow the definition of arbitrary attributes in the grammar. It would make
most sense to have these in an ignored alternative. Something like:

   model = {parsed} ident lbrace type shape phase rbrace |
           ( [name]:<String> type shape phase );

My only real motivation for this idea is to make it more elegant to
get/set information from the AST. So instead of

   String foobar = someModel.getIdent().getText();

you could just write

   String foobar = someModel.getName();

Pretty trivial, I know. Also, I have read your arguments for using a
wrapper method getName(node) to lookup a hashtable. I guess one of the
negatives about allowing arbitrary attributes is that it may require an
appropriate import statement.

(ASIDE: My understanding of hashtables is that empty entries still use
memory - i.e. the table uses a contiguous areas of memory. In this case, a
null node reference would probably use less memory)

> But, my main argument is "encapsulation".  Analysis information is best,
> in my opinion, encapsulated within the analysis itself, instead of
> scattered around an AST.  This way, you can add and remove analyses by
> simply adding and removing one (or a set of) class(es) from your
> project.  You need not modify any other class of your project, except,
> maybe, the "main" one.

Yes, I see your point.

> The "speed overhead" of accessing information through a hashtable is
> not, in my opinion, a good argument against it: the choice of Java and
> OO for writing a compiler already indicates that a small constant factor
> is less important the the compiler designer than good software
> engineering (modularity, short development time, and easy maintenance).

On a separate note, I am also wondering about the decision to implement
the ? EBNF operator by creating extra productions, rather than using an
`X | empty' approach. A production with a few ?'s in it's rule quickly
becomes very slow to generate a parser for (due to all the possible
permutations). No real problem I guess - alternatively we could write our
grammars using only BNF (yuk!).

Best of luck at UQAM.

Kind regards,