Re: hacking sablecc grammar



Roger Keays wrote:
> I would like to make some changes to sablecc, particularly the grammar.
> The source however does not include the grammar file which was used to
> generate its parser (it looks like there has been some modification by
> hand).

Hi Roger.  This is one of those nasty little things that I want to solve
in SableCC 3.  SableCC 2's parser was originally built using SableCC 1,
then it was modified by hand.  There's no nice way to modify SableCC 2's
parser, unless you want to rewrite the front-end to use the kind of AST
generated by SableCC 2 itself (which is effectively a much prettier
AST...).  As I am planning major grammar modifications in SableCC 3
(grammar additions, I should say, as it will be mostly backward
compatible, or else come with an "upgrade" tool), and as I was short on
time, I decided to postpone this work to SableCC 3.

> What is the simplest way to change the sablecc grammar? I don't really
> want to modify the lexer/nodes by hand, and I don't really want to have to
> modify all the walker classes that generate the corresponding parsers
> classes (the sablecc grammar in sablecc-grammars-2.0.0-src.tar.gz produces
> different nodes).

If you can wait just a little more, you could help me with SableCC 3
(and contribute some input on its design).  Just let me finish my
studies.

FYI: I have been hired by UQAM (Universite du Quebec a Montreal) as an
assistant professor, and SableCC 3 is part of my research plans. :)

> Also, Etienne, I read in one of your previous posts, that you are opposed
> to storing attribute values in the AST (instead preferring to store them
> in a hashtable indexed by the node). You said it was poor style to use
> attributes in nodes. Why?

Mainly maintenance and modularity.  If you compute some analysis
information, and store it directly in the AST, it becomes much harder to
remove this information from the AST later.  One reason you might want
to do this is because you decided that this analysis isn't important to
you anymore (e.g., you initially used your grammar to write a compiler,
and now you only want a pretty-printer for your programs).  Why is it
harder?  Because you might want to get rid of this information
dynamically (if it is only useful temporarily and takes a lot of
memory), in which case you have to go through the whole AST, emptying
each node one at a time (and then you still pay the cost of one null
reference field per AST node), whereas with a hashtable you simply drop
the reference to it, and the garbage collector takes care of the rest.
If you want to do a static modification, then it would mean modifying
every AST node type to add/remove an analysis, or (if this was
integrated in SableCC's grammar) modifying your grammar every time you
add/remove an analysis.

But, my main argument is "encapsulation".  Analysis information is best,
in my opinion, encapsulated within the analysis itself, instead of
scattered around an AST.  This way, you can add and remove analyses by
simply adding and removing one (or a set of) class(es) from your
project.  You need not modify any other class of your project, except,
maybe, the "main" one.
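
To make this concrete, here is a minimal sketch in Java.  The Node,
DepthAnalysis and DepthDemo classes are made up for the illustration;
they are not SableCC-generated classes.  I also assume an
IdentityHashMap as the "hashtable indexed by the node", so that lookups
go by node identity and do not depend on how equals() is defined.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an AST node; it has no analysis fields at all.
class Node {
    final String label;
    final List<Node> children = new ArrayList<>();

    Node(String label, Node... kids) {
        this.label = label;
        for (Node kid : kids) {
            children.add(kid);
        }
    }
}

// Hypothetical analysis: computes the depth of every node and keeps the
// results in its own table, keyed by node identity.
class DepthAnalysis {
    private final Map<Node, Integer> depths = new IdentityHashMap<>();

    DepthAnalysis(Node root) {
        visit(root, 0);
    }

    private void visit(Node node, int depth) {
        depths.put(node, depth);
        for (Node child : node.children) {
            visit(child, depth + 1);
        }
    }

    // Returns the computed depth, or null if the node was never analyzed.
    Integer depthOf(Node node) {
        return depths.get(node);
    }
}

class DepthDemo {
    public static void main(String[] args) {
        Node leaf = new Node("x");
        Node root = new Node("plus", new Node("one"), new Node("paren", leaf));

        DepthAnalysis analysis = new DepthAnalysis(root);
        System.out.println(analysis.depthOf(leaf)); // prints 2

        // Dropping the only reference makes the whole table collectable;
        // the AST itself is left untouched.
        analysis = null;
    }
}
```

Removing the analysis from a project then amounts to deleting
DepthAnalysis and the few lines that use it; Node never changes.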

The "speed overhead" of accessing information through a hashtable is
not, in my opinion, a good argument against it: the choice of Java and
OO for writing a compiler already indicates that a small constant factor
is less important to the compiler designer than good software
engineering (modularity, short development time, and easy maintenance).

Etienne
-- 
----------------------------------------------------------------------
Etienne M. Gagnon, M.Sc.                     e-mail: egagnon@j-meg.com
Author of SableCC:                             http://www.sablecc.org/
and SableVM:                                   http://www.sablevm.org/