[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Implementing identification on SableCC ASTs



Hi Etienne,

> The usual approach is to build a tree of symbol tables, e.g.:
> You usually store a pointer from each block of code to its appropriate
> symbol table.
> Looking up a symbo goes recursively from the "current" symbol table
> to its ancestors until a match is found.

Yes, we have a standard symbol table. That's no problem.

Actually, we have a symbol table with a slight twist, as it's not
implemented as you suggest with pointers. Instead it's implemented so
that a) there's no glocal scope, and b) so that lookups for an identifer
in a specific scope is fast.

> You should look at the JJOOS example compiler (see Prof. Hendren's
> cs520 course page http://www.sable.mcgill.ca/~hendren/520/).

Thanks for the reference!

However when browsing the source codes there, I find no new solution to
the problem I stated? (perhaps I haven't discovered it, as I haven't had
time to fully read the source codes)

As far as I can see, he uses a HashMap to associate "scope based" nodes
with a SymbolTable, and another hashmap for storing "annotations" (i.e.
associate a node with a Symbol). This is essentially the same as the
method I suggest last in my original email, namely using the getIn/setIn
for associating nodes with extra attributes.

I see the benefit of having several HashMaps, one for each type of
attributes you want to save. However it's still the same method. You
seperate node attributes from the AST, which I find slightly unnatural
(I'm not sure that applies to other people) -- and (my main objection)
you introduce a certain coupling between the various treewalker classes
that might need these attributes (i.e. they will need to agree on a
specific technicality for retrieving these attributes, they will need to
pass these HashMaps between them, they will need to agree on who
deallocates it again, etc. etc.).

Ofcourse the coupling is inherently there, but it could be lessened by a
standard interface for storing node attributes?

In addition I see a source of problems when you have for example three
layers of tree walkers.

* Identificator
* Desugar
* CodeGenerator

Just an example. Imagine that the identicator walks the tree and stores
attributes on various nodes concerning allocation of
heap/stack/whatever. The CodeGenerator depends on this information to be
available by doing node lookups in a HashMap. Now before the
CodeGenerator is run, the Desuger walks the tree and does some
transformations to make the tree simpler. In the process, it might
"accidently" clone/delete nodes. This destroys the HashMaps that the
CodeGenerator class depends on. I.e. even though you have an "agreement"
(coupling) between CodeGenerator and Identificator, an outsider (the
Desugar) can destroy the system.

Also using HashMaps perhaps isn't the most effective - as you would
normally not need to lookup node attributes by "random access" on
nodes - you would walk the tree and expect to be able to access the node
attributes in the node you visit. That may open up for more efficient
retrieval of attributes?

Also in the source codes you references; the "SymImplementationWalker"
(which is analogous to what I described as an Identificator in my first
email) stores SymbolTables  in an "astToScopeMap". This more or less
destroys seperation of concerns IMHO. The argument that now you not only
have to be aware of scopes in the Identificator, you also have to be
aware of scopes (to a much greater extent) in the CodeGenerator - and
thus having to cut-n-paste code or at least the risk of introducing
subtle bugs is higher.

I would hugely appreciate if you could comment on these considerations,
as they could be extremely mistaken/wrong (I'm new to compiler design).
Thanks for any help you can give in that direction!

--
Jens Kristian Søgaard, jks@cs.auc.dk