From: Thomas Leonhardt [mailto:firstname.lastname@example.org]
Sent: 24 August 2001 13:35
Subject: AW: Some questions...
> 1) I need to preserve as much of the original source formatting (and
> comments) as possible. My intention is to write a custom
> Lexer that builds
> a hash to record the ignored tokens (white space, doc
> comments, eol comments
> and normal comments) that occur before each non-ignored
> token. Then when I
> output the parsed tokens (in most cases the code should just
> pass through
> the pre-processor), I can just print out the ignored tokens first to
> recreate the output. Does this sound feasible? I thought it
> might be a
> little slow...
We created a pretty printer with sablecc which formats the
source code according to our style guide. It's implemented as you
describe it. It isn't slow and was quite easy to implement with sablecc.
[Jim] I'm glad I'm on the right track with the lexer at least.
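For illustration, the idea of recording the ignored tokens in front of each non-ignored token could be sketched roughly as below. This is only a toy scanner under stated assumptions, not SableCC's generated lexer: the names Tok, TriviaLexer, scan and render are hypothetical, and only whitespace and eol comments are treated as ignored tokens.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical token type; a generated lexer would supply its own token classes. */
final class Tok {
    final String text;
    final boolean ignored;
    Tok(String text, boolean ignored) { this.text = text; this.ignored = ignored; }
}

final class TriviaLexer {
    /** Ignored tokens seen before each significant token, keyed by token identity. */
    final Map<Tok, List<Tok>> leading = new IdentityHashMap<>();
    /** Significant tokens in source order. */
    final List<Tok> significant = new ArrayList<>();
    /** Ignored tokens that follow the last significant token. */
    final List<Tok> trailing = new ArrayList<>();

    /** Toy scanner: whitespace runs and // comments are ignored; the rest are words or single symbols. */
    static TriviaLexer scan(String src) {
        TriviaLexer lx = new TriviaLexer();
        List<Tok> pending = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            int start = i;
            char c = src.charAt(i);
            boolean ignored;
            if (Character.isWhitespace(c)) {
                while (i < src.length() && Character.isWhitespace(src.charAt(i))) i++;
                ignored = true;
            } else if (src.startsWith("//", i)) {
                while (i < src.length() && src.charAt(i) != '\n') i++;
                ignored = true;
            } else if (Character.isJavaIdentifierPart(c)) {
                while (i < src.length() && Character.isJavaIdentifierPart(src.charAt(i))) i++;
                ignored = false;
            } else {
                i++; // any other character is a one-character symbol token
                ignored = false;
            }
            Tok t = new Tok(src.substring(start, i), ignored);
            if (ignored) {
                pending.add(t);
            } else {
                lx.leading.put(t, pending); // attach everything skipped since the previous token
                lx.significant.add(t);
                pending = new ArrayList<>();
            }
        }
        lx.trailing.addAll(pending);
        return lx;
    }

    /** Prints each token preceded by its ignored tokens, which recreates the input. */
    String render() {
        StringBuilder sb = new StringBuilder();
        for (Tok t : significant) {
            for (Tok ig : leading.get(t)) sb.append(ig.text);
            sb.append(t.text);
        }
        for (Tok ig : trailing) sb.append(ig.text);
        return sb.toString();
    }
}
```

The map is keyed by identity rather than by token text, since the same text (say, two semicolons) can occur many times with different leading whitespace. The cost is one list per significant token, which fits the remark above that the approach isn't slow.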
> But why is
> there these two versions; In and Out? I must be missing
> something. I get
> the In/Out thing with the methods called during the tree
> walk, but I don't
> see how this relates to attributes. Do you store the
> attribute info you
> generate with the in* methods with (set|get)In and the
> attribute info you
> find in the out* methods in (set|get)Out? This seems to make sense
> somewhat, but why keep them in separate hashes?
The in method is used when you walk into a sub-tree. At this point you only have
information that you collected in the upper tree nodes. The out method
is called after you have completely processed a sub-tree.
One use of these methods is if you need flags which tell the visitor to behave
differently for certain sub-trees. Say you have a flag bInInnerClass and set
it to true when the in method is called. Then you're
able to lay out inner classes differently from "normal" classes. After
you leave the sub-tree you set the flag to false again, and the rest of the
tree will be handled normally.
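The flag technique described above might look roughly like this. It is a minimal sketch and not the pretty printer itself: LayoutNode, LayoutVisitor and the string node kinds are hypothetical stand-ins for the node classes and inXXX/outXXX methods that sablecc generates; only the flag name bInInnerClass comes from the description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical tree node; a generated AST would have one class per grammar alternative. */
final class LayoutNode {
    final String kind;   // "class", "innerClass" or "method"
    final String name;
    final List<LayoutNode> children;

    LayoutNode(String kind, String name, LayoutNode... children) {
        this.kind = kind;
        this.name = name;
        this.children = new ArrayList<>(Arrays.asList(children));
    }
}

final class LayoutVisitor {
    private boolean bInInnerClass = false;
    final List<String> lines = new ArrayList<>();

    /** Depth-first walk: in() before the children, out() after them. */
    void visit(LayoutNode n) {
        in(n);
        for (LayoutNode c : n.children) visit(c);
        out(n);
    }

    private void in(LayoutNode n) {
        // Entering the sub-tree: raise the flag so everything below is laid out differently.
        if (n.kind.equals("innerClass")) bInInnerClass = true;
        String indent = bInInnerClass ? "    " : "";
        lines.add(indent + n.kind + " " + n.name);
    }

    private void out(LayoutNode n) {
        // Leaving the sub-tree: lower the flag so the rest of the tree is handled normally.
        if (n.kind.equals("innerClass")) bInInnerClass = false;
    }

    public static void main(String[] args) {
        LayoutNode tree = new LayoutNode("class", "Outer",
                new LayoutNode("method", "a"),
                new LayoutNode("innerClass", "Inner", new LayoutNode("method", "b")),
                new LayoutNode("method", "c"));
        LayoutVisitor v = new LayoutVisitor();
        v.visit(tree);
        for (String line : v.lines) System.out.println(line);
    }
}
```

A plain boolean is enough only while inner classes do not nest; for inner classes inside inner classes a depth counter incremented in in() and decremented in out() would be the safer choice.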
[Jim] I understand when the in and out methods are called; I was asking specifically about setIn/getIn/setOut/getOut. It just seems a little odd to have two separate hashes for each node. I am presuming that you are supposed to store attributes gathered during the execution of an inXXX method using setIn, and the attributes calculated in the outXXX method using setOut...
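To make that presumption concrete, here is one way the two tables could be used. This is a sketch of the reading Jim describes, not a statement of how sablecc intends them to be used: AttrNode and AttrVisitor are hypothetical, and the accessors only mirror the setIn/getIn/setOut/getOut names from the question. The "in" table holds something known on the way down (nesting depth), the "out" table something known only after the sub-tree is finished (its size).

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical tree node used only for this illustration. */
final class AttrNode {
    final String label;
    final List<AttrNode> children = new ArrayList<>();
    AttrNode(String label, AttrNode... kids) {
        this.label = label;
        for (AttrNode k : kids) children.add(k);
    }
}

final class AttrVisitor {
    // Two separate tables, so a node can carry one value from each direction.
    private final Map<AttrNode, Object> in = new IdentityHashMap<>();
    private final Map<AttrNode, Object> out = new IdentityHashMap<>();

    Object getIn(AttrNode n) { return in.get(n); }
    void setIn(AttrNode n, Object v) { in.put(n, v); }
    Object getOut(AttrNode n) { return out.get(n); }
    void setOut(AttrNode n, Object v) { out.put(n, v); }

    void visit(AttrNode n, AttrNode parent) {
        // "in" step: only information from the ancestors exists yet, here the depth.
        int depth = (parent == null) ? 0 : (Integer) getIn(parent) + 1;
        setIn(n, depth);

        int size = 1;
        for (AttrNode c : n.children) {
            visit(c, n);
            size += (Integer) getOut(c); // children have already published their result
        }

        // "out" step: the whole sub-tree has been processed, so its size is known.
        setOut(n, size);
    }
}
```

Seen this way, keeping two hashes lets the same node hold an attribute passed down from its parents and another computed from its children without the two overwriting each other.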
> 4) I'm going to have something of a problem with line
> numbers when I pass
> the post-processed text to a compiler (it must also be human
> like javac. Is there any java equivalent to the #line
> directive in C? I
> think not, so I'm preparing to use another hash to map line
> numbers back to
> the original source. Any bright ideas?
There is no such thing. But sablecc provides the information you need.
The top level nodes have information about their location in a file.
It's a bit tricky to get it but it works.
[Jim] Yes, I can see I'm just going to have to do it the hard(ish) way.
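The "hard(ish) way" of mapping line numbers back to the original source could be as small as the sketch below. It assumes the pre-processor knows the original line of whatever it is writing at the moment it emits each output line; LineMap, emit and toOriginal are hypothetical names, and a list indexed by output line stands in for the hash mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

/** Records, for every line written to the output, which original source line it came from. */
final class LineMap {
    // Entry i holds the original line number for output line i + 1.
    private final List<Integer> originalLine = new ArrayList<>();
    private final StringBuilder output = new StringBuilder();

    /** Appends one output line and remembers the source line that produced it. */
    void emit(String text, int sourceLine) {
        output.append(text).append('\n');
        originalLine.add(sourceLine);
    }

    /** Translates a 1-based line number reported by the compiler back to the original file. */
    int toOriginal(int outputLine) {
        return originalLine.get(outputLine - 1);
    }

    String text() {
        return output.toString();
    }

    public static void main(String[] args) {
        LineMap map = new LineMap();
        map.emit("int x = 1;", 1);          // passed through unchanged
        map.emit("int tmp = compute();", 2); // source line 2 expands to two output lines
        map.emit("use(tmp);", 2);
        map.emit("x++;", 3);
        System.out.println(map.toOriginal(3)); // prints 2
        System.out.println(map.toOriginal(4)); // prints 3
    }
}
```

Compiler messages that quote an output line number can then be rewritten with toOriginal before they are shown to the user.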
[Jim] Thanks for all that info. Another thing I am not sure about is the TypedLinkedList thing and all these funny cast classes. It looks like some established technique for doing collections that has passed me by; can somebody give me a pointer to something on it? I looked at the code but had a hard time seeing what was going on (so it's a bit embarrassing to have to ask, but time is of the essence).
Many thanks to all,
Jim Moores, Director, Quickstone Technologies Limited, UK.