> 1) I need to preserve as much of the original source formatting (and
> comments) as possible. My intention is to write a custom Lexer that
> builds a hash to record the ignored tokens (white space, doc comments,
> eol comments and normal comments) that occur before each non-ignored
> token. Then when I output the parsed tokens (in most cases the code
> should just pass through the pre-processor), I can just print out the
> ignored tokens first to recreate the output. Does this sound feasible?
> I thought it might be a little slow...
We created a pretty printer with sablecc which formats the
source code according to our style guide. It's implemented as you
describe it. It isn't slow and was quite easy to implement with sablecc.
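
To make the idea concrete, here is a minimal sketch of the bookkeeping, not our actual pretty printer. The classes Tok and TriviaRecorder are hypothetical stand-ins and are not part of sablecc; in a real setup the same logic would sit in your custom lexer. It assumes the token stream ends with an end-of-file token, so that trailing comments have something to attach to.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a lexer token; not a sablecc class.
final class Tok {
    final String text;
    final boolean ignored; // true for white space and all kinds of comments

    Tok(String text, boolean ignored) {
        this.text = text;
        this.ignored = ignored;
    }
}

final class TriviaRecorder {
    // Ignored tokens seen directly before each significant token.
    // Keyed by identity, because two tokens may have the same text.
    private final Map<Tok, List<Tok>> before = new IdentityHashMap<>();
    private List<Tok> pending = new ArrayList<>();

    // Feed every token in source order; returns true if the parser should see it.
    boolean accept(Tok t) {
        if (t.ignored) {
            pending.add(t);
            return false;
        }
        before.put(t, pending);
        pending = new ArrayList<>();
        return true;
    }

    // Write a significant token, preceded by its recorded white space and comments.
    void print(Tok t, StringBuilder out) {
        for (Tok ig : before.getOrDefault(t, Collections.<Tok>emptyList())) {
            out.append(ig.text);
        }
        out.append(t.text);
    }

    // Lex-then-print round trip over an already tokenized source.
    static String roundTrip(List<Tok> source) {
        TriviaRecorder rec = new TriviaRecorder();
        List<Tok> significant = new ArrayList<>();
        for (Tok t : source) {
            if (rec.accept(t)) {
                significant.add(t);
            }
        }
        StringBuilder out = new StringBuilder();
        for (Tok t : significant) {
            rec.print(t, out);
        }
        return out.toString();
    }
}
```

Each token is touched once on the way in and once on the way out, so the extra cost is one map insert and one lookup per significant token.
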
> But why is there these two versions; In and Out? I must be missing
> something. I get the In/Out thing with the methods called during the
> tree walk, but I don't see how this relates to attributes. Do you
> store the attribute info you generate with the in* methods with
> (set|get)In and the attribute info you find in the out* methods in
> (set|get)Out? This seems to make sense somewhat, but why keep them in
> separate hashes?
The in method is used when you walk into a sub-tree. At this point you only have
information that you collected in the upper tree nodes. The out method
is called after you have completely processed a sub-tree.
One use of these methods is if you need flags which tell the visitor to behave
differently for certain sub-trees. Say you have a flag bInInnerClass and set
it to true when the in method is called. Then you're
able to lay out inner classes differently from "normal" classes. After
you leave the sub-tree you set the flag to false again, and the rest of the
tree will be handled normally.
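
A minimal sketch of the flag pattern, assuming a tiny hand-written tree instead of the generated node classes. Node, LayoutWalker, inNode and outNode are hypothetical names chosen for illustration; only the flag name bInInnerClass comes from the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature tree; a generated tree has its own node classes.
final class Node {
    final String kind; // "class", "innerClass" or "method"
    final String name;
    final List<Node> children = new ArrayList<>();

    Node(String kind, String name) {
        this.kind = kind;
        this.name = name;
    }

    Node add(Node child) {
        children.add(child);
        return this;
    }
}

final class LayoutWalker {
    private boolean bInInnerClass = false;
    final StringBuilder buf = new StringBuilder();

    // Depth-first walk: in-hook, then children, then out-hook.
    void walk(Node n) {
        inNode(n);
        for (Node c : n.children) {
            walk(c);
        }
        outNode(n);
    }

    // Entering a sub-tree: only information from enclosing nodes is known.
    private void inNode(Node n) {
        if (n.kind.equals("innerClass")) {
            bInInnerClass = true;
        }
        if (n.kind.equals("method")) {
            // Methods of inner classes get an extra indent.
            buf.append(bInInnerClass ? "    " : "").append(n.name).append('\n');
        }
    }

    // Leaving a sub-tree: everything below this node has been processed.
    private void outNode(Node n) {
        if (n.kind.equals("innerClass")) {
            bInInnerClass = false;
        }
    }
}
```

Note that a plain boolean is reset too early if an inner class itself contains an inner class; a depth counter incremented in the in method and decremented in the out method avoids that.
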
> 4) I'm going to have something of a problem with line numbers when I
> pass the post-processed text to a compiler (it must also be human
> like javac. Is there any java equivalent to the #line directive in C?
> I think not, so I'm preparing to use another hash to map line numbers
> back to the original source. Any bright ideas?
There is no such thing. But sablecc provides the information you need.
The top level nodes have information about their location in a file.
It's a bit tricky to get it but it works.
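
One possible way to build the map you mention, as a rough sketch only. It assumes every token you write out knows the line it came from; SrcTok and LineMapper are hypothetical names, and how you obtain the line from the parse tree is left out here.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical token that remembers its line in the original source.
final class SrcTok {
    final String text;
    final int line; // 1-based line number in the original file

    SrcTok(String text, int line) {
        this.text = text;
        this.line = line;
    }
}

final class LineMapper {
    // Output line -> original line of the first token written on that line.
    private final Map<Integer, Integer> outToSrc = new TreeMap<>();
    private final StringBuilder out = new StringBuilder();
    private int outLine = 1;

    // Write a token that came from the original source.
    void emit(SrcTok t) {
        outToSrc.putIfAbsent(outLine, t.line);
        append(t.text);
    }

    // Write text produced by the pre-processor; it has no original line.
    void emitGenerated(String text) {
        append(text);
    }

    // Append text and advance the output line counter at each newline.
    private void append(String text) {
        out.append(text);
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n') {
                outLine++;
            }
        }
    }

    // Original line for a compiler-reported output line, or -1 if generated.
    int originalLine(int outputLine) {
        return outToSrc.getOrDefault(outputLine, -1);
    }

    String text() {
        return out.toString();
    }
}
```

When the compiler reports an error on some line of the post-processed file, originalLine translates it back before you show the message to the user.
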