
SableCC Thoughts

Hi Everyone.  I've been using SableCC for a while and have come up with
a whole lot of questions, thoughts, and suggestions.  I've been away
from a decent internet connection, so they have queued up a bit.
I'll post them slowly over the next week so as not to overwhelm you :)

I'm considering making some changes to SableCC and contributing them to
the source, but I only want to start if I know that the changes will be
eventually integrated.  Otherwise I am just causing long-term pain for
myself, since I would have to keep re-integrating my features.  So,
authors (Etienne, Ben, Mariusz), please let me know which changes you
would want to integrate back.


I need error-recovery.  Specifically, when a Lexer or Parser exception
occurs, I need to be able to read tokens until EOL is reached, put the
parser in a known state, and continue parsing.  I've gotten this
working, but in a horribly kludgy way.  The 'right way', I think, is
for there to be a callback which receives an object that lets the user
manipulate the state of the lexer/parser.  Any thoughts on how to
programmatically tell the parser where it should be in its parse?
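
To make the callback idea concrete, here is a minimal Java sketch.
Every name in it (RecoverySketch, RecoveryContext, ErrorHandler,
skipToEndOfLine, resetTo) is hypothetical and not part of SableCC, and
the "parser" is a toy that only accepts the token "ok" followed by
"EOL".  It only illustrates the shape such a hook might take.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: none of these types are part of SableCC.
final class RecoverySketch {

    /** Handle given to the callback so user code can steer lexer and parser. */
    static final class RecoveryContext {
        private final Iterator<String> tokens;    // stand-in for the lexer
        private final Deque<Integer> stateStack;  // stand-in for the parser stack
        private final String offending;           // token that triggered the error
        final List<String> skipped = new ArrayList<>();

        RecoveryContext(Iterator<String> tokens, Deque<Integer> stateStack, String offending) {
            this.tokens = tokens;
            this.stateStack = stateStack;
            this.offending = offending;
        }

        /** Discard tokens up to and including the next EOL. */
        void skipToEndOfLine() {
            if (offending.equals("EOL")) {
                return; // the bad token already ended the line
            }
            while (tokens.hasNext()) {
                String t = tokens.next();
                if (t.equals("EOL")) {
                    return;
                }
                skipped.add(t);
            }
        }

        /** Put the parser in a known state by rebuilding its stack. */
        void resetTo(int state) {
            stateStack.clear();
            stateStack.push(state);
        }
    }

    interface ErrorHandler {
        /** Return true to resume parsing, false to abort. */
        boolean recover(RecoveryContext ctx);
    }

    /** Toy parser: a statement is "ok" then "EOL". Returns the number of statements parsed. */
    static int parse(List<String> input, ErrorHandler handler) {
        Iterator<String> tokens = input.iterator();
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(0);
        int good = 0;
        while (tokens.hasNext()) {
            String t = tokens.next();
            int state = stack.peek();
            if (state == 0 && t.equals("ok")) {
                stack.push(1);               // shift
            } else if (state == 1 && t.equals("EOL")) {
                stack.pop();                 // reduce one statement
                good++;
            } else {
                RecoveryContext ctx = new RecoveryContext(tokens, stack, t);
                if (!handler.recover(ctx)) {
                    throw new IllegalStateException("unexpected token: " + t);
                }
            }
        }
        return good;
    }
}
```

A caller would pass something like
ctx -> { ctx.skipToEndOfLine(); ctx.resetTo(0); return true; }
as the handler.  The open question above remains: in a real LALR
parser, choosing the state to reset to is the hard part.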


I've been mucking with the parser stack a lot.  I've noticed that often
things like X1PRODUCTION and X2PRODUCTION are on the stack, usually
corresponding to where a + or * is used in the grammar.  If we are going
to have error-recovery, I think we need to make sure that the stack
never has any nodes on it that wouldn't be there if the parse completed
successfully.  Why are these nodes necessary right now?


To me SableCC has two main advantages over its competitors: The
grammar's closeness to BNF and the separation of code and grammar.  I
think there are a bunch of things that can be done to enhance the
former.  My grammar often has productions like this:  

vertical_direction = {up} up | {down} down;
direction = {up} up | {down} down | {left} left | {right} right;

It seems to me the names ({up}) only subtract from the clarity of the
statements.  Why not make names unnecessary when the alternatives are
so simple themselves?  I realize we are trying to
minimize the impact of grammar changes on the visitors, but if we change
the alternative 'up' to alternative 'straight-up' then presumably the
visitor name for alternative Up would need to be changed anyway, so it
shouldn't make a difference.


I know that it is fairly easy to handle case-insensitive tokens by
writing something like:

full_speed_ahead = f u l l '-' s p e e d '-' a h e a d;

but this really detracts from grammar readability.  It is also not
immediately obvious to those uninitiated in SableCC tricks what the heck
that is.  Why not have a section like:

Case Insensitive Tokens
full_speed_ahead = 'full-speed-ahead';

I see no harm.  Token order (which is significant) can simply follow the
order in which the tokens appear in the file, across both sections.
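
As an illustration of what such a section could expand to, the Java
sketch below turns a literal into a per-character definition using
SableCC-style character sets.  The class and method names are made up
for this example, and it assumes the literal contains no single quote
(which would need escaping).

```java
// Hypothetical helper: shows one possible expansion, not SableCC's own code.
final class CaseInsensitiveTokens {

    /** Expand a literal into a token definition that matches any letter case. */
    static String expand(String literal) {
        if (literal.indexOf('\'') >= 0) {
            throw new IllegalArgumentException("single quotes are not handled in this sketch");
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < literal.length(); i++) {
            char c = literal.charAt(i);
            if (out.length() > 0) {
                out.append(' ');
            }
            char lower = Character.toLowerCase(c);
            char upper = Character.toUpperCase(c);
            if (lower != upper) {
                // A letter: emit a two-element character set.
                out.append("['").append(lower).append("' + '").append(upper).append("']");
            } else {
                // Digits and punctuation have no case: emit them verbatim.
                out.append('\'').append(c).append('\'');
            }
        }
        return out.toString();
    }
}
```

For example, expand("a-b") yields ['a' + 'A'] '-' ['b' + 'B'], so the
readable form in the grammar could be rewritten mechanically before the
lexer is generated.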

Lastly (for today) why not have the parser automatically add 'floating
tokens', like:

import_declaration = "import" package_name;

This makes the grammar more readable, easier to write, and easier to
maintain (because every time you add a production with a new token, you
don't have to add the token in two places).  The only problems are 1)
how to do case-insensitive floating tokens and 2) Telling the parser
where the floating tokens should be added relative to the existing
tokens (since the order of the tokens in the Tokens section matters).

Both problems seem easily addressed with options in a new Options
section (which SableCC will inevitably need).  
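
A rough Java sketch of how floating tokens might be gathered and
ordered follows.  It assumes double-quoted literals as in the example
above, and the floatingFirst flag stands in for a hypothetical option;
neither the class nor the option exists in SableCC.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: collects "floating" literals from production text.
final class FloatingTokens {
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]+)\"");

    /** Return each quoted literal once, in order of first appearance. */
    static List<String> collect(List<String> productions) {
        Set<String> seen = new LinkedHashSet<>();
        for (String production : productions) {
            Matcher m = QUOTED.matcher(production);
            while (m.find()) {
                seen.add(m.group(1));
            }
        }
        return new ArrayList<>(seen);
    }

    /** Place floating literals before or after the explicitly declared ones. */
    static List<String> merge(List<String> declared, List<String> floating, boolean floatingFirst) {
        List<String> extra = new ArrayList<>(floating);
        extra.removeAll(declared); // avoid declaring the same literal twice
        List<String> result = new ArrayList<>();
        if (floatingFirst) {
            result.addAll(extra);
        }
        result.addAll(declared);
        if (!floatingFirst) {
            result.addAll(extra);
        }
        return result;
    }
}
```

Putting keywords such as "import" ahead of a general identifier token
would correspond to floatingFirst = true.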


Thanks for any feedback!


Dan Sandberg
Silicon Poetry, Inc.
Phone : 650.493.5282
Mobile: 650.814.4931
Fax   : 253.540.6798