
Re: SableCC Thoughts

Hi Dan 

I believe that Etienne will have the most extensive comments, and the final
decision on what goes into SableCC, but here are my thoughts. Once things
are approved, I am happy to help integrate them into the main source tree.

> I need error-recovery.  Specifically, when a Lexer or Parser exception
> occurs, I need to be able to read tokens until EOL is reached, put the
> parser in a known state, and continue parsing.  I've gotten this
> working, but in a very horrible kludgy way.  The 'right way', I think, is
> for there to be a callback which receives an object that lets the user
> manipulate the state of the lexer/parser.  Any thoughts on how to
> programmatically tell the parser where it should be in its parse?

Agreed. As I said, I think a callback mechanism based on an
event/listener model would be preferable. In that case the internal state
of the lexer/parser needs to be explicitly accessible, though I am not
sure about the details.
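A rough sketch of what such a callback could look like, in Java since that is what SableCC generates. Everything here is hypothetical: SableCC has no `ParseErrorListener` or `RecoveryContext`, and `ToyParser` is a hand-written stand-in for a generated parser that accepts lines of the form `set NAME NUMBER EOL`. The listener implements the policy Dan describes: skip tokens up to the end of the line, then put the parser back into a known (start-of-statement) state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback interface: not part of SableCC.
interface ParseErrorListener {
    void syntaxError(RecoveryContext ctx, String message);
}

// Hypothetical view of the parser state that a listener is allowed to touch.
interface RecoveryContext {
    String peek();      // current token, or null at end of input
    void skip();        // discard the current token
    void resetState();  // return to the start-of-statement state
}

// Toy stand-in for a generated parser: statements are "set" NAME NUMBER "EOL".
class ToyParser implements RecoveryContext {
    private final List<String> tokens;
    private final ParseErrorListener listener;
    private int pos = 0;
    private int state = 0;   // 0: "set", 1: NAME, 2: NUMBER, 3: "EOL"
    private String current;  // statement being built, e.g. "x=1"
    final List<String> parsed = new ArrayList<>();

    ToyParser(List<String> tokens, ParseErrorListener listener) {
        this.tokens = tokens;
        this.listener = listener;
    }

    public String peek() { return pos < tokens.size() ? tokens.get(pos) : null; }
    public void skip() { pos++; }
    public void resetState() { state = 0; current = null; }

    void parse() {
        while (peek() != null) {
            String t = peek();
            boolean ok;
            switch (state) {
                case 0:  ok = t.equals("set"); break;
                case 1:  ok = t.matches("[a-z]+"); if (ok) current = t; break;
                case 2:  ok = t.matches("[0-9]+"); if (ok) current += "=" + t; break;
                default: ok = t.equals("EOL"); if (ok) parsed.add(current); break;
            }
            if (ok) {
                skip();
                state = (state + 1) % 4;
            } else {
                int before = pos;
                listener.syntaxError(this, "unexpected '" + t + "' in state " + state);
                if (pos == before) skip(); // make progress even if the listener did not
            }
        }
    }
}

// The recovery policy from the thread: drop the rest of the line, then resume.
class SkipToEolListener implements ParseErrorListener {
    final List<String> messages = new ArrayList<>();

    public void syntaxError(RecoveryContext ctx, String message) {
        messages.add(message);
        while (ctx.peek() != null && !ctx.peek().equals("EOL")) ctx.skip();
        if (ctx.peek() != null) ctx.skip(); // consume the EOL as well
        ctx.resetState();
    }
}
```

With the input `set x 1 EOL set y oops EOL set z 3 EOL`, the second line is reported once and skipped, and `x=1` and `z=3` are still parsed. The open question is how much state to expose: a generated LALR parser keeps a state stack rather than a single state number, so a real `resetState()` would have to unwind that stack to a known configuration.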


> To me SableCC has two main advantages over its competitors: The
> grammar's closeness to BNF and the separation of code and grammar.  I
> think there are a bunch of things that can be done to enhance the
> former.  My grammar often has productions like this:  
> vertical_direction = {up} up | {down} down;
> direction = {up} up | {down} down | {left} left | {right} right;
> It seems to me the names ({up}) are only subtracting from the clarity of
> the statements.  Why not make it so names are unnecessary when the
> alternatives are so simple themselves?  I realize we are trying to
> minimize the impact of grammar changes on the visitors, but if we change
> the alternative 'up' to alternative 'straight-up' then presumably the
> visitor name for alternative Up would need to be changed anyway, so it
> shouldn't make a difference.

Agreed, I have felt the same way. Maybe the explicit {name} tags should
be optional, with the name of the production or token used in the
alternative as the default? I believe there are cases where the {name}
construct is unavoidable.
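A minimal sketch of that default-naming rule, assuming a production with several alternatives and bare element names (no `?`, `*` or `+` modifiers): an alternative without `{name}` takes the name of its single element, while anything longer, such as `exp plus term`, still needs an explicit name. `Alt` and `AltNamer` are illustrative names only, not part of SableCC.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of one alternative: optional {name} plus its elements.
final class Alt {
    final String name;            // explicit {name}, or null if omitted
    final List<String> elements;  // bare element names, e.g. [exp, plus, term]

    private Alt(String name, String... elements) {
        this.name = name;
        this.elements = Arrays.asList(elements);
    }

    static Alt of(String... elements) { return new Alt(null, elements); }
    static Alt named(String name, String... elements) { return new Alt(name, elements); }
}

final class AltNamer {
    // Resolves the name of every alternative of a multi-alternative production.
    static List<String> resolve(List<Alt> alts) {
        List<String> names = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (Alt alt : alts) {
            String name = alt.name;
            if (name == null) {
                // Proposed default: a one-element alternative takes the element's name.
                if (alt.elements.size() != 1) {
                    throw new IllegalArgumentException("needs an explicit {name}: " + alt.elements);
                }
                name = alt.elements.get(0);
            }
            // Two alternatives with the same name would clash in the generated code.
            if (!seen.add(name)) {
                throw new IllegalArgumentException("duplicate alternative name {" + name + "}");
            }
            names.add(name);
        }
        return names;
    }
}
```

Under this rule Dan's `direction` production needs no tags at all, while `exp = {plus} exp plus term | term;` keeps `{plus}` and lets the second alternative default to `term`. Duplicates are rejected too, since an explicit `{up}` next to a bare `up` alternative would clash.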

> I know that it is fairly easy to handle case-insensitive tokens by
> doing:
> Tokens
> full_speed_ahead = f u l l '-' s p e e d '-' a h e a d;
> but this really detracts from grammar readability.  It is also not
> immediately obvious to those uninitiated in SableCC tricks what the heck
> that is.  Why not have a section like:
> Case Insensitive Tokens
> full_speed_ahead = 'full-speed-ahead';
> I see no harm.  Token order (since it is significant) can be in the same
> order as the tokens are in the file, spanning those two sections.

Agreed. I was even thinking of using a global flag for the whole grammar,
since I thought mixing case-sensitive and case-insensitive tokens is
rather rare, but having separate sections seems pretty clean to me.
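For reference, the expansion that a Case Insensitive Tokens section would do for us fits in a few lines of Java. This sketch assumes, as in Dan's example, one helper per letter named after the lowercase letter (e.g. a helper `f` matching both `f` and `F`); `CaseInsensitive` is a hypothetical name, and the sketch does not escape quote characters.

```java
// Hypothetical helper: expands a case-insensitive literal into the
// "one helper per letter" form used in the example above.
final class CaseInsensitive {
    static String expand(String literal) {
        StringBuilder out = new StringBuilder();
        for (char c : literal.toCharArray()) {
            if (out.length() > 0) out.append(' ');
            boolean asciiLetter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
            if (asciiLetter) {
                // Letters refer to an assumed per-letter helper, e.g. f for 'f' or 'F'.
                out.append(Character.toLowerCase(c));
            } else {
                // Everything else stays a quoted character (a ' is not escaped here).
                out.append('\'').append(c).append('\'');
            }
        }
        return out.toString();
    }
}
```

`CaseInsensitive.expand("full-speed-ahead")` yields `f u l l '-' s p e e d '-' a h e a d`, the same body as in Dan's `Tokens` example, so the new section could be pure syntactic sugar on top of what the generator already supports.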

> Lastly (for today) why not have the parser automatically add 'floating
> tokens', like:
> import_declaration = "import" package_name;
> This makes the grammar more readable, easier to write, and easier to
> maintain (you no longer have to add the token in two places every time
> you add a production with a new token).  The only problems are 1)
> how to do case-insensitive floating tokens and 2) Telling the parser
> where the floating tokens should be added relative to the existing
> tokens (since the order of the tokens in the Tokens section matters).
> Both problems seem easily addressed with options in a new Options
> section (which SableCC will inevitably need).  

Not sure about that. It is true that adding tokens in two places is not
that nice, but I am not sure that having two types of tokens, regular and
floating ones, will make things simpler. I like the model in which
Helpers/Tokens represent declarations, whereas Productions represent
definitions where all the "atoms" have been declared beforehand. Well,
that's just my feeling; I am open to discussion ;o)
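To make the ordering issue concrete, here is a rough Java sketch of what the generator would have to do for floating tokens. `FloatingTokens`, `Placement` and choosing the placement from an Options section are all hypothetical. Placement matters because when two tokens match the same longest input, SableCC picks the one declared first, so `'import'` has to come before a catch-all `identifier` token.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: collect "floating" literals from productions and
// merge them into the token order (which decides ties between tokens).
final class FloatingTokens {
    // Hypothetical option, e.g. set in an Options section.
    enum Placement { BEFORE_DECLARED, AFTER_DECLARED }

    // A floating token is written as a double-quoted literal in a production.
    private static final Pattern LITERAL = Pattern.compile("\"([^\"]*)\"");

    static List<String> tokenOrder(List<String> declared, List<String> productions,
                                   Placement placement) {
        // Distinct literals, in order of first appearance.
        Set<String> floating = new LinkedHashSet<>();
        for (String production : productions) {
            Matcher m = LITERAL.matcher(production);
            while (m.find()) {
                floating.add("'" + m.group(1) + "'");
            }
        }
        List<String> order = new ArrayList<>();
        if (placement == Placement.BEFORE_DECLARED) order.addAll(floating);
        order.addAll(declared);
        if (placement == Placement.AFTER_DECLARED) order.addAll(floating);
        return order;
    }
}
```

With `AFTER_DECLARED` and a declared `identifier` token, `'import'` would land after `identifier` and never be matched, which is the kind of subtlety the extra option would have to document.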

best regards

Etienne, shall we integrate the updated Java grammar into the source 
tree? If so, just let me know and I will do it.