
Re: SableCC Thoughts

Dan Sandberg wrote:
> I'm considering making some changes to SableCC and contributing them to
> the source, but I only want to start if I know that the changes will be
> eventually integrated.  Otherwise I am just causing long-term pain for
> me as I continually need to re-integrate features.  So authors (Etienne,
> Ben, Mariusz) please let me know what seems like something you would
> want to integrate back.

Unless your changes would introduce features that conflict with SableCC
goals, I see no reason not to integrate your changes into SableCC (and
save you the pain of perpetual re-integration).

> I need error-recovery.  ...

We discussed this in another thread.  In summary, I think that your
callback idea could be provided, at least temporarily, as a low level
recovery mechanism.

> --
> I've been mucking with the parser stack a lot.  I've noticed that often
> things like X1PRODUCTION and X2PRODUCTION are on the stack, usually
> corresponding to where a + or * is used in the grammar.  If we are going
> to have error-recovery, I think we need to make so that the stack never
> has any nodes on it that wouldn't be there if the parse completed
> successfully.  Why are these nodes necessary right now?

Because, this is how the input stream is parsed.  The "+*?" operators
are first translated into grammar modifications that introduce these
Xxxx nodes into the grammar.  They are then eliminated after parsing. 
(This is a lie; I do it as soon as they can be eliminated at parse

That's another reason to push towards my two level error recovery
proposal.  Anyway, the question is still open.
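
For illustration only, here is a minimal Java sketch of the kind of
grammar rewriting described above for the "+*?" operators.  It uses the
textbook BNF translation, which is not necessarily the exact one SableCC
performs.  The class name, the method name, and the "x_" helper prefix
are all hypothetical.

```java
/** Hypothetical sketch: textbook translation of EBNF operators to plain BNF. */
final class OperatorDesugar {

    /**
     * Returns the helper rule that stands in for "element op",
     * where op is one of '?', '*' or '+'.
     * The "x_" prefix is an illustrative naming convention only.
     */
    static String helperRule(String element, char op) {
        String helper = "x_" + element;
        switch (op) {
            case '?': // zero or one
                return helper + " = /* empty */ | " + element + ";";
            case '*': // zero or more, as a left-recursive list
                return helper + " = /* empty */ | " + helper + " " + element + ";";
            case '+': // one or more, as a left-recursive list
                return helper + " = " + element + " | " + helper + " " + element + ";";
            default:
                throw new IllegalArgumentException("unknown operator: " + op);
        }
    }

    public static void main(String[] args) {
        for (char op : new char[] {'?', '*', '+'}) {
            System.out.println(helperRule("stmt", op));
        }
    }
}
```

While a list is being parsed, the nodes built for such helper rules sit
on the parser stack.  That matches the observation about extra nodes
appearing where a + or * is used in the grammar.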

> --
> It seems to me the names ({up}) are only subtracting from the clarity of
> the statements.  Why not make so names are unnecessary when the
> alternatives are so simple themselves?  I realize we are trying to
> minimize the impact of grammar changes on the visitors, but if we change
> the alternative 'up' to alternative 'straight-up' then presumably the
> visitor name for alternative Up would need to be changed anyway, so it
> shouldn't make a difference.

It would probably make sense to provide "simple" default names for
alternatives, and simply issue an error in conflicting cases, much like
we already do for element names.  Using the first element as the name
makes sense.  To make sure we agree, here are examples:


z = 
  b c d |
  e f g;

would result in classes PZ, ABZ extends PZ, and AEZ extends PZ.


z = 
  k b c d |
  k e f g;

would cause SableCC to complain that you are attempting to define two
alternatives with the same name {k}.  This can be solved with

z = 
  {other_name} k b c d |
  k e f g;


z = 
  {name} k b c d |
  {other} k e f g;

(3)  Productions with single alternatives would continue to get special

z = 
 k b c d;

results in classes PZ and AZ extends PZ
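
As a rough illustration of the naming rule proposed above, here is a
minimal Java sketch.  It is not SableCC code: the class AltNaming, the
Alt type, and the camel-casing of names such as other_name into
OtherName are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of default alternative names taken from the first element. */
final class AltNaming {

    /** One alternative: an optional explicit {name} (null if absent) and its elements. */
    static final class Alt {
        final String explicitName;
        final List<String> elements;

        Alt(String explicitName, String... elements) {
            this.explicitName = explicitName;
            this.elements = Arrays.asList(elements);
        }
    }

    /** Turns "other_name" into "OtherName" (assumed class-name convention). */
    static String camel(String s) {
        StringBuilder out = new StringBuilder();
        for (String part : s.split("_")) {
            if (part.isEmpty()) continue;
            out.append(Character.toUpperCase(part.charAt(0))).append(part.substring(1));
        }
        return out.toString();
    }

    /** Returns the production class name followed by one class name per alternative. */
    static List<String> classNames(String production, List<Alt> alts) {
        String p = camel(production);
        List<String> names = new ArrayList<>();
        names.add("P" + p);
        // A single unnamed alternative keeps its special treatment: AZ.
        if (alts.size() == 1 && alts.get(0).explicitName == null) {
            names.add("A" + p);
            return names;
        }
        Set<String> seen = new HashSet<>();
        for (Alt alt : alts) {
            String name;
            if (alt.explicitName != null) {
                name = alt.explicitName;          // explicit {name} wins
            } else if (!alt.elements.isEmpty()) {
                name = alt.elements.get(0);       // default: first element
            } else {
                throw new IllegalArgumentException("empty alternative needs an explicit {name}");
            }
            if (!seen.add(name)) {
                throw new IllegalArgumentException(
                    "two alternatives named {" + name + "} in production " + production);
            }
            names.add("A" + camel(name) + p);
        }
        return names;
    }

    public static void main(String[] args) {
        // prints [PZ, ABZ, AEZ]
        System.out.println(classNames("z", Arrays.asList(
            new Alt(null, "b", "c", "d"),
            new Alt(null, "e", "f", "g"))));
    }
}
```

With two unnamed alternatives that both start with k, classNames throws,
which corresponds to the error case above.  Giving one of them an
explicit name resolves the conflict.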

How about it?

> --
> I know that it is fairly easy to handle case-insensitive tokens by
> doing:
> Tokens
> full_speed_ahead = f u l l '-' s p e e d '-' a h e a d;

I thought I had already agreed to have case insensitive tokens.  I would
quote these tokens differently, something like

'hello' -> case sensitive
"hello" -> case insensitive

There remains the problem of non-ascii characters.  e.g. shouldn't "" be
equivalent to ('' | '')?  We could possibly accommodate a "case
insensitivity" specification section.
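
To make the idea concrete, here is a minimal Java sketch of how a
case-insensitive literal could be expanded into per-character choices.
This is an assumption about one possible implementation, not existing
SableCC behaviour.  The class and method names are hypothetical, and
quoting of special characters inside the literal is ignored.

```java
/** Hypothetical sketch: expand a case-insensitive literal into per-character choices. */
final class CaseInsensitive {

    /**
     * Returns a token definition body in which every cased character of
     * the literal is replaced by a (lower | upper) choice.
     * Relies on Java's per-character Unicode case mapping.
     */
    static String expand(String literal) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < literal.length(); i++) {
            char c = literal.charAt(i);
            char lower = Character.toLowerCase(c);
            char upper = Character.toUpperCase(c);
            if (i > 0) out.append(' ');
            if (lower == upper) {
                // No case distinction (digits, '-', ...): keep the character as is.
                out.append('\'').append(c).append('\'');
            } else {
                out.append("('").append(lower).append("' | '").append(upper).append("')");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints ('a' | 'A') '-' ('b' | 'B')
        System.out.println(expand("a-b"));
    }
}
```

Java's built-in mapping covers many non-ascii letters, but it is only
one possible definition of equivalence.  A grammar author may want a
different one, which is where a "case insensitivity" specification
could come in.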

> Case Insensitive Tokens
> full_speed_ahead = 'full-speed-ahead';

I prefer a special notation, instead of an additional section, as it
affects the order of declarations.

> --
> Lastly (for today) why not have the parser automatically add 'floating
> tokens', like:

I have many reservations about this.  I prefer to keep a separate token
declaration section, as it leaves much more flexibility for adding
features to SableCC specifications.

Etienne M. Gagnon, M.Sc.                     e-mail: egagnon@j-meg.com
Author of SableCC:                             http://www.sablecc.org/
and SableVM:                                   http://www.sablevm.org/