
Re: Bug in SableCC 2.17.2?

I also wanted to add a couple of minor comments regarding character sets and
lists.

I had problems with tokens that combined the two. It's been at least a month
and I can't quite remember the scenario but given something simple like:
digit = ['0'..'9'] ;
vowel = 'a' | 'e' | 'i' | 'o' | 'u' ;
message = digit | vowel ;

I had problems with message tokens. At the time it struck me that sets and
lists were not interchangeable, which I found rather strange. Although they
are expressed differently, there should be no fundamental difference (none
that I can see at first glance, anyway). I wrote vowel as a list because (at
the time) I couldn't figure out how to write it as a set. Of course, one may
do the following (and other variations):
vowel = ['a' + ['e' + ['i' + ['o' + 'u']]]] ;

Seems silly that the set operator(s) can only handle two operands at a time.
Why not allow: ['a' + 'e' + 'i' + 'o' + 'u'] ? This would make disjoint sets
much easier to define and debug.
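
To illustrate why the nested and flat forms should mean the same thing, here
is a minimal Java sketch. It is not SableCC code and does not reflect how
SableCC implements sets internally; the class CharSets and its methods
(interval, of, union, minus) are hypothetical names made up for this example.
It only assumes that a character set behaves like an ordinary mathematical
set, so a chain of two-operand unions yields the same result as listing the
members directly.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of SableCC: a character set is modelled
// as a sorted Set<Character>.
final class CharSets {

    // Inclusive interval, in the spirit of ['a'..'z'].
    static Set<Character> interval(char lo, char hi) {
        Set<Character> s = new TreeSet<>();
        // Loop over int so that hi == 0xFFFF cannot overflow the counter.
        for (int c = lo; c <= hi; c++) {
            s.add((char) c);
        }
        return s;
    }

    // Explicit list of members, in the spirit of 'a' | 'e' | 'i'.
    static Set<Character> of(char... chars) {
        Set<Character> s = new TreeSet<>();
        for (char c : chars) {
            s.add(c);
        }
        return s;
    }

    // Two-operand union, in the spirit of [a + b].
    static Set<Character> union(Set<Character> a, Set<Character> b) {
        Set<Character> s = new TreeSet<>(a);
        s.addAll(b);
        return s;
    }

    // Two-operand difference, in the spirit of [a - b].
    static Set<Character> minus(Set<Character> a, Set<Character> b) {
        Set<Character> s = new TreeSet<>(a);
        s.removeAll(b);
        return s;
    }

    public static void main(String[] args) {
        // Nested pairwise unions, mirroring ['a' + ['e' + ['i' + ['o' + 'u']]]].
        Set<Character> nested =
            union(of('a'), union(of('e'), union(of('i'), union(of('o'), of('u')))));
        // The flat form one would like to write instead.
        Set<Character> flat = of('a', 'e', 'i', 'o', 'u');
        System.out.println(nested.equals(flat));

        // Subtraction is the one thing a plain list of alternatives cannot express.
        Set<Character> consonants = minus(interval('a', 'z'), flat);
        System.out.println(consonants.size());
    }
}
```

Under that assumption the union is associative, so a flat multi-operand
syntax would only be shorthand for the nested brackets.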

As far as combining Sets and Lists in a token goes, I imagine that this
must work and that I was just confused at the time. I have been working with
regular-expression-type grammars, and they have a lot of issues with
ambiguous tokens (which is another, long e-mail). It could simply have
been that the tokens I was defining were ambiguous and I didn't realize it
at the time.


----- Original Message -----
From: "Mariusz Nowostawski" <mariusz@marni.otago.ac.nz>
To: "Xuan Baldauf" <xuan--sable.mcgill.ca@baldauf.org>
Cc: <sablecc-list@sable.mcgill.ca>
Sent: Monday, August 20, 2001 10:36 PM
Subject: Re: Bug in SableCC 2.17.2?

> Hi Xuân,
> You are absolutely right, the current syntax for sets is cumbersome and
> not nice. I believe Etienne is planning a new (major) release of SableCC
> to address it and give a much more intuitive and simple syntax.
> The functional difference between a list of alternatives and a set of
> "characters" is really in the ability to "subtract": one can create
> a bigger set and then declare smaller ones by subtracting
> another set from the bigger set.
> In your case it would be better to use sets as you want to declare
> tokenchars as such characters which are not separators. But as you have
> pointed out the current syntax is not nice at all for that.
> As to the second part of your question, I do not really understand the
> problem (apart, again, from the poor error handling and the unhelpful
> error message that you have pointed out).  Parentheses () in a SableCC
> grammar file are used to group lists of productions/tokens.  However,
> grouping something without a reason is treated as an error ;o)  So, if
> something does not need to be grouped, do not group it ;o)
> I hope you will see what I mean by reading the code below. If you do not
> know how to express something, tell me what exactly you want the tail to
> represent and I may be able to help you with that.
> Hth,
> best regards
> Mariusz
> Helpers
>         tokenchar = [[0x0000..0xFFFF]-','];
> Tokens
>         token =       tokenchar+;
>         comma =       ',';
> Productions
>         token_list      =       token token_list_tail*;
> //token_list_tail       = token;         // works
> //token_list_tail       = (token)+;      // works
> //token_list_tail       = comma token;   // works
> //token_list_tail       = (comma token)+;// works