
sablecc grammar puzzle



Hello,

This is my first post to the list.  I've been using SableCC in
production for over a year now, and love it.  Thank you, Etienne, for
SableCC's elegance.  (Thanks to all other contributors as well!)

I'm not a grammar expert, though, and I have a specific grammar puzzle
that I would like some help with.  I don't know if the problem that I
have is solvable by some transformation of my grammar, or if my
language is beyond the reach of LALR(1).

My problem domain is a small expression language.  In my language, you
can write "boolean expressions" and "value expressions" (the latter
represent numeric or string values; sorry about my nomenclature).
Here are some example expressions:

    a + 3                              // a value expression
    (baz(12) + baz(13)) * 2            // a value expression
    (12 > q OR 12 > p) AND isFunky(q)  // a boolean expression
    !isFunky(q)                        // a boolean expression
    !(12 > q OR 12 > p)                // a boolean expression
    foo(x * 3, bar(y))                 // (?)

Things to notice from these examples:

  0)  The variables never have boolean values...they always have
      a numeric or string value.  The same is true for the argument
      lists for functions:  they must always be "value subexpressions",
      never boolean.

  1)  "value expressions" can be subexpressions of "boolean expressions",
      joined by some comparison operator (>, >=, ...)

  2)  The parentheses serve multiple purposes:
      2a) grouping "value terms" in a value expression
      2b) grouping "boolean terms" in boolean expressions
      2c) grouping argument-lists for functions

  3)  identifiers that are mere variables are distinguished
      from function identifiers by the fact that the latter
      have an argument-list in parentheses appended to the
      identifier.  In other words, "foo" is a variable, but
      "foo(3)" is a function.

  4)  Functions can be either "boolean" or "value" depending on
      their context.  If they appear as a boolean-factor of a
      boolean-term of a boolean-expression, then they are boolean
      functions, etc.

  5)  Note that, because of this, the final example expression
      above is ambiguous.  Is foo() a "boolean function" or a
      "value function"?


Observation #5 is where my problem begins.  I have been using an ugly
workaround to this problem: I have special prefixes for "boolean
function names" that indicate their "boolean-ness": is, are, has,...

...but doing this is causing a second problem:  I can't have any
regular variables that start with any of these special prefixes.
In other words, if I introduce "can" as a new boolean prefix into my 
language, I can have a boolean function canDance(), but I can no
longer have a variable called "cannon"!
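
The clash can be reproduced outside SableCC.  What follows is only a
sketch in plain Java, not generated lexer code: it assumes the lexer
picks the longest match and, on a tie, the token declared first, and
it adds a hypothetical "can" prefix to the prefix list.  The class and
method names (PrefixClashDemo, classify) are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a generated lexer. Assumption: longest match
// wins, and on a tie the token declared first wins.
class PrefixClashDemo {
    // Declaration order matters for tie-breaking, so keep insertion order.
    static final Map<String, Pattern> TOKENS = new LinkedHashMap<>();
    static {
        String letterOrDigit = "[A-Za-z0-9$_]";
        // "can" is appended to the real prefix list purely for illustration.
        TOKENS.put("boolean_function_name", Pattern.compile(
            "(?i:not|is|are|has|starts|does|ends|was|can)" + letterOrDigit + "*"));
        TOKENS.put("identifier", Pattern.compile(
            "[A-Za-z]" + letterOrDigit + "*(\\.[A-Za-z]" + letterOrDigit + "*)*"));
    }

    // Returns the name of the token chosen at the start of the input,
    // or null if no token matches there.
    static String classify(String input) {
        String best = null;
        int bestLength = -1;
        for (Map.Entry<String, Pattern> entry : TOKENS.entrySet()) {
            Matcher m = entry.getValue().matcher(input);
            // lookingAt() anchors the match at the start of the input.
            // The strict '>' keeps the earlier-declared token on a tie.
            if (m.lookingAt() && m.end() > bestLength) {
                best = entry.getKey();
                bestLength = m.end();
            }
        }
        return best;
    }
}
```

Under these assumptions classify("cannon") comes back as
boolean_function_name, because both tokens match all six characters
and the boolean token is declared first, while classify("foo") comes
back as identifier.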

So I would like advice on two separate issues:

   A)  How can I write my grammar so that the special boolean-function
       prefixes are no longer necessary?  In my application I always
       know which kind of expression I want, so if I have an expression
       like the final example above...

       foo(x * 3, bar(y))                 // (?)

       ...in a boolean context, I could assert, at evaluation-time,
       that we must find a boolean function called foo().

       Or, if this same expression is used in a "value context",
       I could assert that a value-function called foo() is used.

       A solution to this problem is the most I could hope for.

   B)  Failing that, the "consolation prize" that I seek is that if
       I must live with the special prefixes for boolean function names,
       how can I transform my grammar so that these prefixes no longer
       interfere with the names of ordinary variables?
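
To make the evaluation-time assertion in (A) concrete, here is a
minimal Java sketch.  It does not touch the grammar question at all;
it only shows the deferred check, assuming the parser hands over a
function name, an argument list, and the context the call appeared in.
FunctionResolver, Context, and the register/call methods are all
hypothetical names, not part of SableCC.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical evaluation-time registry; none of these names come from SableCC.
class FunctionResolver {
    enum Context { BOOLEAN, VALUE }

    private final Map<String, Function<List<Object>, Boolean>> booleanFunctions = new HashMap<>();
    private final Map<String, Function<List<Object>, Object>> valueFunctions = new HashMap<>();

    void registerBoolean(String name, Function<List<Object>, Boolean> body) {
        booleanFunctions.put(name, body);
    }

    void registerValue(String name, Function<List<Object>, Object> body) {
        valueFunctions.put(name, body);
    }

    // Looks the name up in the registry that matches the context and
    // fails loudly if no function of that kind exists.
    Object call(String name, Context context, List<Object> arguments) {
        if (context == Context.BOOLEAN) {
            Function<List<Object>, Boolean> body = booleanFunctions.get(name);
            if (body == null) {
                throw new IllegalStateException("no boolean function called " + name + "()");
            }
            return body.apply(arguments);
        }
        Function<List<Object>, Object> body = valueFunctions.get(name);
        if (body == null) {
            throw new IllegalStateException("no value function called " + name + "()");
        }
        return body.apply(arguments);
    }
}
```

With this shape the same name, say foo, can be registered once as a
boolean function and once as a value function, and the context alone
decides which one runs.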


Since SableCC grammars are already small and easy to read, I'm including
my entire grammar.  Let me know if trying to strip this down to a minimal
example grammar would be helpful...





/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
 *                                                                 *
 * g expression grammar                                            *
 * See http://www.sablecc.org/ for the parser-generator            *
 *                                                                 *
 * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */

Package express;

/*******************************************************************
 * Helpers                                                         *
 *******************************************************************/
Helpers
   unicode_input_character = [0..0xffff];
   ht  = 0x0009;
   lf  = 0x000a;
   ff  = 0x000c;
   cr  = 0x000d;
   sp  = ' ';
   dot = '.';
   line_terminator = lf | cr | cr lf;
   non_zero_digit = ['1'..'9'];
   digit = ['0'..'9'];
   letter = ['a'..'z'] | ['A'..'Z'];
   letter_or_digit = letter | digit | '$' | '_';
   a = ('a'|'A');
   d = ('d'|'D');
   e = ('e'|'E');
   h = ('h'|'H');  
   i = ('i'|'I');
   l = ('l'|'L');
   n = ('n'|'N');
   o = ('o'|'O');
   r = ('r'|'R');
   s = ('s'|'S');
   t = ('t'|'T');
   u = ('u'|'U');
   w = ('w'|'W');
   boolean_function_prefix = n o t | i s | a r e | h a s | s t a r t s | d o e s | e n d s | w a s;
   single_quote = ''';
   double_quote = '"';
   input_character = [unicode_input_character - [cr + lf]];
   string_character = [input_character - [double_quote + single_quote]];

/*******************************************************************
 * Tokens                                                          *
 *******************************************************************/
Tokens


   white_space = (sp | ht | ff | line_terminator)*;
   null = n u l l;
   logical_complement = '!';
   left_parenthesis = '(';
   right_parenthesis = ')';
   lt = '<'|'lt';
   gt = '>'|'gt';
   eq = '=='|'eq';
   lteq = '<='|'le';
   gteq ='>='|'ge';
   neq = '!='|'ne';
   and = '&&'| a n d;
   or = '||'| o r;
   plus = '+';
   minus = '-';
   star = '*';
   div = '/';
   mod = '%';
   comma = ',';
   boolean_function_name = boolean_function_prefix letter_or_digit*;
   identifier = letter letter_or_digit* ( dot letter letter_or_digit*)*;
   decimal_numeral = '0' | non_zero_digit digit*;
   string_literal = single_quote string_character* single_quote |
                    double_quote string_character* double_quote;

/*******************************************************************
 * Ignored Tokens                                                  *
 *******************************************************************/
Ignored Tokens

   white_space;

/*******************************************************************
 * Productions                                                     *
 *******************************************************************/
Productions

goal =
   {global_expression} expression;

expression = 
   {boolean_expression} boolean_expression |
   {value_expression} value_expression;

/*******************************************************************
 * Productions for boolean expressions                             *
 *******************************************************************/

boolean_expression = boolean_term boolean_term_tail*;

boolean_term_tail = or boolean_term;

boolean_term = boolean_factor boolean_factor_tail*;

boolean_factor_tail = and boolean_factor;

boolean_factor = 
   {atomic_comparison} atomic_comparison |
   {grouped_expression} left_parenthesis boolean_expression right_parenthesis |
   {not_grouped_expression} logical_complement left_parenthesis boolean_expression right_parenthesis;

atomic_comparison = 
   {boolean_function} boolean_function |
   {not_boolean_function} logical_complement boolean_function |
   {eq_comparison} [left]:value_expression eq [right]:value_expression |
   {neq_comparison} [left]:value_expression neq [right]:value_expression |
   {lt_comparison} [left]:value_expression lt [right]:value_expression |
   {gt_comparison} [left]:value_expression gt [right]:value_expression |
   {lteq_comparison} [left]:value_expression lteq [right]:value_expression |
   {gteq_comparison} [left]:value_expression gteq [right]:value_expression;

boolean_function =
   {generic} boolean_function_name left_parenthesis argument? right_parenthesis;


/*******************************************************************
 * Productions for value expressions                               *
 *******************************************************************/

value_expression = 
    {null} null |
    {non_null} value_term value_term_tail*;

value_term_tail = 
   {subtraction} minus value_term |
   {addition} plus value_term;

value_term = value_factor value_factor_tail*;

value_factor_tail = 
   {modulus} mod value_factor |
   {multiplication} star value_factor |
   {division} div value_factor;

value_factor =
   {atomic_factor} atomic_factor |
   {grouped_factor} left_parenthesis value_expression right_parenthesis;

atomic_factor =
   {function} identifier left_parenthesis argument? right_parenthesis |
   {identifier} identifier | 
   {decimal_integer_literal} decimal_integer_literal |
   {string_literal} string_literal;

argument = value_expression argument_tail*;

argument_tail = comma value_expression;

/********************************************************************
 * Literals                                                         *
 ********************************************************************/

decimal_integer_literal =
   decimal_numeral;