
use of states




To learn a little more about SableCC, I figured I'd try it on a very tiny
parser of the kind I'd normally write by hand, after having success using
it for our rule engine. My new example requires states, though, and I'm not
quite following the example of states in the thesis, which is probably why
I'm having trouble with what should be a simple grammar.

I want to write a simple macro language, so it can, for example, match

${java.home}/${java.user}.log

and figure out that ${java.home} and ${java.user} are macro variables, and
the rest of the text is not. The problem is that my token for the name
also matches a regular string, so the lexer wasn't assigning tokens correctly.

To fix this, I declared two states: "normal" and "var". When the lexer is
in the var state, it should only create a TIdentifier for a name such as
java.home. When it's not, it should create a TStringLiteral instead.
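
To make the intent concrete, here is a minimal hand-written sketch in Java
of the tokenization I'm expecting from the two states. It is not
SableCC-generated code: the class MacroLexerSketch and its tokenize method
are made up for illustration, and the token labels are plain strings named
after the token classes I'd assume SableCC generates from the grammar.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written lexer, for illustration only (not SableCC output).
final class MacroLexerSketch {
    // Mirrors the two lexer states described above.
    enum State { NORMAL, VAR }

    static boolean isIdPrefix(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
    }

    static boolean isIdChar(char c) {
        return isIdPrefix(c) || (c >= '0' && c <= '9') || c == '.';
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        State state = State.NORMAL;
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (state == State.NORMAL) {
                if (c == '$') {
                    // '$' switches the lexer from normal to var.
                    tokens.add("TVarPrefix($)");
                    state = State.VAR;
                    i++;
                } else {
                    // Everything up to the next '$' is plain text.
                    int start = i;
                    while (i < input.length() && input.charAt(i) != '$') {
                        i++;
                    }
                    tokens.add("TStringLiteral(" + input.substring(start, i) + ")");
                }
            } else if (c == '{') {
                tokens.add("TStartId({)");
                i++;
            } else if (c == '}') {
                // '}' switches the lexer back from var to normal.
                tokens.add("TEndId(})");
                state = State.NORMAL;
                i++;
            } else if (isIdPrefix(c)) {
                // Inside a variable, only identifiers are recognized.
                int start = i;
                i++;
                while (i < input.length() && isIdChar(input.charAt(i))) {
                    i++;
                }
                tokens.add("TIdentifier(" + input.substring(start, i) + ")");
            } else {
                throw new IllegalArgumentException(
                    "Unexpected character '" + c + "' at position " + i);
            }
        }
        return tokens;
    }
}
```

For the input ${java.home}/${java.user}.log, this sketch yields a
TIdentifier for java.home and for java.user, and a TStringLiteral for "/"
and for ".log", which is the split I'd like the generated Lexer to produce.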

Instead, the Lexer is just assigning the whole input to a single
TStringLiteral. What am I doing wrong?

The grammar's short, so I've included it below:



Package com.a3sr.text.macro.engine;

Helpers
  ascii_character     = [0..0xff];
  ascii_small         = ['a'..'z'];
  ascii_caps          = ['A'..'Z'];
  unicode_character   = [0..0xffff];

  digit               = ['0'..'9'];
  id_prefix           = ascii_small | ascii_caps | '_';
  id_char             = id_prefix | digit | '.';

  lf  = 0x0a;
  cr  = 0x0d;

  line_terminator     = lf | cr | cr lf;
  input_character     = [ascii_character - [cr + lf]];

States
  normal,
  var;

Tokens
  {normal->var, var} var_prefix = '$';
  {var} start_id = '{';
  {normal, var->normal} end_id = '}';
  {text} string_literal = input_character+;
  {var} identifier = id_prefix id_char*;

Productions
  macro_text = {text} string_literal
             | {var} macro
             ;

  macro = var_prefix start_id identifier end_id;