
Re: use of states




Etienne,

Thanks for the help! For everyone else's benefit, I'm including the
working Lexer and grammar for a simple macro language that accepts inputs
like the following:

- logs [matches text fragment]
- logs$$ this should be fine $ okay? [matches a mix of text and
  "stray dollar" fragment nodes]
- ${java.home} [matches macro]
- ${java.home}/${java.home}.log [matches a mix of macros and text
  fragments]

Given this, it's very easy to build an elementary macro processor.
As Etienne explained to me yesterday, the catch is that you need to build
a custom lexer to make up for the lack of lookahead in SableCC, so that it
can tell the difference between "show me the $$$" (stray dollars) and
"show me the ${money}" (macro). Here's the Lexer:

package com.a3sr.text.macro;

import java.io.IOException;
import java.io.PushbackReader;
import com.a3sr.text.macro.engine.lexer.Lexer;
import com.a3sr.text.macro.engine.node.TVarPrefix;

/**
 * Custom Lexer that handles the case where someone has "$" in the
 * input text that's not part of a macro declaration.
 *
 * @author <a href="mailto:kdowney@amberarcher.com">Kyle F. Downey</a>
 * @version $Revision: 1.1 $
 */
public class MacroLexer extends Lexer {
    public MacroLexer(PushbackReader in) {
        super(in);
    }

    protected void filter() {
        if (token instanceof TVarPrefix) {
            // pushback everything except the leading "$"
            try {
                unread(new TVarPrefix(token.getText().substring(1),
                                      token.getLine(), token.getPos() + 1));
            } catch (IOException e) {
                e.printStackTrace();
            }

            token = (TVarPrefix)token.clone();
            token.setText("$");
        }
    }
}
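To make the pushback trick in filter() easier to follow, here is a small hand-written Java sketch that does not depend on SableCC at all. It imitates the two lexer states: in "normal" it tries the longest match for '$' '{' id ('.' id)* '}', and on success emits only "$" (as var_prefix), unreads the "{...}" remainder into a java.io.PushbackReader and switches to "var", where '{', the identifier and '}' come out as separate tokens. The class name PushbackDemo and the KIND:text token strings are made up for illustration. One thing the sketch makes visible: a PushbackReader can only unread as many characters as its buffer holds (the default is one), so presumably the reader handed to MacroLexer also needs an explicit size, e.g. new PushbackReader(reader, 1024), since filter() unreads the whole "{...}" text.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical, hand-written illustration of the MacroLexer pushback trick.
 * NOT SableCC-generated code; class name and token strings are made up.
 */
final class PushbackDemo {
    // Same shape as var_prefix minus the leading '$': '{' id ('.' id)* '}'
    private static final Pattern MACRO_BODY =
        Pattern.compile("\\{[A-Za-z_][A-Za-z0-9_]*(\\.[A-Za-z_][A-Za-z0-9_]*)*\\}");

    static List<String> tokenize(String input) {
        try {
            return lex(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static List<String> lex(String input) throws IOException {
        // The pushback buffer must hold the longest text we ever unread.
        PushbackReader in = new PushbackReader(new StringReader(input), input.length() + 1);
        List<String> tokens = new ArrayList<>();
        boolean varState = false; // false = "normal", true = "var"
        int c;
        while ((c = in.read()) != -1) {
            if (varState) {
                if (c == '{') {
                    tokens.add("START_ID:{");
                } else if (c == '}') {
                    tokens.add("END_ID:}");
                    varState = false; // {var->normal}
                } else {
                    tokens.add("IDENTIFIER:" + readUntil(in, (char) c, '}'));
                }
            } else if (c == '$') {
                String body = readMacroBody(in);
                if (body != null) {
                    tokens.add("VAR_PREFIX:$");    // keep only the leading "$"
                    in.unread(body.toCharArray()); // push back "{...}" for re-lexing
                    varState = true;               // {normal->var}
                } else {
                    tokens.add("DOLLAR:$");        // stray dollar
                }
            } else {
                tokens.add("STRING_LITERAL:" + readUntil(in, (char) c, '$'));
            }
        }
        return tokens;
    }

    /** Reads from first up to (not including) stop, which is left unread. */
    private static String readUntil(PushbackReader in, char first, char stop) throws IOException {
        StringBuilder sb = new StringBuilder().append(first);
        int d;
        while ((d = in.read()) != -1 && d != stop) {
            sb.append((char) d);
        }
        if (d != -1) {
            in.unread(d);
        }
        return sb.toString();
    }

    /** Returns "{id.id}" if it follows; otherwise unreads what it consumed and returns null. */
    private static String readMacroBody(PushbackReader in) throws IOException {
        StringBuilder seen = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            seen.append((char) c);
            boolean first = seen.length() == 1;
            if (first ? c != '{' : (c == '}' || !isIdOrDot(c))) {
                break;
            }
        }
        if (MACRO_BODY.matcher(seen).matches()) {
            return seen.toString();
        }
        in.unread(seen.toString().toCharArray());
        return null;
    }

    private static boolean isIdOrDot(int c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9') || c == '_' || c == '.';
    }
}
```

For "logs$$ this should be fine $ okay?" the sketch yields text, two DOLLAR tokens, text, DOLLAR, text; for "${java.home}" it yields VAR_PREFIX, START_ID, IDENTIFIER, END_ID, which is the shape the macro production below expects.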

And here's the grammar. This is essentially what Etienne sent me
yesterday, with one correction: the grammar given yesterday does not
handle dollar signs in the input text. Because input_character excludes
'$', a dollar that isn't part of a macro matches no token in the normal
state, so you need to define a separate token (dollar) to match it. (Is
this right? I think so, as the lexer fails without this correction.)

Package com.a3sr.text.macro.engine;

Helpers
  ascii_character     = [0..0xff];
  ascii_small         = ['a'..'z'];
  ascii_caps          = ['A'..'Z'];

  digit               = ['0'..'9'];
  id_prefix           = ascii_small | ascii_caps | '_';
  id_char             = id_prefix | digit;
  id_component        = id_prefix id_char*;

  lf                  = 0x0a;
  cr                  = 0x0d;

  line_terminator     = lf | cr | cr lf;
  input_character     = [[ascii_character - [cr + lf]] - '$'];

States
  normal,
  var;

Tokens
  {normal->var} var_prefix = '$' '{' id_component ('.' id_component)* '}';
  {normal} string_literal = input_character+;
  {normal} dollar = '$';
  {var} start_id = '{';
  {var} identifier = id_component ('.' id_component)*;
  {var->normal} end_id = '}';

Productions
  file = fragment*;

  fragment = {text} string_literal
           | {macro} macro
           | {stray_dollar} dollar;

  macro = var_prefix start_id identifier end_id;
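Once the parser has built the file = fragment* tree, the "elementary macro processor" is mostly a walk over the fragments. The plain-Java sketch below models the three fragment alternatives with a hypothetical Fragment class rather than SableCC's generated nodes (in real code you would more likely subclass the generated DepthFirstAdapter). Text and stray dollars are copied verbatim and macros are looked up in a Map; leaving an unknown macro as "${id}" is an assumption of this sketch, not something the grammar dictates.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: a plain-Java stand-in for the parsed fragment list
 * and a walk that expands it. Not SableCC-generated code.
 */
final class MacroExpander {
    /** Mirrors the {text}, {macro} and {stray_dollar} alternatives of fragment. */
    enum Kind { TEXT, MACRO, STRAY_DOLLAR }

    static final class Fragment {
        final Kind kind;
        final String text; // literal text, or the identifier (e.g. "java.home") for MACRO

        Fragment(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }

        static Fragment ofText(String s) { return new Fragment(Kind.TEXT, s); }
        static Fragment ofMacro(String id) { return new Fragment(Kind.MACRO, id); }
        static Fragment strayDollar() { return new Fragment(Kind.STRAY_DOLLAR, "$"); }
    }

    /** Expands file = fragment*; unknown macros are kept as "${id}" (assumed policy). */
    static String expand(List<Fragment> file, Map<String, String> values) {
        StringBuilder out = new StringBuilder();
        for (Fragment f : file) {
            switch (f.kind) {
                case TEXT:
                case STRAY_DOLLAR:
                    out.append(f.text); // copied through unchanged
                    break;
                case MACRO: {
                    String value = values.get(f.text);
                    out.append(value != null ? value : "${" + f.text + "}");
                    break;
                }
            }
        }
        return out.toString();
    }
}
```

For example, with java.home mapped to /opt/jdk (an arbitrary value), the fragments for ${java.home}/${java.home}.log expand to /opt/jdk//opt/jdk.log.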


regards,
kd