Re: use of states
Etienne,
Thanks for the help! For everyone else's benefit, I'm including the
working Lexer and grammar for a simple macro language that accepts inputs
like the following:
- logs [matches text fragment]
- logs$$ this should be fine $ okay? [matches a mix of text and "stray
dollar" fragment nodes]
- ${java.home} [matches macro]
- ${java.home}/${java.home}.log [matches a mix of macros and text
fragments]
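Given this, it's very easy to build an elementary macro processor.
As Etienne explained to me yesterday, the catch is that you need to
subclass the generated lexer to make up for the lack of lookahead in
SableCC, so it can tell the difference between "show me the $$$" (stray
dollars) and "show me the ${money}" (macro). The var_prefix token matches
the whole "${...}" sequence, and the filter() override then pushes back
everything after the leading "$" so that it is re-read in the var state.
Here's the Lexer: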

    package com.a3sr.text.macro;

    import java.io.IOException;
    import java.io.PushbackReader;

    import com.a3sr.text.macro.engine.lexer.Lexer;
    import com.a3sr.text.macro.engine.node.TVarPrefix;

    /**
     * Custom Lexer that handles the case where someone has "$" in the
     * input text that's not part of a macro declaration.
     *
     * @author <a href="mailto:kdowney@amberarcher.com">Kyle F. Downey</a>
     * @version $Revision: 1.1 $
     */
    public class MacroLexer extends Lexer {

        public MacroLexer(PushbackReader in) {
            super(in);
        }

        protected void filter() {
            if (token instanceof TVarPrefix) {
                // push back everything except the leading "$"
                try {
                    unread(new TVarPrefix(token.getText().substring(1),
                                          token.getLine(),
                                          token.getPos() + 1));
                } catch (IOException e) {
                    e.printStackTrace();
                }
                // hand the parser a var_prefix token holding only "$"
                token = (TVarPrefix) token.clone();
                token.setText("$");
            }
        }
    }

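One practical note on the pushback: the filter puts several characters
back at once, so the PushbackReader handed to the lexer needs a buffer
larger than the one-character default. The sketch below shows that in
isolation with plain java.io. It assumes the generated Lexer's unread()
ends up calling unread() on the PushbackReader, which I have not checked
in detail, and the class and method names are made up for this example.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

// Hypothetical stand-alone sketch; not part of the generated SableCC code.
class PushbackSketch {

    /**
     * Reads matchedLength characters (standing in for a matched var_prefix
     * token), pushes back all but the first one, then reads the remainder.
     * Returns "first character | everything read afterwards".
     */
    static String splitPrefix(String input, int matchedLength, int bufferSize)
            throws IOException {
        PushbackReader in =
            new PushbackReader(new StringReader(input), bufferSize);

        char[] matched = new char[matchedLength];
        int n = in.read(matched, 0, matchedLength);
        if (n < 1) {
            return "|";
        }

        // Keep matched[0] (the "$") and push the rest back. This throws
        // IOException if the pushback buffer is smaller than n - 1.
        in.unread(matched, 1, n - 1);

        StringBuilder rest = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            rest.append((char) c);
        }
        return matched[0] + "|" + rest;
    }
}
```

With a buffer of 64, splitting the 12-character match at the start of
"${java.home}.log" leaves "{java.home}.log" to be read again. With the
default-sized buffer of 1, the unread() call fails with an IOException.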
And here's the grammar. This is essentially what Etienne sent me
yesterday, with one correction: the grammar given yesterday does not
actually handle dollar signs in the input text, so you need to define a
separate token to match a dollar that's not part of a macro. (Is this
right? I think so, as it fails without this correction.)

    Package com.a3sr.text.macro.engine;

    Helpers
        ascii_character = [0..0xff];
        ascii_small = ['a'..'z'];
        ascii_caps = ['A'..'Z'];
        digit = ['0'..'9'];
        id_prefix = ascii_small | ascii_caps | '_';
        id_char = id_prefix | digit;
        id_component = id_prefix id_char*;
        lf = 0x0a;
        cr = 0x0d;
        line_terminator = lf | cr | cr lf;
        input_character = [[ascii_character - [cr + lf]] - '$'];

    States
        normal,
        var;

    Tokens
        {normal->var} var_prefix = '$' '{' id_component ('.' id_component)* '}';
        {normal} string_literal = input_character+;
        {normal} dollar = '$';
        {var} start_id = '{';
        {var} identifier = id_component ('.' id_component)*;
        {var->normal} end_id = '}';

    Productions
        file = fragment*;

        fragment = {text} string_literal
                 | {macro} macro
                 | {stray_dollar} dollar;

        macro = var_prefix start_id identifier end_id;

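To show what the three fragment kinds amount to once macros are expanded,
here is a rough stand-alone sketch. It does not use the SableCC-generated
parser at all: a regular expression that mirrors the var_prefix token takes
its place, and the class name, the expand() method and the choice to leave
unknown macros untouched are my own assumptions for illustration.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a tree walker over the parsed fragments.
class MacroExpanderSketch {

    // Mirrors: '$' '{' id_component ('.' id_component)* '}'
    private static final Pattern MACRO = Pattern.compile(
        "\\$\\{([A-Za-z_][A-Za-z0-9_]*(?:\\.[A-Za-z_][A-Za-z0-9_]*)*)\\}");

    static String expand(String input, Map<String, String> values) {
        StringBuilder out = new StringBuilder();
        Matcher m = MACRO.matcher(input);
        int last = 0;
        while (m.find()) {
            // Text fragments and stray dollars are copied through as-is.
            out.append(input, last, m.start());
            // Macro fragment: substitute if a value is known, otherwise
            // keep the original "${...}" text (an assumption, see above).
            String value = values.get(m.group(1));
            out.append(value != null ? value : m.group());
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }
}
```

With java.home mapped to "/opt/jdk", the input
"${java.home}/${java.home}.log" comes out as "/opt/jdk//opt/jdk.log", and
"logs$$ this should be fine $ okay?" comes out unchanged.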
regards,
kd