
Special tokens

To: SableCC mailing list <sablecc-list@sable.mcgill.ca>
Subject: Special tokens
From: Xuan Baldauf <xuan--sable.mcgill.ca@baldauf.org>
Date: Tue, 21 Aug 2001 23:50:05 +0200
Organization: Medium.net
Sender: owner-sablecc-list@sable.mcgill.ca

This is just for the record: I have since found the cause of the problem I describe here.

Hello,

I have a general definition for a token (like "token = tokenchar+;") and some special tokens (like the token "test", defined to be the string "test"). When I use the parser generated by SableCC 2.17.3 to parse text that contains a general token followed by a special token, I get this exception:

com.mn.tools.test.bugs.sablecc.parser.ParserException: [1,5] expecting: 'test',got: 'test ' of type class com.mn.tools.test.bugs.sablecc.node.TToken
        at com.mn.tools.test.bugs.sablecc.parser.Parser.internalProcessToken(Parser.java:335)
        at com.mn.tools.test.bugs.sablecc.parser.Parser.processToken(Parser.java:224)
        at com.mn.tools.test.bugs.sablecc.parser.Parser.parse(Parser.java:263)
        at com.mn.tools.test.bugs.sablecc.ExpectingExistentToken.main(ExpectingExistentToken.java:20)

Files for this test case are attached.

It seems that the lexer or the parser (I'm not sure which) interprets the word "test" as a general token instead of a special token. Is that the expected behaviour for this grammar specification? How can I avoid this and get the result I expect (that if a token matches multiple token definitions, the most specific match is used)? It would be nice if the generator printed a warning when some tokens can never actually be matched because other token definitions match those tokens entirely.
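The warning suggested above could work roughly as follows, at least for fixed-string tokens such as 'test'. This is a hypothetical Java sketch, not part of SableCC: `ShadowedTokenCheck`, `TokenDef`, `general`, `fixed` and `warnings` are illustrative names, and Java regular expressions stand in for SableCC's token expressions. It flags a string token when some token listed before it matches the string's entire text; general tokens are skipped, since checking them would need a regular-language inclusion test.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical generation-time check (not part of SableCC): warn when a
 * fixed-string token can never be returned because a token listed before it
 * matches its entire text. Java regexes stand in for SableCC expressions.
 */
public class ShadowedTokenCheck {

    static final class TokenDef {
        final String name;
        final String regex;   // stand-in for the token's regular expression
        final String literal; // fixed text of a string token, or null

        private TokenDef(String name, String regex, String literal) {
            this.name = name;
            this.regex = regex;
            this.literal = literal;
        }

        static TokenDef general(String name, String regex) {
            return new TokenDef(name, regex, null);
        }

        static TokenDef fixed(String name, String text) {
            return new TokenDef(name, Pattern.quote(text), text);
        }
    }

    /** Returns one warning per fixed-string token shadowed by an earlier definition. */
    static List<String> warnings(List<TokenDef> defs) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < defs.size(); i++) {
            TokenDef later = defs.get(i);
            if (later.literal == null) {
                continue; // general tokens would need a regex-inclusion test
            }
            for (int j = 0; j < i; j++) {
                TokenDef earlier = defs.get(j);
                // Wherever the literal matches, the earlier token can match at least
                // as many characters, so the literal never wins and loses every tie.
                if (Pattern.matches(earlier.regex, later.literal)) {
                    out.add("token '" + later.name + "' can never be matched: '"
                            + earlier.name + "' is listed first and also matches '"
                            + later.literal + "'");
                    break;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Tokens order from the grammar in this post; tokenchar is approximated as [^ \t].
        List<TokenDef> defs = Arrays.asList(
                TokenDef.general("token", "[^ \\t]+"),
                TokenDef.fixed("test", "test"));
        warnings(defs).forEach(System.out::println);
    }
}
```

For the grammar in this post it should report that 'test' is shadowed by 'token'; with 'token' moved to the end, as in the P.S., it reports nothing.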

Xuân.

P.S.: I changed the definition order in the "Tokens" section so that "token" is the last definition, and now I cannot reproduce the problem anymore. So definition order actually matters? Aaaahhh, I found it, on page 34 of the thesis:

For a given input, the longest matching token will be returned by the lexer. In the case of two matches of the same length, the token listed first in the specification file will be returned.
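The rule quoted above can be sketched in a few lines of Java. This is only an illustration under assumptions: `LongestMatchDemo`, `TokenDef` and `pick` are hypothetical names, Java regular expressions stand in for SableCC's token expressions (a generated SableCC lexer works from a DFA instead), and `[^ \t]+` approximates `tokenchar+`. Each definition is tried at the same position; only a strictly longer match replaces the current best, so on a tie the earlier definition wins.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of "longest match wins, ties go to the first definition". */
public class LongestMatchDemo {

    static final class TokenDef {
        final String name;
        final Pattern pattern;

        TokenDef(String name, String regex) {
            this.name = name;
            this.pattern = Pattern.compile(regex);
        }
    }

    // Order used in the grammar of this post: general "token" before "test".
    static final List<TokenDef> TOKEN_FIRST = Arrays.asList(
            new TokenDef("token", "[^ \\t]+"),
            new TokenDef("test", "test"));

    // Order from the P.S.: "token" listed last.
    static final List<TokenDef> TEST_FIRST = Arrays.asList(
            new TokenDef("test", "test"),
            new TokenDef("token", "[^ \\t]+"));

    /** Name of the token chosen at position pos, or null if nothing matches. */
    static String pick(List<TokenDef> defs, String input, int pos) {
        String best = null;
        int bestLen = 0;
        for (TokenDef def : defs) {
            Matcher m = def.pattern.matcher(input);
            m.region(pos, input.length());
            // Greedy lookingAt gives the longest match for these simple patterns.
            if (m.lookingAt()) {
                int len = m.end() - pos;
                // Strictly longer wins; on equal length the earlier definition is kept.
                if (len > bestLen) {
                    best = def.name;
                    bestLen = len;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String input = "abc test";
        System.out.println(pick(TOKEN_FIRST, input, 4));   // token
        System.out.println(pick(TEST_FIRST, input, 4));    // test
        System.out.println(pick(TEST_FIRST, "tests", 0));  // token (longer match)
    }
}
```

With 'token' listed first, the word "test" at position 4 of "abc test" comes back as 'token', which is the situation behind the exception above. With 'test' listed first it comes back as 'test', while a longer word such as "tests" is still a 'token'.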

Package com.mn.tools.test.bugs.sablecc;

Helpers

	char          = [0..127];
	sp            = 32;
	ht            = 9;

	separators    = [ht + sp];

	tokenchar     = [char - separators];

Tokens

	token         = tokenchar+;
	test          = 'test';
	space         = (sp | ht)*;

Ignored Tokens

	space;

Productions

	grammar       = my_token_list;

	my_token_list = token test;

/* ExpectingExistentToken.java
*/

package com.mn.tools.test.bugs.sablecc;

import java.io.PushbackReader;
import java.io.StringReader;

import com.mn.tools.test.bugs.sablecc.lexer.Lexer;
import com.mn.tools.test.bugs.sablecc.node.Start;
import com.mn.tools.test.bugs.sablecc.parser.Parser;

public class ExpectingExistentToken {

	public static void main(String[] argv) {
		try {
			// "abc" should lex as a general token and "test" as the special token 'test'.
			// A 1024-character pushback buffer lets the lexer unread more than one character.
			Parser parser = new Parser(new Lexer(new PushbackReader(new StringReader("abc test"), 1024)));

			// With "token" defined before "test", this throws the ParserException shown above.
			Start start = parser.parse();
			System.out.println("Parsed: " + start);
		} catch (Exception e) {
			e.printStackTrace();
		}
	}
}
