
Wish List: Prefix List for Token Definitions



I recently ran into a problem: the same token '*' represents two very
different concepts in my grammar. The situation is similar to the C problem of
deciding whether a '*' is a multiplication or part of a pointer type in a
cast. To make it even worse, something like the following is actually valid:

* * *

It means that a regular-expression Kleene star is multiplied by another Kleene
star. The grammar is a bit too big to unwind, so what I did was to define

kleene_tok = '*';
star_tok = '*';

But since the two definitions match exactly the same text, the lexer cannot
tell them apart and my grammar may get the wrong token. My workaround was to
subclass Lexer:

public class NewLexer extends Lexer

A kleene_tok can only follow the beginning of the file or one of '[', '(',
'/', ':', '@', etc. (basically any operator in the grammar, including '*' when
it is the multiplication operator). I therefore override the next() and peek()
methods to check for these prefixes. This way, kleene_tok and star_tok are
returned in the proper places.
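
To make the prefix rule concrete, here is a minimal stand-alone sketch of
the decision itself. It is not my actual NewLexer and it does not use the
generated Lexer class at all: the class name StarDisambiguator, the method
classify(), the OPERATORS set and the "op"/"other" labels are all made up for
illustration, and every non-blank character is treated as one token to keep
it short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from the generated Lexer.
class StarDisambiguator {
    // Assumed, incomplete set of operator characters ('*' is handled separately).
    static final String OPERATORS = "[(/:@+-=|,";

    // Returns one label per non-blank character of the input.
    static List<String> classify(String input) {
        List<String> labels = new ArrayList<>();
        String prev = null; // label of the previous token; null at the beginning of the input
        for (char c : input.toCharArray()) {
            if (Character.isWhitespace(c)) {
                continue; // blanks never count as a prefix
            }
            String label;
            if (c == '*') {
                // Prefix check: a '*' at the start, after an operator, or after a
                // multiplication star is the Kleene star; anything else makes it
                // the multiplication operator.
                boolean kleeneAllowed = prev == null
                        || prev.equals("op")
                        || prev.equals("star_tok");
                label = kleeneAllowed ? "kleene_tok" : "star_tok";
            } else if (OPERATORS.indexOf(c) >= 0) {
                label = "op";
            } else {
                label = "other";
            }
            labels.add(label);
            prev = label;
        }
        return labels;
    }

    public static void main(String[] args) {
        System.out.println(classify("* * *"));
    }
}
```

In this sketch the only state carried between tokens is the label of the
previous one, which is the same information my overridden next() and peek()
have to remember.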

Okay, after this long explanation, here are my wish and my questions. As far
as I can tell, it would not be difficult to allow the programmer to specify
prefixes for tokens (a bit like the '/' lookahead feature, which is not
implemented yet (?)). This would have made my life much easier: I spent a week
trying to unwind, twist and turn the grammar to remove shift/reduce and
reduce/reduce conflicts before I gave up and subclassed the Lexer class.

1.	Is there a general method of avoiding what I have done?
2.	Is the prefix idea useful in other situations?

I can include a simplified grammar to illustrate the situation if necessary.
The grammar is the XPath language for XSLT and XQL from the W3C, in case you
are wondering.

Khun Yee