
Lexer filters.



I've previously annoyed Etienne personally by suggesting that stacked
lexer states could be supported with the notations {foo->inner->outer}
(transition to "inner", remembering "outer" as the state to return to)
and {inner->} (return to the remembered state, in this case "outer").

He rightly chastised me for not reading the documentation well enough
and pointed out that I was perfectly welcome to write my own lexer
filter. (Doh. I must have been asleep while I was reading.)

HOWEVER, while my problem is now in effect solved, I'm still faintly
grumpy: I now need a comment in my grammar file saying "this looks like
it doesn't work but in fact there's a lexer filter elsewhere that
provides the missing transitions into and out of comment state".

In short, I've lost the promised property that the grammar is segregated
in the grammar file!

Um, methinks it would be WAAAAAY more modular and self-documenting and
motherhood and apple pie and gotos considered harmful, sort of thing, if
this effect were at least triggered from, or documented in, the grammar
itself: for my special case, saying {normal->comment->normal} and
{comment->} would have sufficed, as I said, but I grant it isn't very
general!
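
To make the intended semantics of {normal->comment->normal} and
{comment->} concrete, here is a rough sketch of a state stack. It is
an illustration only: the class StackedStates, its methods, and the use
of "/*" and "*/" as the triggering tokens are all hypothetical, and none
of it is SableCC's actual API.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of stacked lexer states; not SableCC's API.
class StackedStates {
    enum State { NORMAL, COMMENT }

    private State current = State.NORMAL;
    // States remembered by {x->inner->outer}, most recent on top.
    private final Deque<State> pending = new ArrayDeque<>();

    // {from->inner->outer}: enter "inner" now, remember "outer" for later.
    void enter(State inner, State outer) {
        pending.push(outer);
        current = inner;
    }

    // {inner->}: return to the state remembered by the matching enter().
    void leave() {
        if (pending.isEmpty()) {
            throw new IllegalStateException("no pending state to return to");
        }
        current = pending.pop();
    }

    State current() {
        return current;
    }

    // Feeds token texts through the stack and records the state after
    // each token: 'N' for NORMAL, 'C' for COMMENT.
    static String trace(String... tokens) {
        StackedStates s = new StackedStates();
        StringBuilder out = new StringBuilder();
        for (String t : tokens) {
            if (t.equals("/*")) {
                // Remember where we came from, so nesting unwinds properly.
                s.enter(State.COMMENT, s.current());
            } else if (t.equals("*/") && s.current() == State.COMMENT) {
                s.leave();
            }
            out.append(s.current() == State.NORMAL ? 'N' : 'C');
        }
        return out.toString();
    }
}
```

With this sketch, StackedStates.trace("a", "/*", "/*", "*/", "b", "*/",
"c") yields "NCCCCNN": the inner "*/" returns to COMMENT and only the
outer one returns to NORMAL, which is the behaviour a flat {foo->bar}
transition cannot express.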

Better ideas:

(1) Something like {normal->commentStateMachine()} (my notation is
horrible, I know), meaning "when in state NORMAL, and considering this
token, use Lexer.filterCommentStateMachine in place of Lexer.filter".
This would at least make the tricky parts stand out, and might be more
efficient than using a user filter all the time.

(2) Something like [Class]: in front of a token that makes Class (a
subclass of Token) be the immediate superclass of that token; then make
filter be a method of *Token* rather than Lexer.

(3) Something like {normal:commentRule} in which the string(? ick)
"commentRule" is passed to the existing filter mechanism as an extra
argument.

(4) Something totally slick that I haven't quite thought of yet, that
has the property that the current {foo->bar} is a special case :-).

(5) Failing ALL of the above, it would still be an improvement, both in
potential performance and in style, if user filters required an explicit
trigger in the grammar, maybe something as simple as {normal*}.

Each of these can be made smoothly backward compatible with a suitable
default implementation, I think.
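
One way to picture the "suitable default implementation" for ideas (1)
and (5) is a per-state filter table in which a state with no registered
filter falls through to a do-nothing default. The sketch below is an
assumption-laden illustration: TriggeredFilters, register and filter are
hypothetical names, tokens are reduced to plain strings, and nothing
here reflects how SableCC actually generates or invokes Lexer.filter.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch of explicitly triggered, per-state filters.
class TriggeredFilters {
    enum State { NORMAL, COMMENT }

    // Only states that the grammar explicitly marks get an entry here.
    private final Map<State, UnaryOperator<String>> filters =
            new EnumMap<>(State.class);

    // Corresponds to a trigger in the grammar, e.g. {normal*}.
    void register(State state, UnaryOperator<String> filter) {
        filters.put(state, filter);
    }

    // States without a trigger use the identity filter, so a grammar
    // that never mentions filters behaves exactly as before.
    String filter(State state, String tokenText) {
        return filters
                .getOrDefault(state, UnaryOperator.<String>identity())
                .apply(tokenText);
    }
}
```

Because the lookup is keyed by state, a user filter only runs for the
states that asked for it, which is the performance point made in (5).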

regards all
stephen
having a lot of fun!