Lexer tokens -- and the order in which they are declared



Is it possible that the order in which lexer tokens are declared affects
the lexing process?

Helpers
 digit = ['0' .. '9'];
 nondigit = ['_' + [['a' .. 'z'] + ['A' .. 'Z']]];

First case:

Tokens
 true='true';
 false='false';
 identifier = nondigit (digit | nondigit)*;

This seems to work differently from the second case:

Tokens
 identifier = nondigit (digit | nondigit)*;
 true='true';
 false='false';

In the second case, the lexer never seems to return the tokens 'true' or
'false'. It looks like 'identifier' swallows them both. How does the lexer
DFA actually distinguish between the two cases? Is the algorithm supposed
to emit a token as soon as it reaches a match in the DFA, or is it supposed
to keep eating characters for as long as it can, regardless of shorter
matches along the way? Does the order in which tokens are declared affect
the algorithm?
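
To make the question concrete, here is a minimal Python sketch of one rule
that would reproduce the behaviour described above: take the longest match,
and when two tokens match the same length, prefer the one declared first.
This is only an assumption about what the generated lexer does, not a
statement of the actual algorithm. The names CASE_1, CASE_2 and next_token
are hypothetical, and regular expressions stand in for the DFA.

```python
import re

# Regex equivalents of the helpers and tokens from the post (illustrative only).
NONDIGIT = r"[_a-zA-Z]"
IDENT = NONDIGIT + r"[0-9_a-zA-Z]*"

# The two declaration orders being compared.
CASE_1 = [("true", r"true"), ("false", r"false"), ("identifier", IDENT)]
CASE_2 = [("identifier", IDENT), ("true", r"true"), ("false", r"false")]


def next_token(tokens, text, pos=0):
    """Assumed rule: longest match wins; on a tie, the earliest declaration wins."""
    best_name, best_len = None, 0
    for name, pattern in tokens:
        m = re.compile(pattern).match(text, pos)
        # Strict '>' means a later declaration never replaces an
        # earlier one that matched the same number of characters.
        if m and len(m.group()) > best_len:
            best_name, best_len = name, len(m.group())
    return best_name, text[pos:pos + best_len]


print(next_token(CASE_1, "true"))    # keyword declared first
print(next_token(CASE_2, "true"))    # identifier declared first
print(next_token(CASE_1, "truest"))  # longer input than the keyword
```

Under this assumed rule, the input "true" matches both 'true' and
'identifier' with the same length, so the declaration order alone decides
which token is returned, while "truest" is an identifier in either case
because that match is longer.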

Does anybody know how it is supposed to work?

Thanks
Erik Poupaert