
Lexer tokens -- and the order in which they are declared

To: <sablecc-list@sable.mcgill.ca>
Subject: Lexer tokens -- and the order in which they are declared
From: "Erik Poupaert" <erik.poupaert@chello.be>
Date: Fri, 27 Dec 2002 10:03:25 +0100
Importance: Normal
In-reply-to: <3E06CF61.902@marni.otago.ac.nz>
Sender: owner-sablecc-list@sable.mcgill.ca

Is it possible that the order in which lexer tokens are declared affects
the lexing process?

Helpers
 digit = ['0' .. '9'];
 nondigit = ['_' + [['a' .. 'z'] + ['A' .. 'Z']]];

First case:

Tokens
 true = 'true';
 false = 'false';
 identifier = nondigit (digit | nondigit)*;

This seems to work differently from the second case:

Tokens
 identifier = nondigit (digit | nondigit)*;
 true = 'true';
 false = 'false';

In the second case, the lexer never seems to return the tokens 'true' or
'false'; 'identifier' appears to swallow them both. How does the lexer DFA
actually distinguish between the two cases? Is the algorithm supposed to
emit a token as soon as it reaches a match in the DFA, or is it supposed to
consume characters for as long as it can, regardless of partial matches?
And when two tokens match the same text, does the order in which they are
declared decide which one is returned?
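
One reading that would explain what I am seeing is "longest match wins, and
on a tie the token declared first wins". Below is a minimal Python sketch of
that reading. It is only my assumption about the behaviour, not SableCC's
actual code: make_lexer, first_case and second_case are names I made up, and
it tries each token's regular expression in turn instead of building a DFA.

```python
import re

def make_lexer(token_specs):
    """Build a toy lexer from (name, regex) pairs given in declaration order.

    Assumed rule (not taken from SableCC's sources): the longest match wins,
    and when two tokens match the same length, the one declared first wins.
    """
    compiled = [(name, re.compile(pattern)) for name, pattern in token_specs]

    def lex(text):
        tokens = []
        pos = 0
        while pos < len(text):
            # Skip whitespace; the grammar in this post declares no blank token.
            if text[pos].isspace():
                pos += 1
                continue
            best_name, best_len = None, 0
            for name, regex in compiled:
                m = regex.match(text, pos)
                # Strictly greater: on a tie the earlier declaration is kept.
                if m and m.end() - pos > best_len:
                    best_name, best_len = name, m.end() - pos
            if best_name is None:
                raise ValueError(f"no token matches at position {pos}")
            tokens.append((best_name, text[pos:pos + best_len]))
            pos += best_len
        return tokens

    return lex

# nondigit (digit | nondigit)* written as a Python regular expression
IDENT = r"[A-Za-z_][A-Za-z_0-9]*"

# First case: keywords declared before identifier.
first_case = make_lexer(
    [("true", "true"), ("false", "false"), ("identifier", IDENT)]
)
# Second case: identifier declared before the keywords.
second_case = make_lexer(
    [("identifier", IDENT), ("true", "true"), ("false", "false")]
)

print(first_case("true falsehood"))
print(second_case("true falsehood"))
```

Under that assumed rule, the first ordering returns 'true' as a keyword but
still lexes 'falsehood' as one identifier, because the identifier match is
longer. The second ordering returns 'identifier' for everything, because the
keyword and the identifier match the same four characters and 'identifier'
was declared first. That is exactly the behaviour described above.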

Does anybody know how it is supposed to work?

Thanks
Erik Poupaert
