Re: Tokens and free text
On Thu, 28 May 1998 08:42:30 -0400, you wrote:
>Bob.
>
>Many details... Why don't you create a system-independent end-of-line? Why
>don't you use lexer states to detect the beginning of lines and get better
>header recognition? This would allow you to narrow the definition of "text".
>
Lexer states are the way to go, I think. I realised this as I was waking
up this morning :-)
>The trick is to recognize the form "blank* 'to' blank* ':'" only at the
>beginning of a line. Inside the line (in normal state) you recognize only
>text chars.
>
>Helpers
> cr = 0x000d;
> lf = 0x000a;
> cr_lf = cr lf;
> blank = ' '; // you could add tabs,...
>
> colon = ':';
> char = [0x00..0xff];
>
>States
> bol, normal;
>
>Tokens
>
> {bol, normal->bol} eol = cr | lf | cr_lf;
>
>/* notice that helpers & tokens don't share the same name space */
Thanks for pointing this out (should have realised this when I couldn't
use helpers in the Productions).
> {bol->normal, normal} text_chars = char;
>
> {bol->normal} to_header = blank* 'to' blank* colon;
> {bol->normal} from_header = blank* 'from' blank* colon;
>
>/*******************************************************************
>* Productions *
>*******************************************************************/
>Productions
>
> message = lines*;
> lines = header text*;
> header = {to} to_header |
> {from} from_header;
>
> text = {text} text_chars |
> {eol} eol;
>
>Anyway, this is another way to look at the problem.
So how do these states work? Is a token like to_header recognised only
in the bol state, and when it is encountered, does the lexer shift to
the 'normal' state?
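To make the question concrete, here is a rough Java sketch of how I
understand the states to behave. It is a hand-written model, not what
SableCC generates: the class name StateLexerSketch, the Rule helper and
the regular expressions are all my own assumptions. I'm also assuming
that the longest match wins and that a tie goes to the token listed
first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of the two-state lexer in the quoted grammar.
// This is NOT SableCC output, only a sketch of the assumed behaviour.
class StateLexerSketch {
    enum State { BOL, NORMAL }

    // One token definition: the state to enter after a match made in BOL
    // or in NORMAL. A null entry means the token is inactive in that state.
    static final class Rule {
        final String name;
        final Pattern pattern;
        final State afterBol;
        final State afterNormal;

        Rule(String name, String regex, State afterBol, State afterNormal) {
            this.name = name;
            this.pattern = Pattern.compile(regex);
            this.afterBol = afterBol;
            this.afterNormal = afterNormal;
        }

        State next(State current) {
            return current == State.BOL ? afterBol : afterNormal;
        }
    }

    // Same order as the Tokens section of the quoted grammar.
    static final Rule[] RULES = {
        new Rule("eol", "\\r\\n|\\r|\\n", State.BOL, State.BOL),            // {bol, normal->bol}
        new Rule("text_chars", "[\\x00-\\xff]", State.NORMAL, State.NORMAL), // {bol->normal, normal}
        new Rule("to_header", " *to *:", State.NORMAL, null),               // {bol->normal}
        new Rule("from_header", " *from *:", State.NORMAL, null),           // {bol->normal}
    };

    // Returns the names of the tokens recognised in the input, in order.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        State state = State.BOL; // assumed start state: beginning of line
        int pos = 0;
        while (pos < input.length()) {
            Rule best = null;
            int bestLen = 0;
            for (Rule rule : RULES) {
                if (rule.next(state) == null) {
                    continue; // token is not active in the current state
                }
                Matcher m = rule.pattern.matcher(input).region(pos, input.length());
                // Strict '>' keeps the earlier rule when two matches tie.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    best = rule;
                    bestLen = m.end() - pos;
                }
            }
            if (best == null) {
                throw new IllegalArgumentException("no token matches at offset " + pos);
            }
            tokens.add(best.name);
            state = best.next(state); // follow the state transition, if any
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("to: i want this to work\r\n"));
    }
}
```

If that model is right, the leading "to:" comes back as to_header, while
the "to" in the middle of the line is just two text_chars, because
to_header is not active in the normal state.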
If you remember my other post about case sensitivity you might notice a
slight complication. If there weren't a separator (':' in my case)
between the header and the rest, would states still work? I still vote
for "to" :)
Is this documented anywhere? I still think I must be missing some
documentation.
Thanks,
Bob
>
>Etienne
>
>
>-----Original Message-----
>From: Bob Hutchison <hutch@RedRock.com>
>To: sablecc-list@sable.mcgill.ca <sablecc-list@sable.mcgill.ca>
>Date: Thursday, May 28, 1998 2:11 AM
>Subject: Tokens and free text
>
>
>>Hi,
>>
>>Another question...
>>
>>Consider the following input:
>>
>>"to: i want this to work"
>>
>>The quotes are not part of the input.
>>
>>I've got this small grammar:
>>
>>
>>Package simple;
>>
>>/*******************************************************************
>> * Helpers *
>> *******************************************************************/
>>Helpers
>>
>> c_colon = ':';
>> c_cr = 0x000d;
>> c_lf = 0x000a;
>> char = [0x00..0xff];
>>
>>/*******************************************************************
>> * Tokens *
>> *******************************************************************/
>>Tokens
>>
>> colon = c_colon;
>> cr = c_cr;
>> lf = c_lf;
>> crlf = c_cr c_lf;
>> text_chars = char;
>>
>> to_header = 'to';
>> from_header = 'from';
>>
>>/*******************************************************************
>> * Productions *
>> *******************************************************************/
>>Productions
>>
>> message = lines*;
>> lines = header colon text* crlf;
>> header = {to} to_header
>> | {from} from_header
>> ;
>>
>> text = {text} text_chars
>> | {to} to_header
>> | {from} from_header
>> | {cr} cr
>> | {lf} lf
>> | {colon} colon
>> ;
>>
>>
>>The problem I have with this is the definition of the 'text' production.
>>It seems I have to list every token here to recognise the text of the
>>token.
>>
>>Now I really hope I'm missing something... or is this the way it is?
>>
>>thanks,
>>Bob
>>
>>
>>---
>>Bob Hutchison, hutch@RedRock.com, (416) 878-3454
>>RedRock, Toronto, Canada
>
---
Bob Hutchison, hutch@RedRock.com, (416) 878-3454
RedRock, Toronto, Canada