Re: Tokens and free text
Bob.
Many details... Why don't you create a system-independent end-of-line
token? And why not use lexer states to detect the beginning of a line and
get better header recognition? This would allow you to narrow the
definition of "text".
The trick is to recognize the form "blank* 'to' blank* ':'" only at the
beginning of a line. Inside the line (in the normal state) you recognize
only text chars.

Helpers
  cr    = 0x000d;
  lf    = 0x000a;
  cr_lf = cr lf;
  blank = ' '; // you could add tabs,...
  colon = ':';
  char  = [0x00..0xff];

States
  bol, normal;

Tokens
  {bol, normal->bol} eol = cr | lf | cr_lf;
  /* notice that helpers & tokens don't share the same name space */
  {bol->normal, normal} text_chars = char;
  {bol->normal} to_header = blank* 'to' blank* colon;
  {bol->normal} from_header = blank* 'from' blank* colon;

/*******************************************************************
 * Productions                                                     *
 *******************************************************************/
Productions
  message = lines*;
  lines = header text*;
  header = {to} to_header
         | {from} from_header;
  text = {text} text_chars
       | {eol} eol;

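
To see how the two lexer states behave on your sample input, here is a rough simulation in Python. It is only a sketch and not what SableCC generates: the RULES table and the tokenize function are made-up names, and it assumes the lexer picks the longest match and breaks ties in favour of the token declared first.

```python
import re

# (token name, pattern, {state in which the token is active: next state}),
# listed in the same order as the Tokens section of the grammar.
RULES = [
    ("eol",         re.compile(r"\r\n|\r|\n"),  {"bol": "bol", "normal": "bol"}),
    ("text_chars",  re.compile(r"[\x00-\xff]"), {"bol": "normal", "normal": "normal"}),
    ("to_header",   re.compile(r" *to *:"),     {"bol": "normal"}),
    ("from_header", re.compile(r" *from *:"),   {"bol": "normal"}),
]


def tokenize(data):
    """Return a list of (token name, matched text) pairs for data."""
    state, pos, tokens = "bol", 0, []
    while pos < len(data):
        best = None
        for name, pattern, transitions in RULES:
            if state not in transitions:
                continue  # token is not active in the current state
            match = pattern.match(data, pos)
            # Only a strictly longer match replaces the current best,
            # so a tie goes to the token declared earlier.
            if match and (best is None or match.end() > best[1].end()):
                best = (name, match, transitions[state])
        if best is None:
            raise ValueError("no token matches at offset %d" % pos)
        name, match, state = best
        tokens.append((name, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for name, text in tokenize("to: i want this to work\r\n"):
        print(name, repr(text))
```

Run on "to: i want this to work" followed by a CR LF, only the leading "to:" comes out as to_header. The second "to" arrives while the lexer is in the normal state, where to_header is not active, so it is delivered as plain text_chars.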
Anyway, this is another way to look at the problem.
Etienne
-----Original Message-----
From: Bob Hutchison <hutch@RedRock.com>
To: sablecc-list@sable.mcgill.ca <sablecc-list@sable.mcgill.ca>
Date: Thursday, May 28, 1998 2:11 AM
Subject: Tokens and free text
>Hi,
>
>Another question...
>
>Consider the following input:
>
>"to: i want this to work"
>
>The quotes are not part of the input.
>
>I've got this small grammar:
>
>
>Package simple;
>
>/*******************************************************************
> * Helpers *
> *******************************************************************/
>Helpers
>
> c_colon = ':';
> c_cr = 0x000d;
> c_lf = 0x000a;
> char = [0x00..0xff];
>
>/*******************************************************************
> * Tokens *
> *******************************************************************/
>Tokens
>
> colon = c_colon;
> cr = c_cr;
> lf = c_lf;
> crlf = c_cr c_lf;
> text_chars = char;
>
> to_header = 'to';
> from_header = 'from';
>
>/*******************************************************************
> * Productions *
> *******************************************************************/
>Productions
>
> message = lines*;
> lines = header colon text* crlf;
> header = {to} to_header
> | {from} from_header
> ;
>
> text = {text} text_chars
> | {to} to_header
> | {from} from_header
> | {cr} cr
> | {lf} lf
> | {colon} colon
> ;
>
>
>The problem I have with this is the definition of the 'text' production.
>It seems I have to list every token here to recognise the text of the
>token.
>
>Now I really hope I'm missing something... or is this the way it is?
>
>thanks,
>Bob
>
>
>---
>Bob Hutchison, hutch@RedRock.com, (416) 878-3454
>RedRock, Toronto, Canada