Re: Re: shortcut
At 09:59 AM 10/29/98 -0500, you wrote:
>Why not do it in the lexer? It would simplify the grammar:
Ooops.
I guess that reflects an elementary misunderstanding I
have, which is that there doesn't seem to be any difference
between a parser and a lexer except that the lexer output is
unresolved in the AST and only has a limited set of "tokens"
(Unicode) which cannot be read by methods. Is there some
theoretical difference or is it a distinction made just for
speed?
I wouldn't ask, except that thinking about this (and possibly
getting myself more confused) it would sometimes be nice to have
a production that didn't create an extra node in the tree
(i.e. appeared more like a token). I'm thinking of cases where
the production does nothing more than list several more
productions with no corresponding action in the final code.
I'm sure that hidden alternatives could do the same thing, but
if there were no distinction between lexer and parser then
this would be possible directly.
For example:
statement = arithmetic_statement | function_call ;
arithmetic_statement = term plus term | term minus term ;
where, if arithmetic_statement didn't make an extra node in
the AST you would have the equivalent of:
statement = term plus term | term minus term | function_call ;
in the final tree.
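
To make the idea concrete, here is a rough sketch in Python of the
flattening described above, done as a pass over an already-built tree.
The Node class and the splice function are hypothetical names made up
for this illustration; they are not part of any parser generator, and a
real tool might offer something different (or nothing at all).

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    """Hypothetical AST node: a production or token name plus its children."""
    name: str
    children: List["Node"] = field(default_factory=list)


def splice(node: Node, transparent: Set[str]) -> Node:
    """Return a copy of `node` in which every descendant whose name is in
    `transparent` is replaced by its own children."""
    new_children: List[Node] = []
    for child in node.children:
        flattened = splice(child, transparent)
        if flattened.name in transparent:
            # Pass-through production: lift its children into the parent.
            new_children.extend(flattened.children)
        else:
            new_children.append(flattened)
    return Node(node.name, new_children)


# Tree for "term plus term" under the two-level grammar.
tree = Node("statement", [
    Node("arithmetic_statement", [Node("term"), Node("plus"), Node("term")]),
])

flat = splice(tree, {"arithmetic_statement"})
print([child.name for child in flat.children])  # ['term', 'plus', 'term']
```

The flattened statement node holds term, plus, term directly, which is
the shape the single-production grammar would have given.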
This makes no difference in a compiler, but would increase
the speed of an interpreter (which is what made me wonder -
my interpreter spends a lot of time copying values up and
down the tree).
(I realise that a token doesn't have get methods for each
character, but it seems to me that it could have).
Sorry if this appears to be senseless rambling. All I really
want to know is if the distinction between parser and lexer
is a result of performance considerations alone, or if
I have missed a more fundamental distinction.
Andrew