
Re: Re: shortcut



At 09:59 AM 10/29/98 -0500, you wrote:
>Why not do it in the lexer? It would simplify the grammar:

	Ooops.

	I guess that reflects an elementary misunderstanding I
	have, which is that there doesn't seem to be any difference
	between a parser and a lexer except that the lexer's output is
	not broken down any further in the AST: a token is built from
	a limited alphabet (Unicode characters), and those characters
	cannot be read through methods.  Is there some theoretical
	difference, or is it a distinction made just for speed?

	I only ask because, thinking about this (and possibly getting
	myself more confused), it would sometimes be nice to have
	a production that didn't create an extra node in the tree
	(i.e. one that appeared more like a token).  I'm thinking of
	cases where the production does nothing more than list several
	other productions, with no corresponding action in the final
	code.  I'm sure that hidden alternatives could do the same
	thing, but if there were no distinction between lexer and
	parser then this would be possible directly.

	For example:

	statement = arithmetic_statement | function_call ;

	arithmetic_statement = term plus term | term minus term ;

	where, if arithmetic_statement didn't make an extra node in
	the AST you would have the equivalent of:

	statement = term plus term | term minus term | function_call ;

	in the final tree.
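
	To make the tree shapes concrete, here is a rough sketch in
	Python (not generated parser code).  Node, splice and depth
	are names I made up for illustration; they are not part of
	any parser generator's API.  It builds the tree for the first
	grammar and then splices the arithmetic_statement node out,
	which should give the tree for the second grammar:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical AST node: a production or token name plus children."""
    name: str
    children: list = field(default_factory=list)


def splice(node, hidden):
    """Return a copy of the tree where every node whose name is in
    `hidden` is replaced, in its parent's child list, by its own children.
    The root itself is never removed."""
    new_children = []
    for child in node.children:
        child = splice(child, hidden)
        if child.name in hidden:
            # Pass-through production: promote its children one level up.
            new_children.extend(child.children)
        else:
            new_children.append(child)
    return Node(node.name, new_children)


def depth(node):
    """Number of levels in the tree, counting the root as 1."""
    return 1 + max((depth(c) for c in node.children), default=0)


# Tree for: statement = arithmetic_statement ;
#           arithmetic_statement = term plus term ;
tree = Node("statement", [
    Node("arithmetic_statement", [Node("term"), Node("plus"), Node("term")]),
])

# Tree for: statement = term plus term ;
flat = splice(tree, {"arithmetic_statement"})
```

	In this sketch the flattened tree is one level shallower, so
	an interpreter walking it has one node fewer to pass each
	value through per statement.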

	This makes no difference in a compiler, but would increase
	the speed of an interpreter (which is what made me wonder -
	my interpreter spends a lot of time copying values up and
	down the tree).

	(I realise that a token doesn't have get methods for each
	character, but it seems to me that it could have).

	Sorry if this appears to be senseless rambling.  All I really
	want to know is if the distinction between parser and lexer
	is a result of performance considerations alone, or if
	I have missed a more fundamental distinction.

	Andrew