[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Re: shortcut



Andrew Cooke writes:
> 	I guess that reflects an elementary misunderstanding I
> 	have, which is that there doesn't seem to be any difference
> 	between a parser and a lexer except that the lexer output is
> 	unresolved in the AST and only has a limited set of "tokens"
> 	(unicode) which cannot be read by methods.  Is there some 
> 	theoretical difference or is it a distinction made just for 
> 	speed?

Both a parser and a lexer can be thought of as black boxes that
either accept or reject their input. So you can define them in
terms of what kind of input they are capable of accepting .. ie,
distinguishing.

A lexer recognizes a sequence of input characters which can be
(consecutively) split up into tokens, each of which is described
by a regular expression (or, equivalently, finite state automata).
Regular expressions are things like.. "a(b|c)*", etc...

A parser accepts what's called a context-sensitive language..  that
is, a sequence of tokens that can be parsed into an abstract syntax
tree (which is NOT a linear thing, it's a tree) that is in agreement
with some context sensitive language definition: basically if you
can define each thing as some combinations of other things, some
of which are tokens, it's a CSL. Then there are some more subtle
restrictions, like being LALR(1), etc... (see the literature :-)

Etienne, is that about right? :-)

-Archie

___________________________________________________________________________
Archie Cobbs   *   Whistle Communications, Inc.  *   http://www.whistle.com