
Re: sablecc grammar (help!)

>Thanks for your advice.  However, I thought that the grammar file is a key
>to generating the AST.  If I use another compiler compiler for lexing the
>grammar, how can Sablecc then understand how to construct the AST (other
>compiler may have different specification on the grammar file than

Hi Jenny.

You need an LALR(1) grammar for HTML, if you want a SableCC AST. What I am
saying is that there are other tools out there that recognize LALR(1)
grammars [e.g., Yacc/Bison]. Such tools will try to generate LALR(1) parsing
tables for a grammar, and give error messages (or warnings!!!) if a grammar
is not LALR(1). These tools are faster than SableCC because they are written
in C and they are not interpreted by a slow virtual machine. Therefore, if
there is any non-LALR(1) construct in your grammar, these tools will give
you the error message in less time than it would take SableCC to tell you
the same thing. So, for the sake of development speed, they might be more
appropriate for the LALR(1) transformation phase.

Once you have obtained an LALR(1) grammar, then you simply feed this same
grammar (syntactically modified to match SableCC's syntax) to SableCC, and
you get the parser/AST. In summary: you simply use Bison for getting a pure
LALR(1) grammar, not for constructing the parser.
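To illustrate what "syntactically modified" means, here is a minimal sketch
of one tiny grammar written in both notations. The token and production
names (NUMBER, expr, term, and so on) and the package name are made up for
this example; they do not come from any real HTML grammar. First, a
Yacc/Bison version that you could use to check for LALR(1) conflicts:

```
/* Hypothetical example grammar in Yacc/Bison notation. */
%token NUMBER
%%
expr : expr '+' term
     | term
     ;
term : NUMBER
     ;
```

A roughly equivalent SableCC specification might look like this. Note that
SableCC also wants the token definitions in the same file, and that each
alternative of a production with several alternatives gets a name in braces:

```
// Hypothetical example grammar in SableCC notation.
Package calc;

Tokens
  number = ['0' .. '9']+;
  plus = '+';
  blank = ' '+;

Ignored Tokens
  blank;

Productions
  expr = {add}  expr plus term
       | {term} term;
  term = number;
```

The productions are the same in both files; only the surface syntax differs,
so the LALR(1) properties you verified with Bison should carry over.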

The other point is that there exist many free LALR(1) grammars on the
Internet for existing tools [Yacc/Bison]. If you can avoid the somewhat
difficult task of transforming a complex grammar to abide by the LALR(1)
constraints, then you will save many long hours of work. All you need to
do is change the grammar syntax to SableCC notation.

The whole idea is to get your work done as efficiently as possible. SableCC
is very appropriate once you have an LALR(1) grammar, like the provided Java
grammars. It is still a good choice if your grammar is relatively small. But
the current version is less appropriate for transforming a complex grammar
to LALR(1), according to what some people have shared with us on the