[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: LaTeX from SableCC & auto-resolution of shift/reduce

Hi Pat,

Patrick LAM wrote:
I'm writing a toy language (for a POPL paper).  To avoid embarassing
errors, it would be nice if my toy language could actually have a parser
which recognizes the language.  So far, so good; I can write a SableCC
grammar for the language.

1. It would be great if I could generate a LaTeX grammar from the SableCC
grammar (design conformance for compilers!).  It doesn't seem like too
much work and I guess I could do this in a few hours.

I have an M.Sc. student working on a back-end scripting engine for SableCC. Once it is in place, it will be fairly easy to write a LaTeX target.

Meanwhile, you can probably get away with relatively small modifications to
Gen*.java classes and a related latex.txt template file.

2. Unfortunately, my language has shift-reduce conflicts.  I can manually
resolve them, but then if I would automatically generate LaTeX on the
resolved grammars, I'd get pretty nonintuitive grammars as a result.

Now, it appears to me that Etienne's inlining algorithm for resolving
shift-reduce conflicts could be automatically applied to a source SableCC
file to get a shift-reduce free SableCC file.  Does it seem to be
reasonable to implement a SableCC-to-SableCC compiler that could
automatically inline offending productions, or am I missing something

This is effectively a quite neat idea. "Automati Conflict Resolution" for simple conflicts sounds cool! I have already put a student to work on this.

When you think about it, you can see that automatic inlining can *sometimes*
be even more powerful than using K>1 lookahead [LR(K)]:

Grammar with conflict:

prod =
  {one} a T.id list T.one |
  {two} b T.id list T.two;

a = T.id;
b = T.id;

list =
  {single} T.id |
  {many}   list T.id;

In this case, there is obviously a reduce/reduce conflict on a/b after seing a T.id. No finite amount K of lookahead is sufficient to resolve this conflict as "list" can be arbitrarily long, and the parser needs to know whether the token following the list is a T.one or T.two to resolve the conflict.

On the other hand, inlining does resolve the conflict:


prod =
  {one} T.id T.id list T.one |
  {two} T.id T.id list T.two;

list =
  {single} T.id |
  {many}   list T.id;

In tnis case, the conflict is eliminated because the LR parser won't have to
make a decision on whether a prod.{one} or a prod.{two{ is being parsed before
seeing T.one or T.two. :-)

Kevin's CST->AST transformations can be extended to make such internal inlining
in case of conflict while hiding this fact from the programmer.

Thanks a lot, Patrick, for this neat idea.


Etienne M. Gagnon, Ph.D.             http://www.info.uqam.ca/~egagnon/
SableVM:                                       http://www.sablevm.org/
SableCC:                                       http://www.sablecc.org/