
Re: SableCC 3-beta.2 Released

To: Jon Shapcott <eden@xibalba.demon.co.uk>
Subject: Re: SableCC 3-beta.2 Released
From: Etienne Gagnon <etienne.gagnon@uqam.ca>
Date: Sun, 14 Sep 2003 11:41:38 -0400
Cc: sablecc list <sablecc-list@sable.mcgill.ca>, Kevin Agbakpem <kevin@larc.info.uqam.ca>
In-reply-to: <20030910210442.GA10566@xibalba.demon.co.uk>
References: <3F593E8A.6000101@uqam.ca> <20030910210442.GA10566@xibalba.demon.co.uk>
Sender: owner-sablecc-list@sable.mcgill.ca
User-agent: Mozilla/5.0 (X11; U; Linux ppc; en-US; rv:1.4) Gecko/20030908 Debian/1.4-4

Hi Jon,

Jon Shapcott wrote:

This has left me somewhat mystified. I have yet to produce a conflict that
can be resolved by the inlining process. I notice that the doc directory has
a file called "rules-to-follow.txt", but it doesn't help much.

Here's a toy example:

p = a B C |
    A B D;

a = A;

This grammar has a shift/reduce conflict that will be eliminated by the
inclusion of "a" (I'm using a very approximate syntax to illustrate the idea):

p = A B C -> new a(A) B C |
    A B D;

Using the CST->AST framework, transformations are added to the modified grammar
so that the final AST is the one expected by the programmer.
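
As a rough illustration of the inclusion step on the toy grammar above, here is
a minimal Java sketch. It is not SableCC's implementation: the class name
InlineSketch, the inline() method, and the list-of-symbol-lists grammar
representation are all hypothetical, and the sketch ignores the CST->AST
transformations that SableCC attaches to the rewritten alternatives.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not SableCC's internal representation.
final class InlineSketch {
    // Replace every occurrence of the nonterminal `name` in each alternative
    // by each alternative of `name` itself (here `a` has a single one: [A]).
    static List<List<String>> inline(List<List<String>> alts,
                                     String name,
                                     List<List<String>> replacement) {
        List<List<String>> result = new ArrayList<>();
        for (List<String> alt : alts) {
            // Partial expansions of the current alternative, built left to right.
            List<List<String>> expanded = new ArrayList<>();
            expanded.add(new ArrayList<>());
            for (String symbol : alt) {
                List<List<String>> next = new ArrayList<>();
                for (List<String> prefix : expanded) {
                    if (symbol.equals(name)) {
                        for (List<String> r : replacement) {
                            List<String> copy = new ArrayList<>(prefix);
                            copy.addAll(r);
                            next.add(copy);
                        }
                    } else {
                        List<String> copy = new ArrayList<>(prefix);
                        copy.add(symbol);
                        next.add(copy);
                    }
                }
                expanded = next;
            }
            result.addAll(expanded);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> p = List.of(List.of("a", "B", "C"), List.of("A", "B", "D"));
        List<List<String>> a = List.of(List.of("A"));
        // Prints [[A, B, C], [A, B, D]]
        System.out.println(inline(p, "a", a));
    }
}
```

After the rewrite, both alternatives of p start with the tokens A B, so the
parser only has to shift until it sees C or D; it no longer has to decide
whether to reduce "a" right after shifting A.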

It worked just fine with the grammars I have, but since they don't have any
conflicts, that doesn't help debug the inlining process.

Are there any example grammars with conflicts that can be resolved by the
inlining process? It would especially help if the results of the process are
plain from the output of the --prettyprint option.


A few months ago, Patrick Lam discussed such a grammar/conflict for
a small language he designed.

http://www.sable.mcgill.ca/listarchives/sablecc-list/msg00968.html


This kind of conflict seems to be a recurrent one in the first drafts of grammars
that I (at least) write for small custom languages.  So, this mechanism is simply
there to help new users with simple-to-resolve conflicts.  It does *not* resolve
"precedence" and "dangling-else" type ambiguities and related conflicts. It is a
single step in adding well-specified conflict/ambiguity resolution constructs
to SableCC.

Kevin and I will be writing a paper about the inclusion technique. As soon as the
draft is ready, we'll make a Sable technical report with it and announce the
link to it on this list.

I notice that the filter() method has been removed from the generated
parser.

This was a difficulty we faced: in SableCC2, the filter() method was called
at well-defined points in the grammar (from the point of view of the user).
In the presence of CST->AST transformations due to inlining, it becomes
difficult to provide the same expected behavior (e.g. filter() called at
the end of all alternatives [at reduction time]).  Even if we were to emulate
the deterministic sequence of filter() calls that would have resulted with
SableCC2, SableCC3 might have to proceed with parsing well past the
filter() call point before the included-into alternative is reduced, so
it could invalidate some common uses of filter().

e.g.:

p = a B C |
    A B D;

a = A;

Given input ABC, one would expect the following event sequence:

shift A
reduce a & call filter()
shift B
shift C
reduce p (alternative #1) & call filter()

But, because of inclusion, the following is the best emulation we would be
able to do:

shift A
shift B
shift C
reduce p (alternative #1) & call filter() as in "reduce a", then call filter() as in "reduce p"
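
The difference between the two event sequences can be sketched in a few lines
of Java. This is only a toy model under stated assumptions: the class
FilterTraceSketch, the emulate() method, and the string event format are
hypothetical, and the sketch simply defers the filter() call of any inlined
production to the next reduction that actually takes place.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical toy model; it does not reflect the generated parser's code.
final class FilterTraceSketch {
    // Takes the event trace expected without inlining and returns the trace
    // obtained when the productions in `inlined` no longer have a reduction
    // of their own: their filter() calls are postponed to the next real reduce.
    static List<String> emulate(List<String> trace, Set<String> inlined) {
        List<String> out = new ArrayList<>();
        List<String> pending = new ArrayList<>();
        for (String event : trace) {
            if (!event.startsWith("reduce ")) {
                out.add(event); // shifts happen as before
                continue;
            }
            String production = event.substring("reduce ".length());
            pending.add("filter(" + production + ")");
            if (!inlined.contains(production)) {
                // A real reduction: run all postponed calls, then its own.
                out.add(event + " -> " + String.join(", ", pending));
                pending.clear();
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> expected =
            List.of("shift A", "reduce a", "shift B", "shift C", "reduce p");
        // Prints [shift A, shift B, shift C, reduce p -> filter(a), filter(p)]
        System.out.println(emulate(expected, Set.of("a")));
    }
}
```

The point of the sketch is the timing: the filter() call for "a" still
happens, but only after B and C have already been shifted.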

In order to provide the very convenient functionality provided by filter methods,
I plan to add explicit "semantic extractors", which would define specific locations
in the grammar where a specified method would be called, while preventing the
parser/lexer from proceeding past that point before the method is called.
This might disable inlining of some productions, but it would provide an intuitive
way of adding some semantic "actions" in the grammar.  Three added advantages over
the SableCC2 filter() approach:
1- It provides a custom method name for each "semantic extractor". (better soft. eng.)
2- It works in presence of CST->AST transformations.
3- It is explicitly visible in the grammar file (= good documentation) while remaining
   totally language independent (no action/semantic-extraction code in the grammar itself).

This unfortunately won't get into version 3.0.0. It is beyond Kevin's M.Sc. work.


Note:  If anyone has a better name than "semantic extractor" and "semantic chooser"
(see below), please make your suggestions as soon as possible!  Thanks in advance. :-)

Explanation of terms: I intend to add two kinds of parse-time "action" hooks:
1- semantic extractors: e.g. methods that collect typedef definitions, variable
   declarations, etc.  These are methods that "extract" some semantic
   information from the partially parsed AST (subtree).
2- semantic choosers: methods that use previously recorded semantic information
   (usually recorded by semantic extractors) to choose between various
   reductions.  e.g.: it could be used to choose between the following two
   reductions:
     varname = ID;
     typename = ID;

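To make the two kinds of hooks concrete, here is a minimal Java sketch of how
user code behind them might look. Everything in it is an assumption made for
illustration: the class SemanticHooksSketch and the methods extractTypedef()
and chooseIdReduction() are hypothetical names, and nothing here is an existing
or planned SableCC API.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical user-side code; the hook names are invented for illustration.
final class SemanticHooksSketch {
    // Semantic information recorded while parsing.
    private final Set<String> typeNames = new HashSet<>();

    // "Semantic extractor": would be called once a typedef has been parsed,
    // before the parser/lexer proceeds any further.
    void extractTypedef(String declaredName) {
        typeNames.add(declaredName);
    }

    // "Semantic chooser": uses the recorded information to pick which
    // reduction applies to an ID token (typename = ID or varname = ID).
    String chooseIdReduction(String idText) {
        return typeNames.contains(idText) ? "typename" : "varname";
    }

    public static void main(String[] args) {
        SemanticHooksSketch hooks = new SemanticHooksSketch();
        System.out.println(hooks.chooseIdReduction("size_t")); // prints varname
        hooks.extractTypedef("size_t");
        System.out.println(hooks.chooseIdReduction("size_t")); // prints typename
    }
}
```

The sketch also shows why the extractor must run before parsing continues: the
chooser's answer for an identifier changes as soon as the typedef is recorded.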
I am not entirely satisfied with "extractor" and "chooser", but I do not have a better
idea for now.

I spent a little while with the source putting it back so that it
behaved like the sablecc-3-beta.1 release candidate, which then allowed me
to test the parser generated by the sablecc-3-beta.2 release candidate with
my existing code. This took no account of the inlining process, but it
worked for grammars without any conflicts. The --no-inline option made no
difference for these grammars, for obvious reasons.


It could be an option to generate filter() methods when --no-inline is used.
Could you provide a unified diff patch ("diff -u") of your code, modified such
that filter() methods are only generated when --no-inline is used?

Once the semantic extractors are in place, we should get rid of filter().

I have also found myself making quite a lot of mistakes when writing
CST->AST transformations. Elements inside curly and square brackets are
separated by whitespace, but elements inside round brackets are separated by
commas. I keep using whitespace and comma separation in the wrong places. Is
there any chance of making them all one thing or the other? Consistent comma
separation would probably allow for more growth in the CST->AST syntax than
consistent whitespace separation. The semicolon separator character should
probably be kept aside for side-effecting transformations, which would solve
the few cases that the current CST->AST process can't handle.


I will let Kevin comment on this, and discuss whether he thinks this is a change
he is willing to make.

KEVIN:  Could you please address this problem on this list rather than in
private messages?  Thanks.

Have fun!

Etienne

--
Etienne M. Gagnon, Ph.D.             http://www.info.uqam.ca/~egagnon/
SableVM:                                       http://www.sablevm.org/
SableCC:                                       http://www.sablecc.org/

References:
- SableCC 3-beta.2 Released
  - From: Etienne Gagnon <etienne.gagnon@uqam.ca>
- Re: SableCC 3-beta.2 Released
  - From: Jon Shapcott <eden@xibalba.demon.co.uk>
