
Re: Feature request

Hi Archie,

I've looked quickly at your suggestion. There are at least two ways of
achieving your requested functionality without modifying SableCC.

The idea is to first recognize that you need two different parsers.

The most common case (99.9% of SableCC users) is already handled by SableCC.
It consists of simply using two different packages (and grammar files), one
for each parser/AST. This is easy enough, so I won't elaborate further.

The other solution is for the unlikely case where you need the two parsers
to share the same AST.

I'll use a minimal example to sketch the solution.

I assume that you want two parsers: one that parses a single int, and
another that parses a list of ints.

You need to write two grammar files, one for each parser. They are
identical except for the first production in the "Productions" section.

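--- BEGIN first grammar ---
Package shared;

Tokens
  int = ['0'..'9']+;
  blank = ' ';

Ignored Tokens
  blank;

Productions
  grammar =
    {one}  integer |
    ( {many} integer* ); // Ignored in first grammar.

  integer = int;
--- END ---

--- BEGIN second grammar ---
Package shared;

Tokens
  int = ['0'..'9']+;
  blank = ' ';

Ignored Tokens
  blank;

Productions
  grammar =
    ( {one}  integer ) | // Ignored in second grammar.
    {many} integer*;

  integer = int;
--- END ---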

Now, all you need to do is launch SableCC on the first grammar, then rename
'shared/parser/Parser.java' to 'shared/parser/IntParser.java'. Then launch
SableCC on the second grammar and rename 'shared/parser/Parser.java' to
'shared/parser/IntListParser.java'. [You also need to change the class name,
including its constructor, in each renamed file to match the new file name.]
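The renaming step can be scripted. Below is a minimal sketch in Java, assuming the generated class is literally named `Parser` and lives in `shared/parser/Parser.java`; `RenameParser` and `renameClass` are hypothetical names, not part of SableCC. It rewrites every whole-word occurrence of `Parser` (the class declaration, its constructors, and any self-references), which leaves names such as `ParserException` and the lowercase package name `parser` untouched. If your SableCC version keeps the parse tables in a separate resource file (such as `parser.dat`), that file would also need a per-parser name; this sketch does not handle it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/*
 * Hypothetical helper (not part of SableCC): moves the generated
 * shared/parser/Parser.java to shared/parser/<NewName>.java and
 * renames the class inside it.
 */
public class RenameParser {

    // Whole-word "Parser" only: ParserException and "parser" do not match.
    private static final Pattern PARSER_NAME = Pattern.compile("\\bParser\\b");

    /** Returns source with every whole-word "Parser" replaced by newName. */
    static String renameClass(String source, String newName) {
        return PARSER_NAME.matcher(source)
                .replaceAll(Matcher.quoteReplacement(newName));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java RenameParser <NewClassName>");
            System.exit(1);
        }
        String newName = args[0];
        Path dir = Paths.get("shared", "parser");
        Path in = dir.resolve("Parser.java");
        Path out = dir.resolve(newName + ".java");

        String source = new String(Files.readAllBytes(in), StandardCharsets.UTF_8);
        Files.write(out, renameClass(source, newName).getBytes(StandardCharsets.UTF_8));
        // Remove the original so the next SableCC run can regenerate it.
        Files.delete(in);
    }
}
```

You would run `java RenameParser IntParser` after the first SableCC run and `java RenameParser IntListParser` after the second. The regular expression also rewrites `Parser` inside comments, which is harmless here.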

Then you are done! You can parse with either grammar by simply writing:
  Start ast = new IntParser(...).parse();      // a single int
  Start ast = new IntListParser(...).parse();  // a list of ints

The trick, as you see, is to add a "grammar" production (in fact you could
call it "start"; there is no name conflict), and then use "ignored
alternatives" so that both SableCC runs generate the same AST classes, and
the Switch classes contain the methods to walk over either AST.
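To make the idea concrete without depending on generated code, here is a hand-written Java imitation of the result. The names (`PGrammar`, `AOneGrammar`, `AManyGrammar`, `GrammarSwitch`, `caseAOneGrammar`, ...) only mimic SableCC's naming style and are assumptions, as are the toy `parseInt` and `parseIntList` functions standing in for `IntParser` and `IntListParser`. The point it shows: both alternatives exist in one AST and one visitor, each parser builds only its own alternative, and a single walker handles the output of either.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/*
 * Hand-written imitation (not SableCC output): one node hierarchy and one
 * visitor containing BOTH "grammar" alternatives, while each toy parser
 * only ever builds one of them.
 */
public class SharedAstDemo {

    // Visitor covering both alternatives, like the shared Switch classes.
    interface GrammarSwitch<R> {
        R caseAOneGrammar(AOneGrammar node);
        R caseAManyGrammar(AManyGrammar node);
    }

    static abstract class PGrammar {
        abstract <R> R apply(GrammarSwitch<R> sw);
    }

    static final class AOneGrammar extends PGrammar {
        final int integer;
        AOneGrammar(int integer) { this.integer = integer; }
        <R> R apply(GrammarSwitch<R> sw) { return sw.caseAOneGrammar(this); }
    }

    static final class AManyGrammar extends PGrammar {
        final List<Integer> integers;
        AManyGrammar(List<Integer> integers) {
            this.integers = Collections.unmodifiableList(new ArrayList<>(integers));
        }
        <R> R apply(GrammarSwitch<R> sw) { return sw.caseAManyGrammar(this); }
    }

    // Toy stand-in for IntParser: exactly one integer, builds {one}.
    static PGrammar parseInt(String input) {
        String[] tokens = tokenize(input);
        if (tokens.length != 1) {
            throw new IllegalArgumentException("expected exactly one integer");
        }
        return new AOneGrammar(Integer.parseInt(tokens[0]));
    }

    // Toy stand-in for IntListParser: zero or more integers, builds {many}.
    static PGrammar parseIntList(String input) {
        List<Integer> values = new ArrayList<>();
        for (String t : tokenize(input)) {
            values.add(Integer.parseInt(t));
        }
        return new AManyGrammar(values);
    }

    // Splits on blanks (the ignored 'blank' token); blank input gives no tokens.
    private static String[] tokenize(String input) {
        String trimmed = input.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split(" +");
    }

    // One walker, written once, works on the output of either parser.
    static final GrammarSwitch<Integer> SUM = new GrammarSwitch<Integer>() {
        public Integer caseAOneGrammar(AOneGrammar node) { return node.integer; }
        public Integer caseAManyGrammar(AManyGrammar node) {
            int total = 0;
            for (int v : node.integers) total += v;
            return total;
        }
    };

    public static void main(String[] args) {
        System.out.println(parseInt("42").apply(SUM));        // prints 42
        System.out.println(parseIntList("1 2 3").apply(SUM)); // prints 6
    }
}
```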

Now, you may want to see this second solution built into SableCC. I have
two arguments against it:
1- It is not commonly needed; the first solution is enough in most cases.
2- There are two parsers to debug (e.g., each must fit within LALR(1)
constraints), and it would be confusing to issue error messages on a mixed
grammar.

I hope this helps.


-----Original Message-----
From: Archie Cobbs <archie@whistle.com>
To: sablecc-list@sable.mcgill.ca <sablecc-list@sable.mcgill.ca>
Date: Thursday, August 13, 1998 4:32 PM
Subject: Feature request

>Here's a feature request for the next version of SableCC. Implementing
>this is probably really easy, or else really hard... :-)
>Currently, one parses the input like this:
>  Parser p;
>  Start topNode;
>  p = new Parser(new Lexer(input));
>  topNode = p.parse();
>It would be nice if you could parse any arbitrary other node besides
>Start, by doing something like this:
>  Parser p;
>  PSomeArbitraryProduction node;
>  p = new Parser(new Lexer(input), PSomeArbitraryProduction.class);
>  node = p.parse();
>The idea being that you throw away all of the grammar except for what
>could possibly be in a tree with top node PSomeArbitraryProduction.
>So the parser would parse a PSomeArbitraryProduction, followed by EOF,
>and then finish.
>In my application, for example, stuff gets written to a file and needs
>to be read in again later. For example, you write out strings into a
>file like this:
>  string1="This is a \"doubly quoted\" string."
>You want to parse everything after the equals sign as a string
>token in a way that's consistent with the rest of the application.
>So you might send just that input to the parser, and get a PString
>node back.
>I'd be interested in what the right parser strategy for implementing
>this might be. Would you just create two new parser states for each node
>type? E.g.:
>   . node EOF
>   node . EOF
>These states would never be entered during a "complete" parse.
>Archie Cobbs   *   Whistle Communications, Inc.  *   http://www.whistle.com
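Archie's two states can be read as LR(0) items of an augmented production `X' -> X EOF` added for each entry point `X`; since no other production mentions `X'`, those items are never reached during a complete parse. The minimal Java sketch below, assuming a grammar given as a map from nonterminal to alternatives, computes only the first step: the closure of `X' -> . X EOF`, i.e. the start state for entry point `X`. The names `EntryPointStates`, `startState`, `Item`, and the `list` nonterminal (standing in for `integer*`) are hypothetical; building the remaining states and the LALR(1) lookaheads is not shown.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

public class EntryPointStates {

    // LR(0) item: lhs -> rhs, with the dot just before rhs[dot].
    static final class Item {
        final String lhs;
        final List<String> rhs;
        final int dot;

        Item(String lhs, List<String> rhs, int dot) {
            this.lhs = lhs; this.rhs = rhs; this.dot = dot;
        }

        // Symbol right after the dot, or null if the dot is at the end.
        String next() { return dot < rhs.size() ? rhs.get(dot) : null; }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Item)) return false;
            Item i = (Item) o;
            return lhs.equals(i.lhs) && rhs.equals(i.rhs) && dot == i.dot;
        }
        @Override public int hashCode() { return Objects.hash(lhs, rhs, dot); }
        @Override public String toString() {
            StringBuilder sb = new StringBuilder(lhs).append(" ->");
            for (int k = 0; k <= rhs.size(); k++) {
                if (k == dot) sb.append(" .");
                if (k < rhs.size()) sb.append(' ').append(rhs.get(k));
            }
            return sb.toString();
        }
    }

    /** LR(0) closure of the augmented item  entry' -> . entry EOF. */
    static Set<Item> startState(Map<String, List<List<String>>> grammar, String entry) {
        Set<Item> state = new LinkedHashSet<>();
        Deque<Item> work = new ArrayDeque<>();
        Item start = new Item(entry + "'", Arrays.asList(entry, "EOF"), 0);
        state.add(start);
        work.add(start);
        while (!work.isEmpty()) {
            String sym = work.poll().next();
            // Only nonterminals (keys of the grammar) are expanded.
            if (sym == null || !grammar.containsKey(sym)) continue;
            for (List<String> alt : grammar.get(sym)) {
                Item item = new Item(sym, alt, 0);
                if (state.add(item)) work.add(item);
            }
        }
        return state;
    }

    // The int / int-list grammar, with integer* written as a 'list' nonterminal.
    static Map<String, List<List<String>>> sampleGrammar() {
        Map<String, List<List<String>>> g = new LinkedHashMap<>();
        g.put("grammar", Arrays.asList(Arrays.asList("integer"), Arrays.asList("list")));
        g.put("list", Arrays.asList(Collections.<String>emptyList(),
                                    Arrays.asList("list", "integer")));
        g.put("integer", Arrays.asList(Arrays.asList("int")));
        return g;
    }

    public static void main(String[] args) {
        // prints: integer' -> . integer EOF
        //         integer -> . int
        for (Item i : startState(sampleGrammar(), "integer")) System.out.println(i);
    }
}
```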