
Re: *Almost* free-form syntax



Ugly!  I'm pretty sure they use bison's default shift on shift/reduce conflicts to
deal with this.  It's an inherently ambiguous grammar.

OK.  That being said, this is how you deal with it.  The idea is to do as with the
well-known if/if-else ambiguity, and duplicate parts of the grammar to express
the restrictions you want to impose (thus eliminating the ambiguity).

In the current case, the problem is that if you write:

f(x)(y), the parser does not know whether you are writing:

1- (f(x))(y) assuming f(x) returns a function
or
2- f(x); (y)  as lua allows for single expressions as statement.

Note: Do not forget that we do not have access to semantic information, so we
do not know the type of the return value(s) of f(x).

I am pretty sure (you might want to test) that lua assumes you want "1-" unless you
add either a semicolon or a newline (thus their blurb about newlines "avoiding"
ambiguities!).
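
To make the two readings concrete, here is a minimal sketch in Python of a toy
recursive-descent parser that always "shifts" on "(", which is the behaviour I
am guessing at above. The toy language (names, numbers, parentheses, calls,
optional semicolons) and the function names tokenize and parse_statements are
assumptions made for illustration only; this is not lua's parser.

```python
import re

def tokenize(src):
    # Names, numbers, and single-character punctuation; whitespace is dropped.
    return re.findall(r"[A-Za-z_]\w*|\d+|[(),;]", src)

def parse_statements(tokens):
    """Parse a toy statement list; calls become ("call", callee, args) tuples."""
    pos = 0
    stats = []

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        pos += 1
        return tok

    def expr():
        if peek() == "(":
            take("(")
            inner = expr()
            take(")")
            node = ("paren", inner)
        else:
            node = take()  # must be a name or a number
            if not re.fullmatch(r"\w+", node):
                raise SyntaxError(f"unexpected token {node!r}")
        # Always "shift": any following "(" extends the current expression.
        while peek() == "(":
            take("(")
            args = []
            if peek() != ")":
                args.append(expr())
                while peek() == ",":
                    take(",")
                    args.append(expr())
            take(")")
            node = ("call", node, args)
        return node

    while peek() is not None:
        stats.append(expr())
        if peek() == ";":
            take(";")
    return stats

# Reading "1-": one statement, the result of f(x) is called with y.
print(parse_statements(tokenize("f(x)(y)")))
# Reading "2-": two statements, obtained here only by adding the semicolon.
print(parse_statements(tokenize("f(x); (y)")))
```

With this strategy the second "(" is always consumed as an argument list, so the
only way to get two statements is an explicit separator.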


So, you want to select "1-". Consequence => a function call statement cannot be followed by a statement that starts with an "(".

Note:  Dealing with the newline thing requires playing with lexer states, and
is beyond the scope of this email.
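
For what it is worth, here is a rough sketch in Python of one way the newline
case could be exposed to the parser without real lexer states: the lexer
remembers whether a newline was seen since the last token and emits a distinct
token kind for such a "(". The token kind names (NAME, NUMBER, LPAREN,
NL_LPAREN) are made up for this example, and this is an alternative to the
lexer-state approach, not a description of how SableCC or lua does it.

```python
import re

# One alternative per lexical class: newline, other whitespace, or a real token.
TOKEN_RE = re.compile(
    r"(?P<nl>\n)|(?P<ws>[ \t\r]+)|(?P<tok>[A-Za-z_]\w*|\d+|[()=;,])"
)

def tokenize(src):
    """Return (kind, text) pairs; a "(" preceded by a newline gets its own kind."""
    tokens = []
    saw_newline = False  # True if a newline occurred since the last real token
    pos = 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r}")
        pos = m.end()
        if m.lastgroup == "nl":
            saw_newline = True
        elif m.lastgroup == "tok":
            text = m.group("tok")
            if text == "(":
                kind = "NL_LPAREN" if saw_newline else "LPAREN"
            elif text[0].isdigit():
                kind = "NUMBER"
            elif text[0].isalpha() or text[0] == "_":
                kind = "NAME"
            else:
                kind = text  # punctuation is its own kind
            tokens.append((kind, text))
            saw_newline = False
        # Plain whitespace ("ws") is skipped and does not reset the flag.
    return tokens

print([kind for kind, _ in tokenize("f(x)\n(y)")])
```

The grammar can then accept only LPAREN at the start of an argument list, and
either kind at the start of a parenthesized expression.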

How do we translate this into a context free grammar? Something like the following:


statements = non_functioncall* functioncall_and_statements?;

non_functioncall =
  normal_stat semicolon?;

normal_stat =
  varlist1 `=´ explist1 |
  do block end |
  while exp do block end |
  ...;  /* Do NOT include functioncall */

functioncall_and_statements =
   functioncall semicolon? nopar_statements;

nopar_statements =
  nopar_non_functioncall non_functioncall* functioncall_and_statements? |
  nopar_functioncall_and_statements?;

etc.

You must ensure that the nopar_* versions are identical except that they do
not allow for any leading "(".  Be careful not to disallow any other "(".

For example, it would have been wrong to write:

nopar_statements =
  nopar_non_functioncall* nopar_functioncall_and_statements?

as it would eliminate many valid constructs, e.g.

f(x) b (3 + 2) /* equivalent to f(x) ; b ; (3 + 2) */


Why do people dislike separators like ";" so much? I find non-delimited code harder to read... Is it just me?

So, I hope this helps you.  Please contribute back your grammar, once it works,
for inclusion with the other grammars on the site (and in the subversion repository).

Have fun!

Etienne

Danilo Tuler wrote:
Hi,

I'm trying to write a parser for the Lua language (http://www.lua.org)
The problem is that the grammar is *almost* free-form, with one little
exception. The manual says:

"As an exception to the free-format syntax of Lua, you cannot put a line
break before the `(´ in a function call. That restriction avoids some
ambiguities in the language."

Here is a part of the grammar that is giving me headaches:

functioncall = prefix_expr args;
args = '(' expr_list? ')';
prefix_expr = '(' expr ')';

This is the ambiguity:
shift/reduce conflict in state [stack: PFunctionCall *] on TTokLparen in {
	[ P$Stat = PFunctionCall * ] followed by TTokLparen (reduce),
	[ PArgs = * TTokLparen PExprList TTokRparen ] (shift),
	[ PArgs = * TTokLparen TTokRparen ] (shift)
}

The linebreaks are thrown away by the lexer. How can I solve this??

Thanks,
Danilo







--
Etienne M. Gagnon, Ph.D.             http://www.info.uqam.ca/~egagnon/
SableVM:                                       http://www.sablevm.org/
SableCC:                                       http://www.sablecc.org/