HTML and shift/reduce conflicts
Good news, bad news, folks. The good news is that I have permission to
distribute the HTML grammar I'm working on. Note that I'm putting it in
the public domain, and am *not* using the GNU General Public License (GPL). This
means you are legally allowed to do anything you damned well please with
the grammar (except claim you wrote it), including using it in a product
whose source code you won't give out, and making improvements to the
grammar that you don't want to share with anyone else.
[Politics break]
I'd like to encourage Etienne to change the SableCC license from GPL to
LGPL ("library GPL"). I think it'd be a good all-around change, but it is
particularly important for the sample grammars. Under GPL, if I were to
use the sample Java grammar in my product, I'd have to GPL my entire
product. There's no chance I'm going to do that, so I can't use the
grammar at all. But under LGPL, if I use that grammar, all I have to ship
is my code in library form, so that it can be linked with the Java grammar.
And that's something I can live with.
[End politics]
I also think I have the full lexical specification working correctly,
although I haven't tested this very hard.
But the bad news is, I've got a ton of shift/reduce conflicts which I can't
figure out how to resolve. (Hopefully that's because I've been hacking
JavaCC too long, and not because they are unresolvable in LALR(1)...)
Here's the error message you'll get when running SableCC on the grammar
(provided below):
shift/reduce conflict on TOpenAngle in {
[PHtml = PHead PBody]:0:EOF,
[PHtml = TOpenAngle THtml TCloseAngle PHead PBody TAngleSlash THtml
TCloseAngle]:0:EOF,
[PHtml = TOpenAngle THtml PAttributeList TCloseAngle PHead PBody
TAngleSlash THtml TCloseAngle]:0:EOF,
[PHead = PHeadContents]:0:TOpenAngle,
[PHead = PHeadContents]:0:TPcdata,
[PHead = PHeadContents]:0:EOF,
[PHead = TOpenAngle THead TCloseAngle PHeadContents TAngleSlash
THead TCloseAngle]:0:TOpenAngle,
[PHead = TOpenAngle THead TCloseAngle PHeadContents TAngleSlash
THead TCloseAngle]:0:TPcdata,
[PHead = TOpenAngle THead TCloseAngle PHeadContents TAngleSlash
THead TCloseAngle]:0:EOF,
[PHead = TOpenAngle THead PAttributeList TCloseAngle PHeadContents
TAngleSlash THead TCloseAngle]:0:TOpenAngle,
[PHead = TOpenAngle THead PAttributeList TCloseAngle PHeadContents
TAngleSlash THead TCloseAngle]:0:TPcdata,
[PHead = TOpenAngle THead PAttributeList TCloseAngle PHeadContents
TAngleSlash THead TCloseAngle]:0:EOF,
[PHeadContents = PEheadMisc PTitle PEheadMisc PEheadMisc
PEheadMisc]:0:TOpenAngle,
[PHeadContents = PEheadMisc PTitle PEheadMisc PEheadMisc
PEheadMisc]:0:TPcdata,
[PHeadContents = PEheadMisc PTitle PEheadMisc PEheadMisc
PEheadMisc]:0:EOF,
[PHeadContents = PEheadMisc PTitle PEheadMisc PIsindex PEheadMisc
PEheadMisc]:0:TOpenAngle,
[PHeadContents = PEheadMisc PTitle PEheadMisc PIsindex PEheadMisc
PEheadMisc]:0:TPcdata,
[PHeadContents = PEheadMisc PTitle PEheadMisc PIsindex PEheadMisc
PEheadMisc]:0:EOF,
[PHeadContents = PEheadMisc PTitle PEheadMisc PEheadMisc PBase
PEheadMisc]:0:TOpenAngle,
[PHeadContents = PEheadMisc PTitle PEheadMisc PEheadMisc PBase
PEheadMisc]:0:TPcdata,
[PHeadContents = PEheadMisc PTitle PEheadMisc PEheadMisc PBase
PEheadMisc]:0:EOF,
[PHeadContents = PEheadMisc PTitle PEheadMisc PIsindex PEheadMisc
PBase PEheadMisc]:0:TOpenAngle,
[PHeadContents = PEheadMisc PTitle PEheadMisc PIsindex PEheadMisc
PBase PEheadMisc]:0:TPcdata,
[PHeadContents = PEheadMisc PTitle PEheadMisc PIsindex PEheadMisc
PBase PEheadMisc]:0:EOF,
[PEheadMisc = ]:0:TOpenAngle,
[PEheadMisc = XPEheadMiscPart]:0:TOpenAngle,
[XPEheadMiscPart = XPEheadMiscPart PEheadMiscPart]:0:TOpenAngle,
[XPEheadMiscPart = PEheadMiscPart]:0:TOpenAngle,
[PEheadMiscPart = PScript]:0:TOpenAngle,
[PEheadMiscPart = PStyle]:0:TOpenAngle,
[PEheadMiscPart = PMeta]:0:TOpenAngle,
[PEheadMiscPart = PLink]:0:TOpenAngle,
[PStyle = TOpenAngle TStyle TCloseAngle TPcdata TAngleSlash TStyle
TCloseAngle]:0:TOpenAngle,
[PStyle = TOpenAngle TStyle PAttributeList TCloseAngle TPcdata
TAngleSlash TStyle TCloseAngle]:0:TOpenAngle,
[PMeta = TOpenAngle TMeta TCloseAngle]:0:TOpenAngle,
[PMeta = TOpenAngle TMeta PAttributeList TCloseAngle]:0:TOpenAngle,
[PLink = TOpenAngle TLink TCloseAngle]:0:TOpenAngle,
[PLink = TOpenAngle TLink PAttributeList TCloseAngle]:0:TOpenAngle,
[PScript = TOpenAngle TScript TCloseAngle TPcdata TAngleSlash
TScript TCloseAngle]:0:TOpenAngle,
[PScript = TOpenAngle TScript PAttributeList TCloseAngle TPcdata
TAngleSlash TScript TCloseAngle]:0:TOpenAngle,
[Start = PHtml]:0:EOF
}
It takes about six and a half minutes for SableCC to detect this error.
This makes it seriously painful to develop new grammars (as opposed to
reusing existing grammars, which is presumably easy).
So, does anyone know how to fix this?
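For anyone who wants something smaller to look at: the conflict above appears to have the same shape as the reduced grammar below. This is only a sketch and has not been run through SableCC. The package name "conflict" and the production name "misc" are made up for illustration, and the States section and attribute lists are left out.

```
Package conflict;   // hypothetical package name

Tokens
    open_angle  = '<';
    close_angle = '>';
    title       = 'title';
    meta        = 'meta';

Productions
    // Stand-in for head_contents: optional misc tags, then a title.
    head = misc P.title;

    // Stand-in for ehead_misc. The '*' gives misc an empty alternative,
    // which corresponds to [PEheadMisc = ] in the error report.
    misc = P.meta*;

    meta  = open_angle T.meta close_angle;
    title = open_angle T.title close_angle;
```

With '<' as the next token, the parser can either reduce the empty misc (if a <title> follows) or shift the '<' (if a <meta> follows). It cannot tell which until it sees the tag name, which is one token further than LALR(1) looks. In the full report the same choice seems to show up as [PEheadMisc = ]:0:TOpenAngle sitting next to all the items that begin with TOpenAngle.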
Anyhow, here's the grammar:
// A grammar for HTML version 3.2.
//
// Copyright 1998 Justsystem Pittsburgh Research Center. Donated to
// the public domain.
//
// Written by Nick Kramer (nkramer@jprc.com). Send comments and bug
// reports to him.
Package html;
Helpers
any_char = [0..255];
alpha = ['a'..'z']
| ['A'..'Z']
| '_'
| '-'
| '.';
num = ['0'..'9'];
alphanum = alpha | num;
tab = 0x0009; // tab
lf = 0x000a; // line feed
ff = 0x000c; // form feed
cr = 0x000d; // carriage return
space = ' '; // space
line_terminator = lf | cr | cr lf;
not_single_quote = [any_char - '''];
not_double_quote = [any_char - '"'];
// We want to exclude tab, lf, ff, cr, space, and '>'. That's characters
// 9, 10, 12, 13, 32, and 62, respectively. Sadly, there doesn't seem
// to be any more direct way of doing this.
cdata_helper = [0..8]
| 11
| [14..31]
| [33..61]
| [63..255];
single_quote_cdata = ''' not_single_quote* ''';
double_quote_cdata = '"' not_double_quote* '"';
unquoted_cdata = cdata_helper+;
not_open_angle = [any_char - '<'];
not_close_angle = [any_char - '>'];
States
// States are named after the kinds of tokens they expect to see.
// For instance, when you're in "attribute_name" mode, you expect the
// next thing you read to be an attribute's name.
normal, // initial state -- outside <...>
tag, // tag name -- inside <...>
attribute_name, // inside <...>
attribute_value, // inside <...>
attribute_comment; // inside <...>
Tokens
{ normal } comment_tag = '<!' not_close_angle* '>';
{ tag } tag_whitespace = space | tab | cr | lf;
{ normal -> tag } open_angle = '<';
{ normal -> tag } angle_slash = '</';
{ attribute_name -> normal }
close_angle = '>';
{ normal } pcdata = not_open_angle*;
// Tag names
{ tag -> attribute_name } a = 'a';
{ tag -> attribute_name } address = 'address';
{ tag -> attribute_name } applet = 'applet';
{ tag -> attribute_name } area = 'area';
{ tag -> attribute_name } b = 'b';
{ tag -> attribute_name } base = 'base';
{ tag -> attribute_name } basefont = 'basefont';
{ tag -> attribute_name } big = 'big';
{ tag -> attribute_name } blockquote = 'blockquote';
{ tag -> attribute_name } body = 'body';
{ tag -> attribute_name } br = 'br';
{ tag -> attribute_name } caption = 'caption';
{ tag -> attribute_name } center = 'center';
{ tag -> attribute_name } cite = 'cite';
{ tag -> attribute_name } code = 'code';
{ tag -> attribute_name } dd = 'dd';
{ tag -> attribute_name } dfn = 'dfn';
{ tag -> attribute_name } dir = 'dir';
{ tag -> attribute_name } div = 'div';
{ tag -> attribute_name } dl = 'dl';
{ tag -> attribute_name } dt = 'dt';
{ tag -> attribute_name } em = 'em';
{ tag -> attribute_name } font = 'font';
{ tag -> attribute_name } form = 'form';
{ tag -> attribute_name } h1 = 'h1';
{ tag -> attribute_name } h2 = 'h2';
{ tag -> attribute_name } h3 = 'h3';
{ tag -> attribute_name } h4 = 'h4';
{ tag -> attribute_name } h5 = 'h5';
{ tag -> attribute_name } h6 = 'h6';
{ tag -> attribute_name } head = 'head';
{ tag -> attribute_name } hr = 'hr';
{ tag -> attribute_name } html = 'html';
{ tag -> attribute_name } i = 'i';
{ tag -> attribute_name } img = 'img';
{ tag -> attribute_name } input = 'input';
{ tag -> attribute_name } isindex = 'isindex';
{ tag -> attribute_name } kbd = 'kbd';
{ tag -> attribute_name } li = 'li';
{ tag -> attribute_name } link = 'link';
{ tag -> attribute_name } map = 'map';
{ tag -> attribute_name } menu = 'menu';
{ tag -> attribute_name } meta = 'meta';
{ tag -> attribute_name } ol = 'ol';
{ tag -> attribute_name } option = 'option';
{ tag -> attribute_name } para = 'p';
{ tag -> attribute_name } param = 'param';
{ tag -> attribute_name } pre = 'pre';
{ tag -> attribute_name } prompt = 'prompt';
{ tag -> attribute_name } samp = 'samp';
{ tag -> attribute_name } script = 'script';
{ tag -> attribute_name } select = 'select';
{ tag -> attribute_name } small = 'small';
{ tag -> attribute_name } strike = 'strike';
{ tag -> attribute_name } strong = 'strong';
{ tag -> attribute_name } style = 'style';
{ tag -> attribute_name } sub = 'sub';
{ tag -> attribute_name } sup = 'sup';
{ tag -> attribute_name } table = 'table';
{ tag -> attribute_name } td = 'td';
{ tag -> attribute_name } textarea = 'textarea';
{ tag -> attribute_name } th = 'th';
{ tag -> attribute_name } title = 'title';
{ tag -> attribute_name } tr = 'tr';
{ tag -> attribute_name } tt = 'tt';
{ tag -> attribute_name } u = 'u';
{ tag -> attribute_name } ul = 'ul';
{ tag -> attribute_name } var = 'var';
{ attribute_name } attrlist_whitespace = space | tab | cr | lf;
{ attribute_name -> attribute_comment }
attrlist_comment_start = '--';
{ attribute_name -> attribute_value }
attribute_equals = '=';
{ attribute_name } attribute_name = alpha alphanum*;
{ attribute_value -> attribute_name }
cdata = single_quote_cdata
| double_quote_cdata
| unquoted_cdata;
{ attribute_comment } attrlist_no_dash = [any_char - '-']*;
{ attribute_comment } attrlist_one_dash = '-' [any_char - '-']*;
{ attribute_comment -> attribute_name }
attrlist_comment_end = '--';
Ignored Tokens
attrlist_whitespace, attrlist_comment_start,
attrlist_no_dash, attrlist_one_dash, attrlist_comment_end;
Productions
// main entry point to grammar
html =
{implicit} P.head P.body |
{explicit} open_angle T.html attribute_list? close_angle
P.head P.body
angle_slash [ignore]:T.html [close_close_angle]:close_angle;
// File header
head =
{implicit} head_contents |
{explicit} open_angle T.head attribute_list? close_angle
head_contents
angle_slash [ignore]:T.head [close_close_angle]:close_angle;
head_contents = [first_misc]:ehead_misc
P.title
[second_misc]:ehead_misc
P.isindex?
[third_misc]:ehead_misc
P.base?
[fourth_misc]:ehead_misc;
title = open_angle T.title attribute_list? close_angle
pcdata
angle_slash [ignore]:T.title [close_close_angle]:close_angle;
isindex = open_angle T.isindex attribute_list? close_angle;
base = open_angle T.base attribute_list? close_angle;
ehead_misc = ehead_misc_part*;
ehead_misc_part =
{script} P.script |
{style} P.style |
{meta} P.meta |
{link} P.link;
style = open_angle T.style attribute_list? close_angle
pcdata
angle_slash [ignore]:T.style [close_close_angle]:close_angle;
meta = open_angle T.meta attribute_list? close_angle;
link = open_angle T.link attribute_list? close_angle;
// Main part of the file ("body")
body =
{implicit} ebody_content |
{explicit} open_angle T.body attribute_list? close_angle
ebody_content
angle_slash [ignore]:T.body [close_close_angle]:close_angle;
ebody_content = ebody_content_part*;
ebody_content_part =
{eheading} eheading |
{etext} etext |
{eblock} eblock |
{address} P.address;
address =
open_angle T.address attribute_list? close_angle
etext_or_para*
angle_slash [ignore]:T.address [close_close_angle]:close_angle;
etext =
{pcdata} pcdata |
{efont} P.efont |
{ephrase} P.ephrase |
{especial} P.especial |
{eform} P.eform;
etext_or_para =
{etext} etext |
{para} P.para;
// Headers
eheading =
{h1} P.h1 |
{h2} P.h2 |
{h3} P.h3 |
{h4} P.h4 |
{h5} P.h5 |
{h6} P.h6;
h1 = open_angle T.h1 attribute_list? close_angle
etext*
angle_slash [ignore]:T.h1 [close_close_angle]:close_angle;
h2 = open_angle T.h2 attribute_list? close_angle
etext*
angle_slash [ignore]:T.h2 [close_close_angle]:close_angle;
h3 = open_angle T.h3 attribute_list? close_angle
etext*
angle_slash [ignore]:T.h3 [close_close_angle]:close_angle;
h4 = open_angle T.h4 attribute_list? close_angle
etext*
angle_slash [ignore]:T.h4 [close_close_angle]:close_angle;
h5 = open_angle T.h5 attribute_list? close_angle
etext*
angle_slash [ignore]:T.h5 [close_close_angle]:close_angle;
h6 = open_angle T.h6 attribute_list? close_angle
etext*
angle_slash [ignore]:T.h6 [close_close_angle]:close_angle;
// Block tags
eblock =
{para} P.para |
{elist} P.elist |
{pre} P.pre |
{dl} P.dl |
{div} P.div |
{center} P.center |
{blockquote} P.blockquote |
{form} P.form |
{isindex} P.isindex |
{hr} P.hr |
{table} P.table;
para =
open_angle T.para attribute_list? close_angle
etext*
close_para?;
close_para = angle_slash [ignore]:T.para [close_close_angle]:close_angle;
pre =
open_angle T.pre attribute_list? close_angle
etext*
angle_slash [ignore]:T.pre [close_close_angle]:close_angle;
div =
open_angle T.div attribute_list? close_angle
ebody_content
angle_slash [ignore]:T.div [close_close_angle]:close_angle;
center =
open_angle T.center attribute_list? close_angle
ebody_content
angle_slash [ignore]:T.center [close_close_angle]:close_angle;
blockquote =
open_angle T.blockquote attribute_list? close_angle
ebody_content
angle_slash [ignore]:T.blockquote [close_close_angle]:close_angle;
hr = open_angle T.hr attribute_list? close_angle;
// Lists (not including definition lists)
elist = {ul} P.ul |
{ol} P.ol |
{dir} P.dir |
{menu} P.menu;
ol =
open_angle T.ol attribute_list? close_angle
P.li*
angle_slash [ignore]:T.ol [close_close_angle]:close_angle;
ul =
open_angle T.ul attribute_list? close_angle
P.li*
angle_slash [ignore]:T.ul [close_close_angle]:close_angle;
dir =
open_angle T.dir attribute_list? close_angle
P.li*
angle_slash [ignore]:T.dir [close_close_angle]:close_angle;
menu =
open_angle T.menu attribute_list? close_angle
P.li*
angle_slash [ignore]:T.menu [close_close_angle]:close_angle;
li =
open_angle T.li attribute_list? close_angle
eflow
close_li?;
close_li = angle_slash [ignore]:T.li [close_close_angle]:close_angle;
// Definition lists
dl =
open_angle T.dl attribute_list? close_angle
dl_contents*
angle_slash [ignore]:T.dl [close_close_angle]:close_angle;
dl_contents =
{dt} P.dt |
{dd} P.dd;
dt =
open_angle T.dt attribute_list? close_angle
etext*
angle_slash [ignore]:T.dt [close_close_angle]:close_angle;
dd =
open_angle T.dd attribute_list? close_angle
eflow
angle_slash [ignore]:T.dd [close_close_angle]:close_angle;
// Built-in Fonts
efont = {tt} P.tt |
{i} P.i |
{b} P.b |
{u} P.u |
{strike} P.strike |
{big} P.big |
{small} P.small |
{sub} P.sub |
{sup} P.sup;
tt = open_angle T.tt attribute_list? close_angle
etext*
angle_slash [ignore]:T.tt [close_close_angle]:close_angle;
i = open_angle T.i attribute_list? close_angle
etext*
angle_slash [ignore]:T.i [close_close_angle]:close_angle;
b = open_angle T.b attribute_list? close_angle
etext*
angle_slash [ignore]:T.b [close_close_angle]:close_angle;
u = open_angle T.u attribute_list? close_angle
etext*
angle_slash [ignore]:T.u [close_close_angle]:close_angle;
strike = open_angle T.strike attribute_list? close_angle
etext*
angle_slash [ignore]:T.strike [close_close_angle]:close_angle;
big = open_angle T.big attribute_list? close_angle
etext*
angle_slash [ignore]:T.big [close_close_angle]:close_angle;
small = open_angle T.small attribute_list? close_angle
etext*
angle_slash [ignore]:T.small [close_close_angle]:close_angle;
sup = open_angle T.sup attribute_list? close_angle
etext*
angle_slash [ignore]:T.sup [close_close_angle]:close_angle;
sub = open_angle T.sub attribute_list? close_angle
etext*
angle_slash [ignore]:T.sub [close_close_angle]:close_angle;
// Abstract fonts
ephrase =
{em} P.em |
{strong} P.strong |
{dfn} P.dfn |
{code} P.code |
{samp} P.samp |
{kbd} P.kbd |
{var} P.var |
{cite} P.cite;
em = open_angle T.em attribute_list? close_angle
etext*
angle_slash [ignore]:T.em [close_close_angle]:close_angle;
strong = open_angle T.strong attribute_list? close_angle
etext*
angle_slash [ignore]:T.strong [close_close_angle]:close_angle;
dfn = open_angle T.dfn attribute_list? close_angle
etext*
angle_slash [ignore]:T.dfn [close_close_angle]:close_angle;
code = open_angle T.code attribute_list? close_angle
etext*
angle_slash [ignore]:T.code [close_close_angle]:close_angle;
samp = open_angle T.samp attribute_list? close_angle
etext*
angle_slash [ignore]:T.samp [close_close_angle]:close_angle;
kbd = open_angle T.kbd attribute_list? close_angle
etext*
angle_slash [ignore]:T.kbd [close_close_angle]:close_angle;
var = open_angle T.var attribute_list? close_angle
etext*
angle_slash [ignore]:T.var [close_close_angle]:close_angle;
cite = open_angle T.cite attribute_list? close_angle
etext*
angle_slash [ignore]:T.cite [close_close_angle]:close_angle;
// "Special" tags
especial =
{a} P.a |
{img} P.img |
{applet} P.applet |
{font} P.font |
{basefont} P.basefont |
{br} P.br |
{script} P.script |
{map} P.map;
a = open_angle T.a attribute_list? close_angle
etext*
angle_slash [ignore]:T.a [close_close_angle]:close_angle;
img = open_angle T.img attribute_list? close_angle;
font = open_angle T.font attribute_list? close_angle
etext*
angle_slash [ignore]:T.font [close_close_angle]:close_angle;
basefont = open_angle T.basefont attribute_list? close_angle;
br = open_angle T.br attribute_list? close_angle;
script = open_angle T.script attribute_list? close_angle
pcdata
angle_slash [ignore]:T.script [close_close_angle]:close_angle;
// Forms
form =
open_angle T.form attribute_list? close_angle
ebody_content
angle_slash [ignore]:T.form [close_close_angle]:close_angle;
// One of the things an ebody_content can contain is an eform
eform =
{input} P.input |
{select} P.select |
{textarea} P.textarea;
input = open_angle T.input attribute_list? close_angle;
select =
open_angle T.select attribute_list? close_angle
P.option+
angle_slash [ignore]:T.select [close_close_angle]:close_angle;
option =
open_angle T.option attribute_list? close_angle
pcdata*
angle_slash [ignore]:T.option [close_close_angle]:close_angle;
textarea =
open_angle T.textarea attribute_list? close_angle
pcdata*
angle_slash [ignore]:T.textarea [close_close_angle]:close_angle;
// Tables
table =
open_angle T.table attribute_list? close_angle
P.caption? P.tr+
angle_slash [ignore]:T.table [close_close_angle]:close_angle;
tr =
open_angle T.tr attribute_list? close_angle
th_or_td*
angle_slash [ignore]:T.tr [close_close_angle]:close_angle;
caption =
open_angle T.caption attribute_list? close_angle
etext*
angle_slash [ignore]:T.caption [close_close_angle]:close_angle;
th_or_td =
{th} P.th |
{td} P.td;
th =
open_angle T.th attribute_list? close_angle
ebody_content
angle_slash [ignore]:T.th [close_close_angle]:close_angle;
td =
open_angle T.td attribute_list? close_angle
ebody_content
angle_slash [ignore]:T.td [close_close_angle]:close_angle;
// Maps
map =
open_angle T.map attribute_list? close_angle
P.area*
angle_slash [ignore]:T.map [close_close_angle]:close_angle;
area =
open_angle T.area attribute_list? close_angle;
// Applets
applet =
open_angle T.applet attribute_list? close_angle
param_or_etext*
angle_slash [ignore]:T.applet [close_close_angle]:close_angle;
param_or_etext =
{param} P.param |
{text} etext;
param = open_angle T.param attribute_list? close_angle;
// Misc. Utilities
attribute_list = attribute+;
attribute =
{name_only} attribute_name |
{valued} attribute_name attribute_equals cdata;
eflow = eflow_contents*;
eflow_contents =
{etext} etext |
{eblock} eblock;
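A note on cdata_helper in the Helpers section: the comment there says there seems to be no more direct way to write the set, but nested set operations might be one. The grammar already uses the [a - b] form (for example [any_char - '<']), and the sample Java grammar appears to combine it with [a + b]. The fragment below is a sketch under that assumption and has not been tested against this grammar. It relies on the tab, lf, ff, cr, space and any_char helpers defined above, and excluded_char is a hypothetical helper name.

```
Helpers
    // Same character set as cdata_helper, written as a set difference.
    // excluded_char is a hypothetical name; tab, lf, ff, cr, space and
    // any_char are the helpers already defined in this grammar.
    excluded_char = [[[tab + lf] + [ff + cr]] + [space + '>']];
    cdata_helper  = [any_char - excluded_char];
```

If this works, it would replace the five-range definition and keep the list of excluded characters in one place.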
-Nick Kramer