Re: Fw: HTML and shift/reduce conflicts
>After verifying the documentation that comes with SableCC, I agree that the
>licensing terms could be confusing. In the next release, I will add a
>specific notice at the top of all Public Domain files clearly stating that
>they can be used without any restriction.
Sounds good. Thanks!
>I've quickly looked at the HTML grammar. I have a couple of suggestions. I
>would try a different approach. I would define more specific opening tags.
>Here's the idea:
>
>Helpers
> h = ['h' + 'H']; // case insensitive 'h'
> t = ['t' + 'T'];
> m = ['m' + 'M'];
> l = ['l' + 'L'];
I've been meaning to ask at some point -- is that the only way to do
case-insensitive lexing?
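
To make concrete what the helper definitions above amount to, here is a minimal sketch in Python rather than SableCC syntax. It spells out both cases of every letter, just as the `h`, `t`, `m`, `l` helpers do. The names `case_insensitive_pattern` and `B_HTML` are made up for this illustration, and it assumes ASCII letters only; it says nothing about what else SableCC itself may offer.

```python
import re

def case_insensitive_pattern(word):
    """Spell out both cases of each letter, e.g. 'html' -> '[hH][tT][mM][lL]'.

    Assumes ASCII letters; other characters are matched literally.
    """
    parts = []
    for ch in word:
        if ch.isalpha():
            parts.append("[" + ch.lower() + ch.upper() + "]")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)

# Rough analogue of the quoted b_html token: '<' blank* h t m l
# (here "blank" is approximated by regex whitespace, which is an assumption)
B_HTML = re.compile(r"<\s*" + case_insensitive_pattern("html"))
```

With this, `B_HTML.match("<HtMl")` succeeds while `B_HTML.match("<htm>")` does not.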
>Tokens
> {normal->tag} b_html = '<' blank* h t m l; // e.g. '<html'
> {normal->tag} e_html = '<' blank* '/' blank* h t m l; // e.g. '</html'
> {tag->normal} end_tag = '>';
I reached a similar conclusion last night. What I *really* need is a
two-phase parser. Instead of the usual lex -> parse, I want lex -> parse1
-> parse2. Parse1 takes tokens like '<', 'html', and '>', and turns them
into "tokens" like '<html>'. I don't know how to implement this (at least
not without thinking), so I'm going to do the next best thing -- one
grammar that works the way you described (tokens like '<html>'), and a
second grammar to parse the internals of a tag. Run the first grammar over
the entire input file, and then as a tree-walker run the second grammar
once per tag (i.e., about a million times per file). Not very elegant, but
it should get the job done.
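
The two-grammar workaround can be sketched roughly as follows. This is a minimal Python illustration using regular expressions, not SableCC-generated parsers, and every name in it (`first_pass`, `parse_tag`, `parse`) is hypothetical. The first pass treats each whole tag as a single token; the second pass is run once per tag to pick apart its internals. It makes simplifying assumptions: no '>' inside attribute values, and no handling of comments or `<!DOCTYPE ...>`.

```python
import re

# Pass 1: coarse tokens. A whole tag such as '<html>' is one token;
# everything between tags is a text token. (A stray '<' with no closing
# '>' is silently skipped in this sketch.)
TAG_OR_TEXT = re.compile(r"<[^>]*>|[^<]+")

# Pass 2: the internals of one tag: optional '/', a name, then attributes.
TAG = re.compile(r"<\s*(/?)\s*([A-Za-z][A-Za-z0-9]*)([^>]*)>")
ATTR = re.compile(r"""([A-Za-z_:][-\w:.]*)(?:\s*=\s*("[^"]*"|'[^']*'|[^\s"'>]+))?""")

def first_pass(text):
    """Split the input into tag tokens and text tokens."""
    return TAG_OR_TEXT.findall(text)

def parse_tag(token):
    """Parse the inside of a single tag token from the first pass."""
    m = TAG.fullmatch(token)
    if m is None:
        raise ValueError("not a simple tag: " + repr(token))
    closing, name, rest = m.groups()
    attrs = {}
    for key, value in ATTR.findall(rest):
        if value[:1] in ('"', "'"):
            value = value[1:-1]  # strip the surrounding quotes
        attrs[key.lower()] = value
    return {"name": name.lower(), "closing": closing == "/", "attrs": attrs}

def parse(text):
    """Run pass 1 over the whole input, then pass 2 once per tag."""
    result = []
    for token in first_pass(text):
        if token.startswith("<"):
            result.append(parse_tag(token))
        else:
            result.append(token)
    return result
```

For example, `parse('<HTML><a href="x.html" class=big>hi</a></html>')` yields five items: the parsed `html` and `a` opening tags, the text `'hi'`, and the two closing tags. As in the plan above, the cost is one second-pass call per tag.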
>By the way, isn't there any YACC grammar for HTML?
Yes. One is available at
http://www.di.unito.it/mail_archive/YACCHTML/0000.html
It works approximately the way you suggest.
-Nick Kramer