
Re: PushbackReader



>Could you tell us a bit about how the generated lexers use PushbackReaders?


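The lexer uses the PushbackReader to push back lookahead characters while
scanning the input. Why? Because the lexer has to match the "longest"
possible string: it must read past the end of a token to be sure that no
longer match exists, and then return the extra characters to the input.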
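
To illustrate the longest-match rule, here is a minimal sketch. It is not the
code SableCC generates; the class name, the helper methods and the two-token
set ("a" and "abc") are assumptions made up for this example. On the input
"abd" the lexer has to read 'b' and 'd' before it knows that "abc" will not
match, so it pushes both characters back and returns "a".

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class LongestMatchSketch {
    // Hypothetical token set, chosen only for illustration.
    static final String[] TOKENS = {"a", "abc"};

    static boolean isToken(String s) {
        for (String t : TOKENS) if (t.equals(s)) return true;
        return false;
    }

    static boolean isTokenPrefix(String s) {
        for (String t : TOKENS) if (t.startsWith(s)) return true;
        return false;
    }

    // Returns the longest token at the current position, or null if none.
    static String nextToken(PushbackReader in) throws IOException {
        StringBuilder text = new StringBuilder();
        int lastAccept = 0; // length of the longest token seen so far
        int c;
        while ((c = in.read()) != -1) {
            text.append((char) c);
            String s = text.toString();
            if (!isTokenPrefix(s)) break;          // no token can match any more
            if (isToken(s)) lastAccept = s.length();
        }
        // Push back every character read past the last accepted token.
        int extra = text.length() - lastAccept;
        if (extra > 0) {
            in.unread(text.toString().toCharArray(), lastAccept, extra);
        }
        return lastAccept == 0 ? null : text.substring(0, lastAccept);
    }

    // Convenience wrapper: returns {first token, remaining input}.
    static String[] demo(String input) {
        try {
            // At most 2 characters are ever pushed back with this token set.
            PushbackReader in = new PushbackReader(new StringReader(input), 2);
            String token = nextToken(in);
            StringBuilder rest = new StringBuilder();
            for (int c; (c = in.read()) != -1; ) rest.append((char) c);
            return new String[] {token, rest.toString()};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] r = demo("abd");
        System.out.println("token=" + r[0] + " rest=" + r[1]);
    }
}
```

The number of characters pushed back depends on how far the lexer reads past
the last accepting position, not on the length of the token it finally
returns.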

I like to use the standard Java APIs as much as possible. Knowing that I
needed a reader with "push back" capability, I thought that the
PushbackReader class was exactly what I was looking for. It turns out that
the PushbackReader class uses a fixed-size pushback buffer, and that this
size must be passed to the constructor (without it, the buffer holds a
single character).
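
The fixed-size limit is easy to reproduce with java.io.PushbackReader alone.
The sketch below is only an illustration (the class and method names are
made up): it tries to unread a given number of characters into a buffer of a
given size and reports whether the reader accepted them or threw an
IOException.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

class PushbackCapacitySketch {
    // True if 'count' characters fit into a pushback buffer of 'bufferSize'.
    static boolean canUnread(int bufferSize, int count) {
        try (PushbackReader in =
                 new PushbackReader(new StringReader(""), bufferSize)) {
            for (int i = 0; i < count; i++) {
                in.unread('x'); // throws IOException once the buffer is full
            }
            return true;
        } catch (IOException overflow) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("2000 into 3000: " + canUnread(3000, 2000));
        System.out.println("3001 into 3000: " + canUnread(3000, 3001));
    }
}
```

The buffer never grows: once it is full, every further unread fails until
some of the pushed-back characters have been read again.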

Normally, you should not have to worry about this problem. I will fix it in
the next version of SableCC. The Lexer constructor will then take a plain
Reader as its parameter, and I'll implement an internal variable-size
pushback buffer.
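
To give an idea of what a variable-size pushback buffer over a plain Reader
could look like, here is a rough sketch. It is not the SableCC
implementation; the class name GrowablePushbackSketch and its methods are
assumptions for illustration only. Pushed-back characters are kept on a
stack that grows as needed, so there is no size to choose up front.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class GrowablePushbackSketch {
    private final Reader in;
    // Used as a stack: the last character appended is the next one read.
    private final StringBuilder pushed = new StringBuilder();

    GrowablePushbackSketch(Reader in) {
        this.in = in;
    }

    // Reads a pushed-back character if there is one, else reads the source.
    int read() throws IOException {
        int n = pushed.length();
        if (n > 0) {
            char c = pushed.charAt(n - 1);
            pushed.setLength(n - 1);
            return c;
        }
        return in.read();
    }

    // Pushes back text so that it is read again in its original order.
    void unread(CharSequence text) {
        for (int i = text.length() - 1; i >= 0; i--) {
            pushed.append(text.charAt(i));
        }
    }

    // Convenience wrapper: push 'back' in front of 'source', read it all.
    static String roundTrip(String source, String back) {
        try {
            GrowablePushbackSketch r =
                new GrowablePushbackSketch(new StringReader(source));
            r.unread(back);
            StringBuilder out = new StringBuilder();
            for (int c; (c = r.read()) != -1; ) out.append((char) c);
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("cd", "ab"));
    }
}
```

With a buffer like this the caller only has to supply a Reader, and an
overflow can no longer happen (short of running out of memory).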

----
[Politics break]
----
>I'm writing a grammar for HTML (which pending the boss's approval, I'll
>make freely available).  I'm testing out the lexer, and it seems to work

If it helps, you can tell your boss that you are getting "free" help from
the author of SableCC (which you also use for "free"). Your (and his)
contribution will allow the author to spend his time improving the SableCC
tool, instead of writing grammars.

It will also encourage others to make small contributions. In the long run,
this will pay off the day you need a grammar for another language and
somebody has already contributed one to the SableCC community.

----
[Back to technical stuff;-)]
----
>If the pushback reader's buffer is 3000 bytes, it works fine.  If I use a
>2000 byte buffer, however, the lexer will eventually crash with a
>PushbackReader overflow error when it does an "unread".  (See bottom of msg
>for stack trace) Strangely, however, the max token length is well under 800
>bytes.  And at the point where it does crash, it's not working with a
>particularly big token.


That's probably because of the current implementation of lexer states. All
states share the same DFA... In the beginning, I thought it was a good idea,
but it turns out that it's not. I've already fixed that in my personal
SableCC version. I'll release the fix in the next SableCC version (coming
very soon).

The short-term solution is simply to give the PushbackReader a big enough
buffer size (3K) :-)

Etienne