
PushbackReader



Etienne -

Could you tell us a bit about how the generated lexers use PushbackReaders?
I'm writing a grammar for HTML (which, pending the boss's approval, I'll
make freely available).  I'm testing out the lexer, and it seems to work
fine, but only if I use a really large buffer for the pushback reader.  The
routine I use to test the lexer looks like this:

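  import java.io.*;
  import html.lexer.Lexer;
  import html.node.*;   // SableCC-generated node classes (Token, EOF, ...)

  public class Test {
    // Prints every token and reports the length of the longest one.
    public static void main (String[] args) {
      int maxLen = 0;
      try {
        FileReader stream = new FileReader(args[0]);
        // Second argument: size of the pushback buffer, in chars.
        PushbackReader reader
          = new PushbackReader(new BufferedReader(stream), 2000);
        Lexer lexer = new Lexer(reader);
        while (true) {
          Token t = lexer.next();
          String text = t.getText();
          System.out.println(text);
          maxLen = Math.max(text.length(), maxLen);
          if (t instanceof EOF)   // end of input reached
            break;
        }
        reader.close();
      } catch (Exception e) {
        e.printStackTrace();
      }
      System.out.println("Max token length: " + maxLen);
    }
  }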

If the pushback reader's buffer is 3000 characters, it works fine.  If I use
a 2000-character buffer, however, the lexer eventually crashes with a
"Pushback buffer overflow" IOException when it does an "unread" (see the
bottom of this message for the stack trace).  Strangely, the maximum token
length is well under 800 characters, and at the point where it crashes, it's
not working with a particularly big token.
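A minimal sketch of the limit itself (the class PushbackLimitDemo and method unreadAll are illustrative names, not from the original): PushbackReader's size argument counts characters, and unread throws IOException("Pushback buffer overflow") as soon as the number of characters that have been unread but not yet read again exceeds that size. Token length does not enter into it; only how much is pushed back does.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.util.Arrays;

public class PushbackLimitDemo {

    /**
     * Reads `lookahead` chars through a PushbackReader whose buffer holds
     * `bufferSize` chars, then unreads them one at a time, last char first.
     * Returns how many were unread before the buffer overflowed (if it did).
     */
    static int unreadAll(int bufferSize, int lookahead) throws IOException {
        char[] filler = new char[lookahead];
        Arrays.fill(filler, 'x');
        PushbackReader reader =
            new PushbackReader(new StringReader(new String(filler)), bufferSize);

        StringBuilder seen = new StringBuilder();
        for (int i = 0; i < lookahead; i++) {
            seen.append((char) reader.read());
        }

        int unread = 0;
        try {
            for (int i = seen.length() - 1; i >= 0; i--) {
                reader.unread(seen.charAt(i));
                unread++;
            }
        } catch (IOException e) {
            // Thrown with the message "Pushback buffer overflow" once the
            // buffer is full; stop counting here.
        }
        return unread;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(unreadAll(2000, 1999)); // 1999
        System.out.println(unreadAll(2000, 2000)); // 2000
        System.out.println(unreadAll(2000, 2500)); // 2000, then overflow
    }
}
```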

For what it's worth, this is JDK 1.4 on a PC, both with and without the
performance pack.

Anyhow, here's the actual error message:

java.io.IOException: Pushback buffer overflow
        at java.lang.Throwable.<init>(Compiled Code)
        at java.lang.Exception.<init>(Compiled Code)
        at java.io.IOException.<init>(Compiled Code)
        at java.io.PushbackReader.unread(Compiled Code)
        at html.lexer.Lexer.pushBack(Compiled Code)
        at html.lexer.Lexer.getToken(Compiled Code)
        at html.lexer.Lexer.next(Compiled Code)
        at Test.main(Compiled Code)
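One plausible reading of this trace (Lexer.getToken calling Lexer.pushBack calling PushbackReader.unread), offered as an assumption rather than a description of the generated code: a longest-match lexer keeps reading while some longer token could still match, then unreads everything past the last accepting position. What it has to push back is that failed lookahead, which can be far longer than any token it returns. The scanner below is hypothetical (the class LookaheadDemo, its states and its three token kinds are invented for illustration): an unterminated `<!--` makes it read to end of input hunting for `-->`, so it pushes back 3004 characters even though every token it returns is one character long.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

/**
 * Hypothetical longest-match scanner (NOT SableCC's generated code) for:
 *   LT      = "<"
 *   COMMENT = "<!--" ... "-->"
 *   OTHER   = any single character other than '<'
 */
public class LookaheadDemo {
    static final int FAIL = -1, START = 0, LT = 1, BANG = 2, DASH1 = 3, BODY = 4,
            BODY_DASH = 5, BODY_DASH2 = 6, COMMENT_END = 7, OTHER = 8;

    private final PushbackReader in;
    int maxSurplus = 0; // most chars pushed back after any single token

    LookaheadDemo(PushbackReader in) { this.in = in; }

    // DFA transition function; FAIL means no longer token is possible.
    static int step(int state, char c) {
        switch (state) {
            case START:      return c == '<' ? LT : OTHER;
            case LT:         return c == '!' ? BANG : FAIL;
            case BANG:       return c == '-' ? DASH1 : FAIL;
            case DASH1:      return c == '-' ? BODY : FAIL;
            case BODY:       return c == '-' ? BODY_DASH : BODY;
            case BODY_DASH:  return c == '-' ? BODY_DASH2 : BODY;
            case BODY_DASH2: return c == '>' ? COMMENT_END : (c == '-' ? BODY_DASH2 : BODY);
            default:         return FAIL; // COMMENT_END and OTHER have no exits
        }
    }

    static boolean accepting(int state) {
        return state == LT || state == COMMENT_END || state == OTHER;
    }

    /** Returns the next token's text, or null at end of input. */
    String nextToken() throws IOException {
        StringBuilder text = new StringBuilder();
        int state = START, acceptLength = 0, c;
        // Keep reading while a longer token might still match.
        while ((c = in.read()) != -1) {
            text.append((char) c);
            state = step(state, (char) c);
            if (state == FAIL) break;
            if (accepting(state)) acceptLength = text.length();
        }
        // Push back everything read past the last accepting position,
        // last char first; unread throws once the pushback buffer is full.
        maxSurplus = Math.max(maxSurplus, text.length() - acceptLength);
        for (int i = text.length() - 1; i >= acceptLength; i--) {
            in.unread(text.charAt(i));
        }
        return acceptLength == 0 ? null : text.substring(0, acceptLength);
    }

    /** Scans all of `input`; returns {longest token, largest pushback}. */
    static int[] scanAll(String input, int bufferSize) throws IOException {
        LookaheadDemo lexer =
            new LookaheadDemo(new PushbackReader(new StringReader(input), bufferSize));
        int maxLen = 0;
        for (String t; (t = lexer.nextToken()) != null; ) {
            maxLen = Math.max(maxLen, t.length());
        }
        return new int[] { maxLen, lexer.maxSurplus };
    }

    public static void main(String[] args) throws IOException {
        // An unterminated "<!--" followed by 3000 ordinary characters.
        StringBuilder sb = new StringBuilder("<!-- ");
        for (int i = 0; i < 3000; i++) sb.append('x');
        String input = sb.toString();

        int[] r = scanAll(input, 4000);
        // prints "max token 1, max pushback 3004"
        System.out.println("max token " + r[0] + ", max pushback " + r[1]);
        try {
            scanAll(input, 2000);
        } catch (IOException e) {
            // prints "buffer 2000: Pushback buffer overflow"
            System.out.println("buffer 2000: " + e.getMessage());
        }
    }
}
```

If something like this is happening, the buffer size needed is bounded by the longest stretch the lexer can read past the end of a token without finding a longer match, not by the longest token; in an HTML grammar, tokens such as comments or quoted values that can run to end of input would be natural places to look.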

-Nick Kramer