Re: Handling an INCLUDE directive.

> I think the only way to maintain the file/line information would be to
> modify SableCC, so that the generated Parser.java class provides
> overridable hooks to do this.

I agree that some changes need to be made to SableCC directly.

I sort of hacked in the support by extending the generated Lexer class. I
override both getToken and filter (though in reality, I probably don't need
to override filter).

This has a couple of problems as is, though. First, it doesn't capture the
file information (though it does capture the line information). To capture
this information I need to change SableCC directly, as the LexerException
and ParserException don't have slots for something like a filename. I have
been debating the best path for this.

Should the Lexer/Parser do a callback of some kind into something that then
has the option of throwing an exception, or perhaps doing something else? I
don't know if the system is set up to allow an external entity to actually
modify the flow of tokens, etc., at the level where it currently throws error
exceptions. This may be a step in the direction of a more general error
handling (and recovery) system.
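
As a rough sketch of that callback idea (nothing here is SableCC's actual
API; LexErrorHandler, SketchLexer and the file-name argument are all names
made up for illustration), the lexer could report a bad character to a
handler, and the handler decides whether to throw or to return and let the
scan carry on:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback; no such interface exists in SableCC.
// Returning normally means "skip it and continue"; throwing aborts.
interface LexErrorHandler {
    void onError(String file, int line, int pos, char bad);
}

// Toy stand-in for a generated lexer: it accepts only letters and
// whitespace and hands anything else to the handler.
class SketchLexer {
    private final String file;
    private final String text;
    private final LexErrorHandler handler;

    SketchLexer(String file, String text, LexErrorHandler handler) {
        this.file = file;
        this.text = text;
        this.handler = handler;
    }

    List<String> words() {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        int line = 1;
        int pos = 0;
        for (char c : text.toCharArray()) {
            pos++;
            if (Character.isLetter(c)) {
                cur.append(c);
                continue;
            }
            // Any non-letter ends the current word.
            if (cur.length() > 0) {
                out.add(cur.toString());
                cur.setLength(0);
            }
            if (c == '\n') {
                line++;
                pos = 0;
            } else if (!Character.isWhitespace(c)) {
                // The handler may throw; if it returns, we recover by
                // dropping the character.
                handler.onError(file, line, pos, c);
            }
        }
        if (cur.length() > 0) {
            out.add(cur.toString());
        }
        return out;
    }
}
```

A handler that only logs gives a crude form of error recovery, while one
that throws reproduces the current behaviour but with the file name
attached.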

Another thought is that the system could pass the actual token to the
exception constructor, and let the user specify the actual exception class
to throw. Add to this an "extra info" slot in Node that is left null unless
the user sticks something into it. The user can populate this slot in the
filter method. Combined with a custom exception, I think that extends the
system quite a bit with minimal changes.
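
To show roughly what I mean, here is a small stand-alone sketch. It does not
use the generated classes: SketchToken, its extraInfo field,
LocatedException and ExtraInfoDemo are all hypothetical stand-ins, and the
message format is just a guess at something useful.

```java
// Hypothetical stand-in for a token/Node. Only the "extra info" slot is
// new, and it stays null unless user code fills it in.
class SketchToken {
    final String text;
    final int line;
    final int pos;
    Object extraInfo; // e.g. the name of the file the token came from

    SketchToken(String text, int line, int pos) {
        this.text = text;
        this.line = line;
        this.pos = pos;
    }
}

// Hypothetical custom exception that is handed the offending token, so
// it can report whatever the user stored in the extra slot.
class LocatedException extends Exception {
    final SketchToken token;

    LocatedException(SketchToken token, String message) {
        super(describe(token) + " " + message);
        this.token = token;
    }

    static String describe(SketchToken t) {
        String file = (t.extraInfo == null) ? "<unknown>" : t.extraInfo.toString();
        return file + " [" + t.line + "," + t.pos + "]";
    }
}

class ExtraInfoDemo {
    // Plays the role of filter(): stamp each token with its file name.
    static SketchToken tag(SketchToken t, String fileName) {
        t.extraInfo = fileName;
        return t;
    }

    public static void main(String[] args) {
        SketchToken t = tag(new SketchToken("INCLUDE", 12, 3), "defs.inc");
        try {
            throw new LocatedException(t, "expecting a string literal");
        } catch (LocatedException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The appeal is that the lexer and parser would never need to know what is in
the slot; only the user's filter and the user's exception class do.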

Another problem with my current system is that the lexer must "parse" the
token stream looking for a properly constructed INCLUDE statement. Hardly
overwhelming, but it seems kind of nutty to have "a parser in the lexer".

Mind you, I have no real lexer/parser experience. I gravitated towards
SableCC because I liked the separation of the grammar from the code itself.
When I had problems with Sable, I started to look at others (Antlr, Cup,
etc.), and just threw my hands up in disgust because (at a glance) there was
never a boundary between the grammar and the code, making it difficult to
follow. Now, that model may work well for those who have cut their teeth on
LEX/YACC, but for a complete novice, it's very confusing.

Below is my little include lexer for study. Again, it's a first cut, but
seems to do the job. I've added a hack on top of a hack by printing out the
file name to help in fixing syntax errors.

I'll look over at SourceForge and try to get the latest version to work.


Will Hartung

Here's the class:

import java.io.*;
import java.util.*;
import my.node.*;
import my.lexer.*;

public class myLexer extends Lexer
{
    private myLexer includeLexer = null;
    private PushbackReader lxrPbr = null;
    private boolean foundInclude = false;
    // tokenList should always point to an empty list if it's not
    // being used.
    private LinkedList tokenList = new LinkedList();
    private LinkedList includeList = null;
    private HashSet seenIncludes = new HashSet();
    private TStringLiteral incFileToken = null;
    private Class typeNeeded = null;
    private String incFileName = null;

    private void startUpIncludeLexer(String fileName)
        throws LexerException
    {
        FileReader fr;
        try {
            fr = new FileReader(fileName);
        } catch (FileNotFoundException fe) {
            throw new LexerException("Cannot find INCLUDE file at " +
                token.getLine() + "," + token.getPos());
        }
        incFileName = fileName;
        System.out.println("Entering " + incFileName);
        lxrPbr = new PushbackReader(fr);
        includeLexer = new myLexer(lxrPbr);
    }

    private void shutDownIncludeLexer()
        throws IOException
    {
        System.out.println("Exiting " + incFileName);
        // Close the include file's reader before dropping the lexer.
        lxrPbr.close();
        incFileName = null;
        includeLexer = null;
    }

    protected void filter() throws LexerException, IOException
    {
        // We're checking to see if we've encountered an INCLUDE statement.
        // includeList tracks the tokens making up the INCLUDE
        // statement. If it exists, then we're in an INCLUDE
        // statement.
        if (includeList != null) {
            // Keep every token seen so far, so the whole run can be
            // replayed to the parser if this isn't a valid INCLUDE.
            includeList.add(token);
            // Mini state machine to parse INCLUDE "filename" ;
            if (token.getClass() == typeNeeded) {
                if (token instanceof TStringLiteral) {
                    incFileToken = (TStringLiteral)token;
                    typeNeeded = new TSemicolon().getClass();
                } else {
                    if (token instanceof TSemicolon) {
                        String fileName = incFileToken.getText();
                        // Strip off the quotes.
                        fileName = fileName.substring(1, fileName.length() - 1);
                        // Open the file and start lexing it.
                        startUpIncludeLexer(fileName);
                        includeList = null;
                        typeNeeded = null;
                    }
                }
            } else {
                if (!((token instanceof TBlank) || (token instanceof TComment))) {
                    // It's not a blank or comment, and it's not part
                    // of the include list, so we send the entire
                    // stream to the parser. It may know something we
                    // don't.
                    tokenList = includeList;
                    includeList = null;
                } else {
                    // We skip, but keep, blanks and comments.
                }
            }
            token = null;
        } else {
            if (token instanceof TInclude) {
                // Since filter() is called for every token, we must make
                // sure that we haven't seen the include token before,
                // in case we had just sent it back to the parser. This
                // assumes that the lexer does not reuse tokens.
                if (!seenIncludes.contains(token)) {
                    seenIncludes.add(token);
                    includeList = new LinkedList();
                    includeList.add(token);
                    token = null;
                    // First thing after the INCLUDE must be a string.
                    typeNeeded = new TStringLiteral("").getClass();
                }
            }
        }
    }

    public myLexer(PushbackReader in)
    {
        super(in);
    }

    protected Token getToken() throws IOException, LexerException
    {
        Token t;
        if (includeLexer != null) {
            // We seem to be in the middle of an include
            t = includeLexer.next();
            if (!(t instanceof EOF)) {
                // We're in the middle of an include stream,
                // so we just get the next included token.
                return t;
            } else {
                // End of the include, so shut down the include lexer.
                shutDownIncludeLexer();
            }
        }
        // If we're not including something, then we fall through to here.
        // Check if there's anything on the pushback queue.
        if (tokenList.size() != 0) {
            // It's not empty, so grab the one off the top.
            t = (Token) tokenList.removeFirst();
        } else {
            // Looks like we just need to get the token from this lexer.
            t = super.getToken();
        }
        return t;
    }
}