ANTLR4 parsing overloading based on the type of parameters

I have a parser for Excel-like functions that should handle function overloading based on the types of the parameters. The problem is that whether a column is a number or a string depends on external context, so the parser needs to select the right function based on the types of those columns. For example:

input expression: IFNA(123, 234) -> correctly parsed as number_function
input expression: IFNA("foo", "bar") -> correctly parsed as string_function

With columns, however, this breaks down:

columnsContext = {
   column1: Type.String
   column2: Type.String
}


input expression: IFNA(column1, column2)

The expression above should be parsed as a string function based on the column types, but it is recognized as a number function because number_function is declared first in the grammar.

Grammar:

grammar ExcelLikeFunctionsGrammar;


expression : number | string;

number_function:
    IFNA LEFT_PAREN number COMMA number RIGHT_PAREN |
    ... other functions;

string_function:
    IFNA LEFT_PAREN string COMMA string RIGHT_PAREN |
    ... other functions;


number : NUMBER_CONSTANT | number_function | number_column ;
string : STRING_CONSTANT | string_function | string_column ;


number_column: ALPHANUMERIC;
string_column: ALPHANUMERIC;
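
The ambiguity is visible in the last two rules: number_column and string_column both match the same ALPHANUMERIC token, so for a bare column name the grammar gives the parser nothing to tell them apart, and the alternative declared first wins. Below is a toy Java model of that tie-break. It is not ANTLR's actual prediction algorithm, and the class and method names are made up for illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class FirstAlternativeWins {
    // Rule name -> token type it accepts, in grammar declaration order.
    // Both column rules accept ALPHANUMERIC, as in the grammar above.
    static final Map<String, String> ALTERNATIVES = new LinkedHashMap<>();
    static {
        ALTERNATIVES.put("number_column", "ALPHANUMERIC");
        ALTERNATIVES.put("string_column", "ALPHANUMERIC");
    }

    // Returns the first declared rule that accepts the token type,
    // or null when no rule accepts it.
    static String choose(String tokenType) {
        for (Map.Entry<String, String> alt : ALTERNATIVES.entrySet()) {
            if (alt.getValue().equals(tokenType)) {
                return alt.getKey();
            }
        }
        return null;
    }
}
```

In this model, choose("ALPHANUMERIC") always yields number_column, whatever the real type of the column is, which mirrors what happens to IFNA(column1, column2).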

I tried to handle this with semantic predicates and extra logic in a custom parser, but it throws a NoViableAltException when it reaches isNumber(): the parser has already recognized IFNA as a number function, but the columns are of string type, so the predicate returns false.

Grammar:

grammar ExcelLikeFunctionsGrammar;
// these methods are overridden in the CustomParser
    @members {
        protected boolean isNumber() {
            return true;
        }
        protected boolean isString() {
            return true;
        }
    }

expression : number | string;

number_function:
    IFNA LEFT_PAREN number COMMA number RIGHT_PAREN |
    ... other functions;

string_function:
    IFNA LEFT_PAREN string COMMA string RIGHT_PAREN |
    ... other functions;


number : NUMBER_CONSTANT | number_function | {isNumber()}? number_column ;
string : STRING_CONSTANT | string_function | {isString()}? string_column ;


number_column: ALPHANUMERIC;
string_column: ALPHANUMERIC;

Parser:

public class ExcelLikeFunctionsGrammarCustomParser extends ExcelLikeFunctionsGrammarParser {
    private final ReferenceContext referenceContext;

    public ExcelLikeFunctionsGrammarCustomParser(TokenStream input, ReferenceContext referenceContext) {
        super(input);
        this.referenceContext = referenceContext;
    }

    @Override
    protected final boolean isNumber() {
        return checkColumnTokenType(TableColumnType.DECIMAL);
    }

    @Override
    protected final boolean isString() {
        return checkColumnTokenType(TableColumnType.STRING);
    }

    private boolean checkColumnTokenType(TableColumnType columnType) {
        return checkTypeLogic(...);
    }
}

Answered by Daniel Strausz

I fixed this by resolving the ambiguity at the lexical level: I added a separate column token for each type, then built a custom lexer that changes the type of each token, based on the type of the column it refers to, in the checkType() action.

With this approach, the tokens are already typed by the time the input is parsed, so the parser can go into the correct function.
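
The idea can be tried out without ANTLR. The sketch below is a simplified stand-in, not the generated lexer: ColumnTokenTyper, its enums, and ifnaOverload are hypothetical names, and the schema is assumed to be a plain map from column name to type. It shows that once a column name carries a type-specific token type, only one IFNA overload can match:

```java
import java.util.Map;

final class ColumnTokenTyper {
    // Hypothetical column and token types, reduced to what the example needs.
    enum ColumnType { DECIMAL, BIGINT, STRING }
    enum TokenType { NUMBER_COLUMN, STRING_COLUMN, ALPHANUMERIC }

    // Retypes an identifier by looking it up in the schema;
    // unknown names keep the generic ALPHANUMERIC type.
    static TokenType typeOf(String text, Map<String, ColumnType> schema) {
        ColumnType type = schema.get(text);
        if (type == null) {
            return TokenType.ALPHANUMERIC;
        }
        switch (type) {
            case DECIMAL:
            case BIGINT:
                return TokenType.NUMBER_COLUMN;
            case STRING:
                return TokenType.STRING_COLUMN;
            default:
                return TokenType.ALPHANUMERIC;
        }
    }

    // Names the overload that IFNA(a, b) would match once both
    // arguments are typed; mixed or unknown arguments match neither.
    static String ifnaOverload(String a, String b, Map<String, ColumnType> schema) {
        TokenType first = typeOf(a, schema);
        TokenType second = typeOf(b, schema);
        if (first == TokenType.NUMBER_COLUMN && second == TokenType.NUMBER_COLUMN) {
            return "number_function";
        }
        if (first == TokenType.STRING_COLUMN && second == TokenType.STRING_COLUMN) {
            return "string_function";
        }
        return "no viable alternative";
    }
}
```

With a schema in which column1 and column2 are both STRING, ifnaOverload("column1", "column2", schema) returns "string_function", because the declaration order of the two overloads no longer plays a role.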

Grammar

grammar ExcelLikeFunctionsGrammar;

@lexer::members {
    protected void checkType(String text) {}
}

expression : number | string;

number_function:
    IFNA LEFT_PAREN number COMMA number RIGHT_PAREN |
    ... other functions;

string_function:
    IFNA LEFT_PAREN string COMMA string RIGHT_PAREN |
    ... other functions;


number : NUMBER_CONSTANT | number_function | number_column ;
string : STRING_CONSTANT | string_function | string_column ;


number_column: NUMBER_COLUMN;
string_column: STRING_COLUMN;

// lexer
ALPHANUMERIC : (LOWERCASE | UPPERCASE | DIGIT | UNDERSCORE) (LETTER | ' ')* (LOWERCASE | UPPERCASE | DIGIT | UNDERSCORE) {checkType(getText());} ;
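// These two rules never win a match themselves (ALPHANUMERIC is declared first and takes the tie);
// they only declare the token types that checkType() assigns.
NUMBER_COLUMN : ALPHANUMERIC ;
STRING_COLUMN : ALPHANUMERIC ;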

Custom Lexer

public class ExcelLikeFunctionsGrammarCustomLexer extends ExcelLikeFunctionsGrammarLexer {
    private final ReferenceContext referenceContext;

    public ExcelLikeFunctionsGrammarCustomLexer(CharStream input, ReferenceContext referenceContext) {
        super(input);
        this.referenceContext = referenceContext;
    }

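    // Called from the action on the ALPHANUMERIC lexer rule: retypes the token
    // according to the type of the column it names, if there is one.
    @Override
    protected void checkType(String text) {
        final var column = referenceContext.getCurrentSchema().getColumns().stream().filter(c -> c.getName().equals(text)).toList();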
        if (!column.isEmpty()) {
            switch (column.get(0).getType()) {
                case DECIMAL, BIGINT -> setType(ExcelLikeFunctionsGrammarParser.NUMBER_COLUMN);
                case STRING -> setType(ExcelLikeFunctionsGrammarParser.STRING_COLUMN);
                case BOOLEAN -> setType(ExcelLikeFunctionsGrammarParser.LOGICAL_COLUMN);
                case DATETIME -> setType(ExcelLikeFunctionsGrammarParser.DATE_COLUMN);
                default -> setType(ExcelLikeFunctionsGrammarParser.ALPHANUMERIC);
            }
        } else {
            setType(ExcelLikeFunctionsGrammarParser.ALPHANUMERIC);
        }
    }
}