I have a parser for Excel functions that should handle function overloading based on the parameter types. The problem is that the types of number and string columns come from external context, so the parser needs to select the right function based on that context. For example:
input expression: IFNA(123, 234) -> correctly parsed as number_function
input expression: IFNA("foo", "bar") -> correctly parsed as string_function
But when columns are used, we run into issues:
columnsContext = {
column1: Type.String
column2: Type.String
}
input expression: IFNA(column1, column2)
The above should be parsed as a string function based on the column types, but it is recognized as a number function because that rule is declared first in the grammar.
Grammar:
grammar ExcelLikeFunctionsGrammar;
expression : number | string;
number_function:
IFNA LEFT_PAREN number COMMA number RIGHT_PAREN |
... other functions;
string_function:
IFNA LEFT_PAREN string COMMA string RIGHT_PAREN |
... other functions;
number : NUMBER_CONSTANT | number_function | number_column ;
string : STRING_CONSTANT | string_function | string_column ;
number_column: ALPHANUMERIC;
string_column: ALPHANUMERIC;
I tried to handle this with semantic predicates and extra logic in a custom parser, but it throws a NoViableAltException when it reaches isNumber(): the parser has already committed to IFNA as a number function, but the columns are of string type, so the predicate returns false.
Grammar:
grammar ExcelLikeFunctionsGrammar;
// these defaults get overridden in the CustomParser
@members {
    protected boolean isNumber() {
        return true;
    }
    protected boolean isString() {
        return true;
    }
}
expression : number | string;
number_function:
IFNA LEFT_PAREN number COMMA number RIGHT_PAREN |
... other functions;
string_function:
IFNA LEFT_PAREN string COMMA string RIGHT_PAREN |
... other functions;
number : NUMBER_CONSTANT | number_function | {isNumber()}? number_column ;
string : STRING_CONSTANT | string_function | {isString()}? string_column ;
number_column: ALPHANUMERIC;
string_column: ALPHANUMERIC;
Parser:
public class ExcelLikeFunctionsGrammarCustomParser extends ExcelLikeFunctionsGrammarParser {
    // External context that holds the type of each column
    private final ReferenceContext referenceContext;

    public ExcelLikeFunctionsGrammarCustomParser(TokenStream input, ReferenceContext referenceContext) {
        super(input);
        this.referenceContext = referenceContext;
    }

    @Override
    protected final boolean isNumber() {
        return checkColumnTokenType(TableColumnType.DECIMAL);
    }

    @Override
    protected final boolean isString() {
        return checkColumnTokenType(TableColumnType.STRING);
    }

    private boolean checkColumnTokenType(TableColumnType columnType) {
        return checkTypeLogic(...);
    }
}
I fixed this by resolving the ambiguity at the lexical level: I added separate column tokens for each type. Then I built a custom lexer that changes the token type based on the type of the reference in the checkType() action. With this approach, by the time the input is parsed, the tokens are already typed, so the parser can go into the correct function.
Grammar
Custom Lexer
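The idea of retyping column tokens before parsing can be sketched roughly as follows. This is a minimal, ANTLR-free illustration and not the actual lexer: the class ColumnTokenRetyper, the token type constants, and the ColumnType enum are all hypothetical names. In a real ANTLR 4 lexer subclass, the same lookup would presumably run from a lexer action or an overridden emit(), passing getText() and handing the result to setType(...). The grammar's number_column and string_column rules would then match the two distinct token types instead of both matching ALPHANUMERIC.

```java
import java.util.Map;

// Hypothetical helper: decides which token type a column reference should get.
class ColumnTokenRetyper {
    // Assumed token type ids; in ANTLR these would be generated lexer constants.
    static final int ALPHANUMERIC = 1;
    static final int NUMBER_COLUMN = 2;
    static final int STRING_COLUMN = 3;

    // Stand-in for the column types held by the external context.
    enum ColumnType { DECIMAL, STRING }

    private final Map<String, ColumnType> columns;

    ColumnTokenRetyper(Map<String, ColumnType> columns) {
        this.columns = columns;
    }

    // Returns the token type to emit for a token with the given type and text.
    int retype(int tokenType, String text) {
        if (tokenType != ALPHANUMERIC) {
            return tokenType; // only column-like tokens are retyped
        }
        ColumnType type = columns.get(text);
        if (type == ColumnType.DECIMAL) {
            return NUMBER_COLUMN;
        }
        if (type == ColumnType.STRING) {
            return STRING_COLUMN;
        }
        return tokenType; // unknown column: leave the token untouched
    }

    public static void main(String[] args) {
        ColumnTokenRetyper retyper = new ColumnTokenRetyper(Map.of(
                "column1", ColumnType.STRING,
                "column2", ColumnType.STRING));
        // Both columns are strings, so both tokens become STRING_COLUMN.
        System.out.println(retyper.retype(ALPHANUMERIC, "column1"));
        System.out.println(retyper.retype(ALPHANUMERIC, "column2"));
    }
}
```

Because the decision is made per token, the parser never has to choose between the two IFNA alternatives using only grammar order: IFNA(column1, column2) reaches it as IFNA followed by two string-column tokens.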