I have a CustomErrorListener that overrides syntaxError. When there is a syntax error with the last token (right before EOF), the syntaxError method will report EOF as the offending symbol.
This is my top level grammar rule:
program: statementSeparator* version statements statementSeparator* EOF;
A program that ends like this:
h q[0, 4
reports a problem right after 4 (where EOF sits), although the real offending symbol, in my opinion, is q[0, 4, which is missing a closing ].
Is there something wrong with the grammar so that the token before EOF is not reported as the offending symbol?
In case the grammar is OK:
a) is there a proper way to check for the EOF token?, and
b) would it be possible to access the token before EOF?
Lexer
lexer grammar CqasmLexer;
// White spaces and comments are skipped, i.e. not passed to the parser
WHITE_SPACE: [ \t]+ -> skip;
SINGLE_LINE_COMMENT: '//' ~[\r\n]* -> skip;
MULTI_LINE_COMMENT: '/*' .*? '*/' -> skip;
// Signs
NEW_LINE: '\r'?'\n';
SEMICOLON: ';';
COLON: ':';
COMMA: ',';
DOT: '.';
EQUALS: '=';
OPEN_BRACKET: '[';
CLOSE_BRACKET: ']';
OPEN_BRACE: '{';
CLOSE_BRACE: '}';
OPEN_PARENS: '(';
CLOSE_PARENS: ')';
PLUS: '+'; // this token is shared by UNARY_PLUS_OP and PLUS_OP
MINUS: '-'; // this token is shared by UNARY_MINUS_OP and MINUS_OP
// Operators
// UNARY_PLUS_OP: '+';
// UNARY_MINUS_OP: '-';
BITWISE_NOT_OP: '~';
LOGICAL_NOT_OP: '!';
POWER_OP: '**';
PRODUCT_OP: '*';
DIVISION_OP: '/';
MODULO_OP: '%';
// PLUS_OP: '+';
// MINUS_OP: '-';
SHL_OP: '<<';
SHR_OP: '>>';
CMP_GT_OP: '>';
CMP_LT_OP: '<';
CMP_GE_OP: '>=';
CMP_LE_OP: '<=';
CMP_EQ_OP: '==';
CMP_NE_OP: '!=';
BITWISE_AND_OP: '&';
BITWISE_XOR_OP: '^';
BITWISE_OR_OP: '|';
LOGICAL_AND_OP: '&&';
LOGICAL_XOR_OP: '^^';
LOGICAL_OR_OP: '||';
TERNARY_CONDITIONAL_OP: '?';
// Keywords
VERSION: 'version' -> pushMode(VERSION_STATEMENT);
MEASURE: 'measure';
QUBIT_TYPE: 'qubit';
BIT_TYPE: 'bit';
AXIS_TYPE: 'axis';
BOOL_TYPE: 'bool';
INT_TYPE: 'int';
FLOAT_TYPE: 'float';
// Numeric literals
BOOLEAN_LITERAL: 'true' | 'false';
INTEGER_LITERAL: Digit+;
FLOAT_LITERAL:
    Digit+ '.' Digit+ Exponent?
    | Digit+ '.' Exponent? // float literals can end with a dot
    | '.' Digit+ Exponent?; // or just start with a dot
fragment Digit: [0-9];
fragment Exponent: [eE][-+]?Digit+;
// Identifier
IDENTIFIER: Letter (Letter | Digit)*;
fragment Letter: [a-zA-Z_];
// Version mode
//
// Whenever we encounter a 'version' token, we enter the Version mode
// Within the version mode, a sequence such as '3.0' will be treated as a version number, and not as a float literal
mode VERSION_STATEMENT;
VERSION_WHITESPACE: [ \t]+ -> skip;
VERSION_NUMBER: Digit+ ('.' Digit+)? -> popMode;
Parser
parser grammar CqasmParser;
options {
    tokenVocab = CqasmLexer;
}
program: statementSeparator* version statements statementSeparator* EOF;
version: VERSION VERSION_NUMBER;
statements: (statementSeparator+ statement)*;
statementSeparator: NEW_LINE | SEMICOLON;
statement:
    QUBIT_TYPE arraySizeDeclaration? IDENTIFIER # qubitTypeDeclaration
    | BIT_TYPE arraySizeDeclaration? IDENTIFIER # bitTypeDeclaration
    | AXIS_TYPE IDENTIFIER (EQUALS expression)? # axisTypeDeclaration
    | BOOL_TYPE arraySizeDeclaration? IDENTIFIER (EQUALS expression)? # boolTypeDeclaration
    | INT_TYPE arraySizeDeclaration? IDENTIFIER (EQUALS expression)? # intTypeDeclaration
    | FLOAT_TYPE arraySizeDeclaration? IDENTIFIER (EQUALS expression)? # floatTypeDeclaration
    | expression EQUALS MEASURE expression # measureInstruction
    | IDENTIFIER expressionList # instruction
    ;
arraySizeDeclaration: OPEN_BRACKET INTEGER_LITERAL CLOSE_BRACKET;
expressionList: expression (COMMA expression)*;
indexList: indexEntry (COMMA indexEntry)*;
indexEntry:
    expression # indexItem
    | expression COLON expression # indexRange
    ;
expression:
    OPEN_PARENS expression CLOSE_PARENS # parensExpression
    | <assoc=right> (PLUS | MINUS) expression # unaryPlusMinusExpression
    | <assoc=right> BITWISE_NOT_OP expression # bitwiseNotExpression
    | <assoc=right> LOGICAL_NOT_OP expression # logicalNotExpression
    | <assoc=right> expression POWER_OP expression # powerExpression
    | expression (PRODUCT_OP | DIVISION_OP | MODULO_OP) expression # productExpression
    | expression (PLUS | MINUS) expression # additionExpression
    | expression (SHL_OP | SHR_OP) expression # shiftExpression
    | expression (CMP_GT_OP | CMP_LT_OP | CMP_GE_OP | CMP_LE_OP) expression # comparisonExpression
    | expression (CMP_EQ_OP | CMP_NE_OP) expression # equalityExpression
    | expression BITWISE_AND_OP expression # bitwiseAndExpression
    | expression BITWISE_XOR_OP expression # bitwiseXorExpression
    | expression BITWISE_OR_OP expression # bitwiseOrExpression
    | expression LOGICAL_AND_OP expression # logicalAndExpression
    | expression LOGICAL_XOR_OP expression # logicalXorExpression
    | expression LOGICAL_OR_OP expression # logicalOrExpression
    | <assoc=right> expression TERNARY_CONDITIONAL_OP expression COLON expression # ternaryConditionalExpression
    | IDENTIFIER OPEN_PARENS expressionList? CLOSE_PARENS # functionCall
    | IDENTIFIER OPEN_BRACKET indexList CLOSE_BRACKET # index
    | IDENTIFIER # identifier
    | OPEN_BRACKET expression COMMA expression COMMA expression CLOSE_BRACKET # axisInitializationList
    | OPEN_BRACE expressionList CLOSE_BRACE # initializationList
    | BOOLEAN_LITERAL # booleanLiteral
    | INTEGER_LITERAL # integerLiteral
    | FLOAT_LITERAL # floatLiteral
    ;
The rule partial
IDENTIFIER OPEN_BRACKET indexList CLOSE_BRACKET # index
fails on the missing CLOSE_BRACKET token. Your input text matches up to that point, so ANTLR's parser token index is pointing to the next token when it concludes that the token is not a CLOSE_BRACKET (in your case, the EOF token). Hence, that is the 'correct' location of the evaluated error (as noted by @Joachim Sauer).
Identifying the actual error source, whether in the lexer or parser, can be accomplished by adding an error strategy handler to the parser and a recognizer error listener to both the parser and lexer.
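As a rough illustration of questions (a) and (b), the lookup a listener would perform can be sketched in plain Java. This is a minimal model under stated assumptions, not the listener itself: `Tok`, `PreviousTokenSketch`, `previousVisible` and `sampleTokens` are hypothetical names, and `Tok` stands in for `org.antlr.v4.runtime.Token` so that the block runs without the ANTLR jar. In a real `BaseErrorListener.syntaxError` override for a parser, the offending symbol is a `Token`, the EOF check is `token.getType() == Token.EOF`, and the previous token can be read from the parser's token stream with `((Parser) recognizer).getInputStream().get(token.getTokenIndex() - 1)`, after guarding against an index below zero.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone model of the "token before EOF" lookup.
// Tok is a hypothetical stand-in for org.antlr.v4.runtime.Token.
class PreviousTokenSketch {
    static final int EOF = -1;            // same value as Token.EOF in ANTLR 4
    static final int DEFAULT_CHANNEL = 0; // same value as Token.DEFAULT_CHANNEL

    static final class Tok {
        final int type;
        final int channel;
        final String text;

        Tok(int type, int channel, String text) {
            this.type = type;
            this.channel = channel;
            this.text = text;
        }
    }

    // (a) Checking for EOF is a comparison of the token type.
    static boolean isEof(Tok t) {
        return t.type == EOF;
    }

    // (b) Walk backwards from the offending token's index to the nearest
    // token on the default channel. Returns null if there is none.
    static Tok previousVisible(List<Tok> tokens, int offendingIndex) {
        for (int i = offendingIndex - 1; i >= 0; i--) {
            if (tokens.get(i).channel == DEFAULT_CHANNEL) {
                return tokens.get(i);
            }
        }
        return null;
    }

    // Tokens for the tail of the question's input: h q[0, 4 <EOF>.
    // Type numbers other than EOF are arbitrary placeholders.
    static List<Tok> sampleTokens() {
        List<Tok> tokens = new ArrayList<>();
        tokens.add(new Tok(1, DEFAULT_CHANNEL, "h"));
        tokens.add(new Tok(1, DEFAULT_CHANNEL, "q"));
        tokens.add(new Tok(2, DEFAULT_CHANNEL, "["));
        tokens.add(new Tok(3, DEFAULT_CHANNEL, "0"));
        tokens.add(new Tok(4, DEFAULT_CHANNEL, ","));
        tokens.add(new Tok(3, DEFAULT_CHANNEL, "4"));
        tokens.add(new Tok(EOF, DEFAULT_CHANNEL, "<EOF>"));
        return tokens;
    }

    public static void main(String[] args) {
        List<Tok> tokens = sampleTokens();
        int offendingIndex = tokens.size() - 1; // where the parser gave up
        Tok offending = tokens.get(offendingIndex);
        if (isEof(offending)) {
            Tok prev = previousVisible(tokens, offendingIndex);
            // prints: error reported at EOF; last real token: 4
            System.out.println("error reported at EOF; last real token: "
                    + (prev == null ? "<none>" : prev.text));
        }
    }
}
```

Because the lexer in the question uses `-> skip` for whitespace and comments, those never reach the token stream, so the token at index minus one is already the last real token. The channel check only matters if tokens are later routed to a hidden channel instead.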
Helper
(A few minor helper methods are missing from this code, but their function should be obvious.)