ANTLR parser chooses the incorrect rule before reaching the end of the line


I'm trying to write a grammar for a kind of assembly language that does not use mnemonics.

I have a few registers, let's say:

a, b, c, d

And one special register, which holds an address in memory:

&e

Now I want to allow assigning values to them:

a = b
d = a
c = &e

a is also a special register (the accumulator), so some operations can be performed only on it, like:

a = a xor d

All of them have a on the left side and any one of the registers on the right side.

My grammar:

grammar somename;
options {
    language = CSharp;
}
program: line* EOF;

line: statement (NEWLINE+ | EOF);

statement: aOperation | registerAssignment;

expression:
    or #orAssignment
    | xor #xorAssignment;


or:
    OR reg;

xor:
    XOR reg;

reg: e_read | REGISTER;

aOperation: REG_A '=' REG_A expression;

registerAssignment: reg '=' reg;

REGISTER:
    REG_A
    | 'b'
    | 'c'
    | 'd';

e_read: E_READ;

REG_A: 'a';
OR: 'or';
XOR: 'xor';
E_READ: '&e';
WHITESPACE: (' ' | '\t')+ -> skip;
NEWLINE: ('\r'? '\n' | '\r');

My problem is that the parser always reads the line a = a xor b as a = b. On the next pass the parser gets the b register, finds nothing on the right side, and throws this error: An unhandled exception of type 'System.IndexOutOfRangeException' occurred in Program.dll: 'Index was outside the bounds of the array.' How can I fix this?

Best answer, by Bart Kiers:

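As mentioned in the comments by sepp2k: the lexer will never produce a REG_A token, because the input 'a' is already matched by the REGISTER rule. When two lexer rules match the same amount of input, ANTLR picks the rule that is defined first, and REGISTER comes before REG_A in the grammar.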
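
The tie-break can be illustrated outside ANTLR. The following Python sketch is not ANTLR's implementation. It only imitates the lexer's rule selection (the longest match wins, and among equal-length matches the rule defined first wins), using the question's lexer rules in their original order. The `RULES` table, the `EQ` name for the implicit '=' token, and the `tokenize` helper are hypothetical names made up for this illustration.

```python
import re

# Lexer rules in the same order as the question's grammar:
# REGISTER is listed before REG_A, exactly as in the original.
RULES = [
    ("REGISTER", r"[abcd]"),
    ("REG_A", r"a"),
    ("OR", r"or"),
    ("XOR", r"xor"),
    ("E_READ", r"&e"),
    ("EQ", r"="),  # assumption: stands in for the implicit '=' token
    ("WHITESPACE", r"[ \t]+"),
    ("NEWLINE", r"\r?\n|\r"),
]

def tokenize(text):
    """Pick the longest match; on a tie, keep the rule that was defined first."""
    pos, tokens = 0, []
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = re.compile(pattern).match(text, pos)
            # Strictly greater: a later rule with an equal-length match never wins.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name != "WHITESPACE":  # mirrors '-> skip'
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

print(tokenize("a = a xor b"))
```

In this sketch every 'a' comes out as REGISTER, so a parser rule that starts with REG_A, such as aOperation, can never match.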

A solution would be to remove the REGISTER lexer rule and create a register parser rule:

register
 : REG_A
 | REG_B
 | REG_C
 | REG_D
 ;

REG_A: 'a';
REG_B: 'b';
REG_C: 'c';
REG_D: 'd';
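
To see why separate tokens help, here is a rough Python sketch (again, not ANTLR itself) that tokenizes a line with one token type per register and then checks which of the question's statement alternatives fits. The names `TOKEN_RE`, `REGISTER`, `REG`, `token_types` and `classify` are hypothetical, `EQ` stands in for the implicit '=' token, and the sketch assumes that reg accepts any register or &e.

```python
import re

# One token type per register, as in the answer; there is no catch-all REGISTER token.
TOKEN_RE = re.compile(
    r"(?P<XOR>xor)|(?P<OR>or)|(?P<E_READ>&e)"
    r"|(?P<REG_A>a)|(?P<REG_B>b)|(?P<REG_C>c)|(?P<REG_D>d)"
    r"|(?P<EQ>=)|(?P<WS>[ \t]+)"
)

REGISTER = {"REG_A", "REG_B", "REG_C", "REG_D"}  # the parser-level 'register' rule
REG = REGISTER | {"E_READ"}                      # assumption: reg is register or &e

def token_types(line):
    """Return the token type names for one line, skipping whitespace."""
    types, pos = [], 0
    while pos < len(line):
        m = TOKEN_RE.match(line, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        if m.lastgroup != "WS":
            types.append(m.lastgroup)
        pos = m.end()
    return types

def classify(line):
    """Say which statement alternative from the question fits the line."""
    t = token_types(line)
    if (len(t) == 5 and t[:3] == ["REG_A", "EQ", "REG_A"]
            and t[3] in ("OR", "XOR") and t[4] in REG):
        return "aOperation"
    if len(t) == 3 and t[0] in REG and t[1] == "EQ" and t[2] in REG:
        return "registerAssignment"
    return "no match"

for line in ["a = a xor b", "d = a", "c = &e", "b = b xor d"]:
    print(line, "->", classify(line))
```

Because 'a' now has its own token type, a = a xor b is recognized as an accumulator operation, while b = b xor d is rejected since only a may appear on the left of such an operation.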