ANTLR input matching both a parser rule and a lexer rule


I am trying to write a grammar for parsing arithmetic expressions.

When I try to add support for unary minus (-), I run into the following issue:

The input matches only when I put a space after the minus, e.g. - 10.

If the input is -10, it is matched as a WORD, since that is a valid lexer token too.

How do I change my grammar so that I can use the unary operator without a space?

grammar Arithmetic;

parse
 : expression EOF
 ;

expression
 : op=SUBTRACT exp=expression
 | left=expression op= EXPONENT right=expression
 | left=expression op= DIVIDE right=expression
 | left=expression op= MULTIPLY right=expression
 | left=expression op= MODULUS right=expression
 | left=expression op= ADD right=expression
 | left=expression op= SUBTRACT right=expression
 | INTEGER
 | WORD
 ;

ADD          : '+';
SUBTRACT     : '-' ;
MULTIPLY     : '*' ;
DIVIDE       : '/' ;
MODULUS      : '%' ;
EXPONENT     : '^' ;
INTEGER      : [0-9]+;
WORD         : (ALPHANUMERIC | '_' | '-' | '.' | SQ | DQ)+;
WS           : [ \r\t\u000C\n]+ -> skip;
ALPHANUMERIC : [a-zA-Z0-9];
SQ           : '\''.*? '\'';
DQ           : '"'.*? '"';


Answer by Bart Kiers:

If the input is -10, it is matched as a WORD, since that is a valid lexer token too.

How do I change my grammar so that I can use the unary operator without a space?

That is impossible with the lexer rules as they stand. You'll need to change the lexer rule that matches a WORD. ANTLR's lexer works in a very predictable way (regardless of what the parser is trying to match):

  1. find a lexer rule that consumes the most characters
  2. if 2 (or more) lexer rules match the same characters, let the one defined first "win"

So -10 will always become a single WORD token: WORD consumes all three characters, while SUBTRACT consumes only the first one. There is no way around it.

So in that case I will need to remove - from my WORD definition?

Yes, at the very least. But the WORD rule is still fishy: it also matches SQ and DQ, so those tokens will never be created (see point 2 above). And as it stands, a WORD would also match ..........______________, which seems odd.
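
Putting the points above together, one possible shape for the grammar is sketched below. It is only a sketch under assumptions that the question does not confirm: that a word starts with a letter or underscore and may continue with letters, digits, underscores and dots, that a word never contains '-', and that quoted strings are meant to be operands of their own. The parser alternatives for SQ and DQ are added for that last assumption; the other parser rules, including their order, are taken over from the question unchanged.

```antlr
grammar Arithmetic;

parse
 : expression EOF
 ;

expression
 : op=SUBTRACT exp=expression
 | left=expression op=EXPONENT right=expression
 | left=expression op=DIVIDE right=expression
 | left=expression op=MULTIPLY right=expression
 | left=expression op=MODULUS right=expression
 | left=expression op=ADD right=expression
 | left=expression op=SUBTRACT right=expression
 | INTEGER
 | WORD
 | SQ   // assumption: quoted strings are operands in their own right
 | DQ
 ;

ADD      : '+';
SUBTRACT : '-';
MULTIPLY : '*';
DIVIDE   : '/';
MODULUS  : '%';
EXPONENT : '^';
INTEGER  : [0-9]+;

// Quoted strings are separate tokens and no longer part of WORD.
SQ       : '\'' .*? '\'';
DQ       : '"' .*? '"';

// Assumption: a word starts with a letter or underscore and never contains '-',
// so it cannot swallow a leading minus sign or overlap with INTEGER.
WORD     : [a-zA-Z_] [a-zA-Z0-9_.]*;

WS       : [ \r\t\u000C\n]+ -> skip;
```

With these lexer rules, the input -10 is tokenized as SUBTRACT followed by INTEGER, so the first alternative of expression can match it without a space. The trade-off is that an input such as a-b now becomes WORD SUBTRACT WORD instead of a single WORD, so this only fits if hyphenated words are not needed.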