I'm new to ANTLR and I'm trying to learn it. I have a lexer with some tokens defined, plus another token that is defined as an alternative over a subset of those tokens, like so:
ADDQ: 'addq';
SUBQ: 'subq';
ANDQ: 'andq';
XORQ: 'xorq';
OP: (ADDQ | ANDQ | XORQ | SUBQ);
In my parser I have a rule called doOperation, like so:
doOperation:
OP REGISTER COMMA REGISTER;
When I test the rule using IntelliJ's ANTLR plugin with the example subq %rax, %rcx, I get an error that says "mismatched input at subq, expect OP". What is the correct way to do this?
You can use token rules inside of other token rules, but when you do, the outer rule should match additional text around the inner one. Something like a rule A that matches 'abc' and a rule B that matches A followed by 'def'.
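A minimal sketch of one token rule nested inside another, consistent with the "abc" and "abcdef" inputs discussed here. The exact literals and the grammar name Nested are assumptions:

```antlr
lexer grammar Nested;   // hypothetical grammar name

// A matches the literal text "abc" on its own.
A : 'abc' ;

// B reuses A but matches extra text after it, so B matches
// strictly longer input than A and the two rules never tie.
B : A 'def' ;
```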
Given these rules, the string "abc" would produce an A token and "abcdef" would produce a B token.

However, when you define one rule as an alternative of other rules like you did, you end up with multiple lexical rules that could match the same input. When lexical rules overlap, ANTLR (just like the vast majority of lexer generators) will first pick the rule that would lead to the longest match and, in case of ties, pick the one that appears first in the grammar.
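To make the two disambiguation steps concrete, here is a small sketch of overlapping lexer rules. The ID and WS rules and the grammar name Overlap are hypothetical and added only for illustration:

```antlr
lexer grammar Overlap;   // hypothetical grammar name

ADDQ : 'addq' ;            // listed first
OP   : 'addq' | 'subq' ;   // overlaps with ADDQ on "addq"
ID   : [a-z]+ ;            // hypothetical catch-all, overlaps with both

WS   : [ \t\r\n]+ -> skip ;

// Input "addq":  ADDQ, OP and ID all match 4 characters.
//                Tie, so the first rule wins: ADDQ.
// Input "subq":  OP and ID both match 4 characters.
//                Tie, OP is listed before ID: OP.
// Input "addqx": ID matches 5 characters, the others only 4.
//                Longest match wins: ID.
```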
So given your rules, the input addq would produce an ADDQ token because ADDQ appears before OP in the grammar. The same goes for SUBQ and the others, so there's no way an OP token would ever be generated.

Since you said that you don't use ADDQ, SUBQ etc. in your parser rules, you can make them fragments instead of token rules. Fragments can be used in token rules, but aren't themselves tokens. So you'll never end up with a SUBQ token because SUBQ isn't a token; you could only get OP tokens. In fact you don't even have to give them names at all: you could just "inline" them into OP by listing the four string literals as its alternatives.

Another option (one that you'd have to use if you were using SUBQ etc. directly) is to turn OP into a parser rule instead of a token. That way the input subq would still generate a SUBQ token, but that would be okay because now the op rule would accept a SUBQ token.
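The two fixes could be sketched as follows. For the first, the mnemonics become fragments, so OP is the only token the lexer can emit for them (the grammar name AsmLexer is an assumption):

```antlr
lexer grammar AsmLexer;   // hypothetical grammar name

OP : ADDQ | SUBQ | ANDQ | XORQ ;

// Fragments are building blocks for token rules, never tokens themselves.
fragment ADDQ : 'addq' ;
fragment SUBQ : 'subq' ;
fragment ANDQ : 'andq' ;
fragment XORQ : 'xorq' ;

// Equivalent inlined form, with no named fragments:
// OP : 'addq' | 'subq' | 'andq' | 'xorq' ;
```

For the second, the mnemonics stay as tokens and the grouping moves into a parser rule. The question does not show the REGISTER and COMMA definitions, so the ones below, along with WS and the grammar name Asm, are assumptions:

```antlr
grammar Asm;   // hypothetical combined grammar

doOperation : op REGISTER COMMA REGISTER ;

// Parser rule: accepts any one of the four mnemonic tokens.
op : ADDQ | SUBQ | ANDQ | XORQ ;

ADDQ : 'addq' ;
SUBQ : 'subq' ;
ANDQ : 'andq' ;
XORQ : 'xorq' ;

REGISTER : '%' [a-z] [a-z0-9]* ;   // assumed definition, e.g. %rax
COMMA    : ',' ;                   // assumed definition
WS       : [ \t\r\n]+ -> skip ;    // assumed: whitespace is ignored
```

With the second grammar, the input subq %rax, %rcx would be tokenized as SUBQ REGISTER COMMA REGISTER, which doOperation accepts through op.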