I recently started learning the JavaCC parser generator. I was asked to write a parser in which one token takes numbers of one or more digits and another token takes numbers of two or more digits. I came up with something like this:
TOKEN : {
  <NUM1: <NUM> (<NUM>)* >  // for one or more
| <NUM2: (<NUM>)+ >        // for two or more
| <NUM: (["0"-"9"])+ >
}
// and in the function
void calc():
{}
{
(
(<NUM1>)+
(<NUM2>)+
)* <EOF>
}
However, even if I pass a text value with no numbers, it is parsed successfully. What am I doing wrong?
The JavaCC syntax for lexical tokens allows repetition of elements enclosed in parentheses () followed by one of:
- * (zero or more)
- + (one or more)
- ? (zero or one)

In your case, you need two tokens:
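A minimal sketch of such a pair of definitions, assuming plain decimal digits and the token names NUM1 and NUM2 used in the explanation that follows. The private helper DIGIT is an illustrative assumption, not something the answer requires:

```javacc
TOKEN : {
  <NUM2: <DIGIT> (<DIGIT>)+ >   // two or more digits; declared first on purpose
| <NUM1: (<DIGIT>)+ >           // one or more digits
| <#DIGIT: ["0"-"9"] >          // private (#) helper, never returned as a token
}
```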
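You read that as:
- NUM1 matches one or more digits
- NUM2 matches two or more digits

The lexical machinery in JavaCC consumes one character at a time from the input character stream and attempts to recognize a token. The two automata are as follows: for NUM1, a single digit takes the start state to an accepting state that loops on further digits; for NUM2, a first digit leads to an intermediate, non-accepting state, and a second digit leads to an accepting state that loops on further digits.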
The lexer advances through both automata simultaneously. Once no more progress is possible, the last token matched (that is, the longest match) is recognized. If more than one token type matches that same input, the one declared first is recognized. For this reason NUM2 is declared before NUM1. For the input 1, NUM2 is not recognized, because it needs more than one digit, so NUM1 is the only token type that matches. For the input 12, both token types match, but NUM2 is recognized because it is declared first. Conversely, if you declare NUM1 first and NUM2 second, you will never receive a NUM2 token, because NUM1 will always "win" with its higher precedence.

To use them, you can have two parser functions like these:
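A minimal sketch of two such productions, assuming a single space as the delimiter and the NUM1 and NUM2 tokens declared with NUM2 first. Note that under this ordering any run of two or more digits is always tokenized as NUM2, so the first production as written only accepts single-digit numbers; a hypothetical variant that accepts numbers of any length would use (<NUM1> | <NUM2>) in place of <NUM1>.

```javacc
// One or more NUM1 tokens separated by single spaces, then end of input.
void match_one_to_many_numbers() :
{}
{
  <NUM1> ( " " <NUM1> )* <EOF>
}

// The same shape, built from NUM2 tokens (two or more digits each).
void match_two_to_many_numbers() :
{}
{
  <NUM2> ( " " <NUM2> )* <EOF>
}
```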
You read that as:
- match_one_to_many_numbers accepts one or more NUM1 tokens, delimited by a space, after which the input stream has to end with the EOF system token
- match_two_to_many_numbers does the same for NUM2 tokens

Because both tokens accept an unbounded number of digits, you cannot have a sequence of these tokens without a non-digit delimiter between them.