I'm developing a Fortran parser with ANTLR4, following the ISO Fortran 2018 standard. While implementing the lexer rules, I ran into a conflict between the NAME and LETTERSPEC rules: when the input is a single letter, it is always tokenized as NAME and never as LETTERSPEC. Here's a partial, simplified version of the grammar:
lexer grammar FortranTestLex;
// Lexer rules
WS: [ \t\r\n]+ -> skip;
// R603 name -> letter [alphanumeric-character]...
NAME: LETTER (ALPHANUMERICCHARACTER)*;
// R865 letter-spec -> letter [- letter]
LETTERSPEC: LETTER (MINUS LETTER)?;
MINUS: '-';
// R601 alphanumeric-character -> letter | digit | underscore
ALPHANUMERICCHARACTER: LETTER | DIGIT | UNDERSCORE;
// R0002 Letter ->
// A | B | C | D | E | F | G | H | I | J | K | L | M |
// N | O | P | Q | R | S | T | U | V | W | X | Y | Z
LETTER: 'A'..'Z' | 'a'..'z';
// R0001 Digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
DIGIT: '0'..'9';
// R602 UNDERSCORE -> _
UNDERSCORE: '_';
grammar FortranTest;
import FortranTestLex;
// Parser rules
programName: NAME;
// R1402 program-stmt -> PROGRAM program-name
programStmt: PROGRAM programName;
letterSpecList: LETTERSPEC (COMMA LETTERSPEC)*;
// R864 implicit-spec -> declaration-type-spec ( letter-spec-list )
implicitSpec: declarationTypeSpec LPAREN letterSpecList RPAREN;
implicitSpecList: implicitSpec (COMMA implicitSpec)*;
// R863 implicit-stmt -> IMPLICIT implicit-spec-list | IMPLICIT NONE [( [implicit-name-spec-list] )]
implicitStmt:
IMPLICIT implicitSpecList
| IMPLICIT NONE ( LPAREN implicitNameSpecList? RPAREN )?;
// R1403 end-program-stmt -> END [PROGRAM [program-name]]
endProgramStmt: END (PROGRAM programName?)?;
// R1401 main-program ->
// [program-stmt] [specification-part] [execution-part]
// [internal-subprogram-part] end-program-stmt
mainProgram: programStmt? endProgramStmt;
//R502 program-unit -> main-program | external-subprogram | module | submodule | block-data
programUnit: mainProgram;
//R501 program -> program-unit [program-unit]...
program: programUnit (programUnit)*;
With this grammar, a single letter is always tokenized as NAME, even where the parser expects a LETTERSPEC, such as inside the parentheses of an implicit-spec. How can I resolve this conflict in my lexer rules so that the input is tokenized correctly?
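To make the conflict concrete, here is a minimal reproduction based on my understanding of ANTLR4's lexer: it takes the longest match, and when two rules match the same number of characters it picks the rule defined first. The grammar name LetterSpecConflict and the fragment helpers are simplifications I made for this illustration only; they are not part of the real grammar.

```antlr
lexer grammar LetterSpecConflict; // hypothetical name, for illustration only

// Both rules can match the single character "A". ANTLR4 prefers the longest
// match and breaks ties in favor of the rule defined first, so NAME wins.
NAME: LETTER ALPHANUMERICCHARACTER*;
LETTERSPEC: LETTER ('-' LETTER)?;

// Helper rules declared as fragments so they never produce tokens themselves
// (an assumption of this sketch; the real grammar declares them as tokens).
fragment ALPHANUMERICCHARACTER: LETTER | [0-9] | '_';
fragment LETTER: [A-Za-z];

WS: [ \t\r\n]+ -> skip;

// Input "A"   -> NAME        (tie at length 1; NAME is defined first)
// Input "A-Z" -> LETTERSPEC  (3 characters; NAME would only match "A")
// Input "AB"  -> NAME        (2 characters; LETTERSPEC would only match "A")
```

Swapping the two rules only flips the tie: every single-letter name then comes out as LETTERSPEC instead, which is why reordering alone does not solve it.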
I've tried adjusting the order of the lexer rules and refining the patterns, but I haven't been able to achieve the desired behavior. Any insights or suggestions on how to properly handle this conflict would be greatly appreciated. Thank you!