ANTLR 4 lexer rule to skip combination of backslash and newline?

WS : [ \t]+ -> skip ; // skip spaces, tabs

This works well to ignore whitespace by keeping those characters from reaching the parser. I want to do the same thing with the character pair of '\' and newline. That is, a backslash-newline should be removed like other whitespace, allowing a single statement to contain embedded newlines.

I am trying variations of

ESC_NEWLINE : '\\' NEWLINE ;
NEWLINE     : '\r'? '\n' ;
WS          : ([ \t]+ | ESC_NEWLINE) -> skip ;

but this does not skip ESC_NEWLINE. I don't know what other approach to take.

Bart Kiers On

When you do:

ESC_NEWLINE : '\\' NEWLINE;
NEWLINE     : '\r'? '\n';
WS          : ([ \t]+ | ESC_NEWLINE) -> skip;

the WS rule will never match an ESC_NEWLINE. The lexer chooses a rule as follows:

  • the rule that matches the most characters wins
  • if 2 (or more) rules match the same characters, the rule defined first wins

This means that a backslash followed by a newline will always be matched by the ESC_NEWLINE rule, never by the WS rule: both rules match the same characters, and ESC_NEWLINE is defined first. Since ESC_NEWLINE has no -> skip command, its token still reaches the parser.
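To see the two matching rules in action outside ANTLR, here is a minimal Python sketch that emulates them with regular expressions. It is not ANTLR's implementation: the `tokenize` helper, the regex translations of the rules, and the `ID` rule are assumptions added only for illustration.

```python
import re

def tokenize(rules, text):
    """Emulate longest-match, first-rule-wins token selection.

    rules: ordered list of (name, regex pattern, skip flag).
    Returns the (name, lexeme) pairs that are not skipped.
    """
    compiled = [(name, re.compile(pattern), skip) for name, pattern, skip in rules]
    tokens, pos = [], 0
    while pos < len(text):
        best = None
        for name, regex, skip in compiled:
            m = regex.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so on a tie the rule defined earlier is kept.
            if m and m.end() > pos and (best is None or m.end() > best[1]):
                best = (name, m.end(), skip)
        if best is None:
            raise ValueError(f"no rule matches at offset {pos}")
        name, end, skip = best
        if not skip:
            tokens.append((name, text[pos:end]))
        pos = end
    return tokens

# The grammar from the question, translated to regexes (an approximation).
ORIGINAL = [
    ("ESC_NEWLINE", r"\\\r?\n", False),
    ("NEWLINE", r"\r?\n", False),
    ("WS", r"(?:[ \t]+|\\\r?\n)", True),
    ("ID", r"[a-z]+", False),  # hypothetical rule so there is something to lex
]

# Input: a, space, backslash, newline, b
result = tokenize(ORIGINAL, "a \\\nb")
print(result)  # the backslash-newline shows up as an ESC_NEWLINE token
```

ESC_NEWLINE and WS both match the two characters, so the tie goes to ESC_NEWLINE, which has no skip flag. Note that Python's `|` picks the first alternative that matches rather than the longest one; that difference does not matter here because the two WS alternatives start with different characters.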

The solution: let ESC_NEWLINE skip itself:

ESC_NEWLINE : '\\' '\r'? '\n' -> skip;
WS          : [ \t]+ -> skip;

Or remove ESC_NEWLINE:

WS          : ([ \t]+ | '\\' '\r'? '\n') -> skip;
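As a rough check that both variants hide the backslash-newline from the parser, the following Python sketch emulates the rule selection with regular expressions and compares the visible tokens. The `visible_tokens` helper, the regex translations, and the `ID` rule are assumptions for illustration, not part of ANTLR.

```python
import re

def visible_tokens(rules, text):
    """Return (name, lexeme) pairs for tokens that are not skipped.

    rules: ordered list of (name, regex pattern, skip flag).
    """
    out, pos = [], 0
    while pos < len(text):
        hits = []
        for name, pattern, skip in rules:
            m = re.compile(pattern).match(text, pos)
            if m and m.end() > pos:
                hits.append((m.end(), name, skip))
        if not hits:
            raise ValueError(f"no rule matches at offset {pos}")
        # max() returns the first maximal item, so ties go to the earlier rule.
        end, name, skip = max(hits, key=lambda h: h[0])
        if not skip:
            out.append((name, text[pos:end]))
        pos = end
    return out

ID = ("ID", r"[a-z]+", False)  # hypothetical rule so there is something to lex

# Variant 1: ESC_NEWLINE skips itself.
FIX_A = [("ESC_NEWLINE", r"\\\r?\n", True), ("WS", r"[ \t]+", True), ID]
# Variant 2: the backslash-newline is folded into WS.
FIX_B = [("WS", r"(?:[ \t]+|\\\r?\n)", True), ID]

# One statement continued over three lines, with both \r\n and \n line ends.
SOURCE = "foo \\\r\n  bar \\\nbaz"

tokens_a = visible_tokens(FIX_A, SOURCE)
tokens_b = visible_tokens(FIX_B, SOURCE)
print(tokens_a == tokens_b, tokens_a)
```

In this emulation both variants leave only the three ID tokens, so the choice between them is mostly a matter of whether you want a separately named rule for the line continuation.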