How to parse C++ comments with lark?


How can I write a rule to parse C++ comments either on a line alone or after other code?

I've tried lots of combinations, the latest one being:

?comment: "//" /[^\n]*/ NEWLINE

There are 3 best solutions below

0
João M. S. Silva On

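Using a single regular expression for the whole comment worked:

?comment: /\/\/[^\n]*/

Because the ? prefix inlines a rule that has a single child, the comment then appears in the tree as a lark.lexer.Token rather than as a subtree, so I had to handle it as a token.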

1
Erez On

You had the right idea, but you should define the comment as a single terminal (an uppercase name) rather than a rule, both for performance and so that you can ignore it.

COMMENT: "//" /[^\n]*/ NEWLINE

%ignore COMMENT

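Complete example (NEWLINE has to be defined or imported, e.g. with %import common.NEWLINE; here it is defined explicitly as _NEWLINE):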

from lark import Lark

g = r"""
!start: "hello"

COMMENT: "//" /[^\n]*/ _NEWLINE
_NEWLINE: "\n"
%ignore COMMENT
%ignore " "
"""

parser = Lark(g)
print(parser.parse("hello // World \n"))
0
superbox On

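You simply define a terminal and then ignore it. The block-comment alternative uses a non-greedy (.|\n)*? so that it can span several lines and stops at the first */:

COMMENT : /\/\/[^\n]*/
        | /\/\*(.|\n)*?\*\//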

%ignore COMMENT

NOTE: This works only if you also ignore all whitespace, because the line-comment pattern stops before the newline:

%import common.WS
%ignore WS
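For block comments, the difference between a greedy .* and a non-greedy, newline-aware pattern matters. By default Lark compiles terminal regexps with Python's re module, so the behaviour can be checked with re alone. This is a sketch with made-up input strings, not part of the grammar above.

```python
import re

# Greedy pattern: "." does not cross newlines, and ".*" runs to the
# last "*/" on the line.
greedy = re.compile(r"/\*.*\*/")
# Non-greedy and newline-aware: ends at the first "*/" and may span lines.
lazy = re.compile(r"/\*(.|\n)*?\*/")

one_line = "x /* a */ y /* b */ z"
multi_line = "a /* one\ntwo */ b"

greedy_one = greedy.sub("", one_line)      # also swallows the code between the comments
lazy_one = lazy.sub("", one_line)          # removes each comment separately
greedy_multi = greedy.search(multi_line)   # finds nothing across the newline
lazy_multi = lazy.search(multi_line)       # matches the whole two-line comment

print(repr(greedy_one), repr(lazy_one))
print(greedy_multi, lazy_multi.group(0) if lazy_multi else None)
```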