I am playing around with FsLex and FsYacc, which are based on ocamllex and ocamlyacc. What is the best way to define a comment in a language? Do I create a comment token in my lex file? There are a few complications to comments that I cannot wrap my head around in the context of a grammar:
- A comment can be placed literally anywhere in the grammar and should be ignored.
- A comment can have literally anything in it including other tokens and invalid code.
- Comments can span many lines, and I need to maintain the source code position for the debugger. In FsLex and ocamllex, this has to be done by the language developer.
Since you include the `ocaml` tag, I'll answer for `ocamllex`.
It's true that handling comments is difficult, especially if your language wants to be able to comment out sections of code. In this case, the comment lexer has to look for (a reduced set of) tokens inside comments, so as not to be fooled by comment closures appearing in quoted context. It also means that the lexer should follow the nesting of comments, so commented-out comments don't confuse things.
The OCaml compiler itself is an example of this approach. Comment handling for the OCaml compiler has three parts. The first-level lexing rule looks like this:
The second level consists of the function `handle_lexical_error` and the function `comment`. The former evaluates a lexing function while catching a specific exception. The latter is the detailed lexing function for comments. After the lexing of the comment, the code above goes back to regular lexing (with `main lexbuf`).
The function `comment` looks like this:
So, yes, it's pretty complicated to do a good job.
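As a rough illustration of the nesting and quoted-context handling described above, here is a minimal hand-written sketch in plain OCaml. It is not the OCaml compiler's `comment` function and not `ocamllex` input; the names `skip_comment` and `Unterminated` are made up for this example. It assumes OCaml-style comment delimiters and double-quoted strings with backslash escapes.

```ocaml
(* Hypothetical error type for this sketch. *)
exception Unterminated of string

(* Assumes the two characters at index start are the comment opener.
   Returns the index just past the matching closer, together with the
   number of newlines seen, so the caller can update line numbers. *)
let skip_comment (s : string) (start : int) : int * int =
  let n = String.length s in
  let lines = ref 0 in
  (* Skip a quoted string; i is the index after the opening quote. *)
  let rec in_string i =
    if i >= n then raise (Unterminated "string in comment")
    else
      match s.[i] with
      | '"' -> i + 1
      | '\\' ->
          (* Skip the escaped character, still counting newlines. *)
          (if i + 1 < n && s.[i + 1] = '\n' then incr lines);
          in_string (i + 2)
      | '\n' -> incr lines; in_string (i + 1)
      | _ -> in_string (i + 1)
  in
  (* depth is the current nesting level of open comments. *)
  let rec in_comment depth i =
    if i >= n then raise (Unterminated "comment")
    else if i + 1 < n && s.[i] = '(' && s.[i + 1] = '*' then
      in_comment (depth + 1) (i + 2)
    else if i + 1 < n && s.[i] = '*' && s.[i + 1] = ')' then
      (if depth = 1 then i + 2 else in_comment (depth - 1) (i + 2))
    else
      match s.[i] with
      | '"' -> in_comment depth (in_string (i + 1))
      | '\n' -> incr lines; in_comment depth (i + 1)
      | _ -> in_comment depth (i + 1)
  in
  let stop = in_comment 1 (start + 2) in
  (stop, !lines)

(* Usage: the closer inside the quoted string does not end the comment. *)
let src = "(* a (* nested *) \"*)\" \n still comment *) let x = 1"
let stop, newlines = skip_comment src 0
let rest = String.sub src stop (String.length src - stop)
```

Here `rest` is the text after the comment and `newlines` is what a real lexer would feed into its line tracking. An `ocamllex` version would express the same cases as rule alternatives rather than index arithmetic.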
For the last point, `ocamllex` tracks source code positions for you automatically. You can retrieve them from the lexbuf. See the OCaml `Lexing` module. (However, note that the comment lexing function above adjusts the position when it lexes a newline. The `incr_loc` function increments the tracked line number.)
I'm not sure how closely F# tracks this design, but hopefully this will be helpful.
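To make the position tracking mentioned above a bit more concrete, here is a small sketch using only the standard `Lexing` module. It assumes a bare lexbuf built with `Lexing.from_string` and no generated lexer driving it, and it uses `Lexing.new_line` as a stand-in for what I assume a helper like `incr_loc` does. The names `line_before` and `line_after` are just for illustration.

```ocaml
(* A lexbuf over a two-line comment; nothing has been lexed yet. *)
let lexbuf = Lexing.from_string "(* line one\n   line two *)"

(* The current position record holds the tracked line number. *)
let line_before = lexbuf.Lexing.lex_curr_p.Lexing.pos_lnum

(* A lexer action would call this after matching a newline,
   including newlines matched inside a comment rule. *)
let () = Lexing.new_line lexbuf

let line_after = lexbuf.Lexing.lex_curr_p.Lexing.pos_lnum
```

The line number only advances when an action asks for it, which is why a comment rule that swallows newlines has to do this bookkeeping itself.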
Update
Here is the `string` lexing function:
If you want to know more, you can find the full OCaml lexer source here: lexer.mll.