I am creating a syntax highlight file for a language and I have everything mapped out and working with one exception.
I cannot come up with a regex that will match the following conditions for a specific line comment style.
If the first non white-space character is an asterisk (*) the line is considered a comment.
I have created many samples that work in RegExr, but they never match in VS Code.
For example, RegExr is cool with this:
^(?:\s*)\*+(?:.*)?\n
So I convert it into the proper format for the tmlanguage.json file:
^(?:\\s*)\\*+(?:.*)?\\n
But it does not capture properly: if the first character of the line is an *, it does not match, but if the first character is a whitespace character followed by an *, it does.
I suck at formatting on stackoverflow, so <tab> represents a chr(9) tab character and <space> is a space. It should work in these cases:
*******************************
*****************************
<tab>*************************
* comment
* comment
<tab>* comment
But it shouldn't work in these cases:
string *******************************
string ***************************** string
<tab>string *************************
x *= 3
I am guessing that either the anchor ^ isn't working in my regex or I am escaping something incorrectly.
Any advice?
Please see sample image attached: screenshot
I don't know the regex engine you're using. I'm just going to give you some
general tips on how it should be done.
First, the anchor ^, in an engine's default state, means Beginning of String (BOS).
What you want in this case is Multi-Line-Mode. This makes the anchor ^ match at the Beginning of Line (BOL) as well as the BOS.
Second, you don't need those non-capture groups (?:\s*) and (?:.*); they each encapsulate a single construct.
Third, it is redundant to make a group optional when its enclosed contents are already optional: (?:.*)?
Fourth, you don't need the newline construct \n at the end, since it should not be highlighted anyway, and it might not be present on the last line of text. The latter will make it not match.
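To illustrate the first tip, here is a minimal sketch of how the anchor behaves with and without multi-line mode. It uses Python's re module purely as a stand-in, since the engine in question is unknown; the sample text and variable names are made up for the demonstration, and other engines may treat flags differently.

```python
import re

# Hypothetical two-line input: the comment sits on the second line.
text = "first line\n* comment on the second line"

# Default mode: ^ only matches at the very beginning of the string.
default_hits = re.findall(r"^\*.*", text)

# Multi-line mode: ^ also matches immediately after each newline.
multiline_hits = re.findall(r"^\*.*", text, flags=re.MULTILINE)

print(default_hits)    # []
print(multiline_hits)  # ['* comment on the second line']
```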
So, putting it all together, the modified regex would be
(?m)^\s*\*.*
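As a rough check, the modified regex can be run against the sample lines from the question. This is only a sketch in Python's re module, which is not the engine VS Code uses, so a pass here does not guarantee the same result in a tmLanguage file. The helper name is_comment and the list names are hypothetical, and the line with a leading space is an assumed variant of the question's samples.

```python
import re

# The modified regex; (?m) is accepted by Python as an inline flag at the start.
COMMENT_LINE = re.compile(r"(?m)^\s*\*.*")

should_match = [
    "*******************************",
    "\t*************************",
    "* comment",
    " * comment",   # leading space (assumed variant of the question's samples)
    "\t* comment",
]
should_not_match = [
    "string *******************************",
    "string ***************************** string",
    "\tstring *************************",
    "x *= 3",
]

def is_comment(line: str) -> bool:
    """Return True when the first non-whitespace character is an asterisk."""
    return COMMENT_LINE.match(line) is not None

matched = [is_comment(line) for line in should_match]
rejected = [not is_comment(line) for line in should_not_match]
print(all(matched), all(rejected))
```

Each line is tested on its own because \s also matches newlines, so running the pattern over a whole multi-line string could let a match start on a preceding blank line.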
Note that you could put a single capture group around the data if you need to reference it in a replace:
(?m)^(\s*\*.*)
Also, the language you're using should have a way to specify options when compiling the regex. If the engine doesn't accept the inline modifier (?m), take it out and specify that option when compiling the regex.
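A small sketch of both points, again assuming Python's re module as a stand-in for whatever engine is in use: the multi-line option is passed at compile time instead of inline, and the capture group is referenced in a replacement. The sample source text and the [comment] marker are invented for illustration.

```python
import re

# Hypothetical source text mixing code lines and asterisk comments.
source = "x *= 3\n* old comment\n\t* another comment\ny = x"

# Same pattern without the inline (?m); the option is given when compiling.
comment_line = re.compile(r"^(\s*\*.*)", re.MULTILINE)

# Group 1 holds the whole comment line, so \1 can be reused in a replace.
captured = comment_line.findall(source)
tagged = comment_line.sub(r"[comment]\1", source)

print(captured)
print(tagged)
```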