How can I capture a regex group from a word list when one or two of those words may appear?

For context, I'm completely new to Go and I've never worked much with Regex.

In order to get more practical experience with both, I'm trying to write a converter that will convert the syntax of one language (Zephir) into another language (PHP), by recursively walking a directory of my choosing, opening each Zephir (.zep) file, making the necessary syntax modifications and then saving the modified content to a PHP (.php) file in an output directory.

So far the process has been relatively painless, but now I'm at a more complex Regex function that I'm having trouble writing the pattern for. I'm hoping to get some help writing the pattern so that it targets only what I need.

Right now, I'm trying to find all of the property declarations in the class of the opened Zephir file. Because the property name can be preceded by either one or two keywords, I need to reliably capture the keyword(s) as Group 1 and whatever follows them as Group 2. Below is a sample I'm trying to target and the desired result.

Sample:

protected static autoEscape = true;
private documentAppendTitles;

Desired Result:

Match: [Group 1: protected static] [Group 2: autoEscape = true;]
Match: [Group 1: private] [Group 2: documentAppendTitles;]

In order to achieve the matching and grouping I desire, I created the following Regex pattern: https://regex101.com/r/Tctf7s/1

While this does highlight and group the matches in the way I desire, it also has the unintended consequence of matching nearly every other line of code in the file. I think this is because my first group has a nested, non-capturing group, so if it's not finding a match for that pattern, the parent group itself can be anything, thereby triggering nearly every line as a match.

The problem is that I'm not sure what the proper pattern syntax is to ensure that I'm grouping these property keywords together into a single group, regardless of whether there is one keyword or two.

My ultimate goal here is to get exactly two groups for every match. Because Go doesn't support negative lookahead, I need to check Group 2 and ensure the declaration is not a method/function before attaching a "$" symbol in front of the property name for the syntax replacement.

I have a feeling that I may be just missing some sort of indicator in the Group 1 pattern to ensure it's not empty, but is there a better way to write this that I'm not aware of due to my lack of experience?

There is 1 solution below

Alex Pliutau On
  1. Using an alternation, with the optional second keyword kept inside Group 1:

    ((?:protected|private|public)(?:\s+static)?)\s+(\w+[^;\n]*;)

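  2. Using a positive lookahead (PCRE-style engines only; Go's regexp package is based on RE2 and does not support lookarounds, so this will not compile there):

    ((?:protected|private|public)(?:\s+static)?)\s+(\w+)(?=\s*[=;])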
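
For Go specifically, here is a minimal sketch of how a two-group pattern could be used without any lookahead. It assumes that `static` is the only second keyword you need to handle and that each property declaration sits on a single line ending in `;`. The names `propertyRe` and `convertProperties` are made up for illustration. The visibility keyword is mandatory inside Group 1, so the group can never be empty, and methods are filtered out afterwards by inspecting Group 2.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// propertyRe is a hypothetical pattern for this sketch.
// Group 1: a required visibility keyword plus an optional "static".
// Group 2: the rest of the declaration on the same line, up to and including ";".
var propertyRe = regexp.MustCompile(
	`\b((?:public|protected|private)(?:\s+static)?)\s+(\w+[^;\n]*;)`)

// convertProperties prefixes property names with "$" and leaves
// single-line method declarations untouched.
func convertProperties(src string) string {
	return propertyRe.ReplaceAllStringFunc(src, func(decl string) string {
		m := propertyRe.FindStringSubmatch(decl)
		keywords, rest := m[1], m[2]
		// No lookahead in Go, so check Group 2 by hand: if its first
		// word is "function", this is a method, not a property.
		if strings.Fields(rest)[0] == "function" {
			return decl
		}
		return keywords + " $" + rest
	})
}

func main() {
	src := "protected static autoEscape = true;\nprivate documentAppendTitles;\n"
	for _, m := range propertyRe.FindAllStringSubmatch(src, -1) {
		fmt.Printf("Group 1: %q, Group 2: %q\n", m[1], m[2])
	}
	fmt.Print(convertProperties(src))
}
```

Multi-line method signatures are skipped automatically because Group 2 is not allowed to cross a newline before reaching a `;`. If your Zephir sources use other modifier orders or keywords, the alternation in Group 1 would need to be extended accordingly.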