Conditional matching complex regex in capturing group


I am writing a regex for my git commit-msg hook and can't get the last part right.

This is my regex:

/^GRP\-[0-9]+\s(FIX|CHANGE|HOTFIX|FEATURE){1}\s(CORE|SHARED|ADM|CSR|BUS|OTHER){1}\s-\s.+/

My commit messages can have 2 variations:

  1. GRP-0888 FIX OTHER - (jest.config.js) : Fix testMatch option issue
  2. GRP-0888 FIX OTHER - Fix testMatch option issue

My current regex accepts both, but only because it stops checking after the -: whatever follows the dash is accepted without any format check.
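A minimal sketch of the problem, testing the regex above against the two valid messages and against one malformed message taken from the invalid list further down (the loop and the logging are only for illustration):

```javascript
// The original regex from the question: after "\s-\s" it only requires ".+",
// so any non-empty tail is accepted.
const original = /^GRP\-[0-9]+\s(FIX|CHANGE|HOTFIX|FEATURE){1}\s(CORE|SHARED|ADM|CSR|BUS|OTHER){1}\s-\s.+/;

const samples = [
  "GRP-0888 FIX OTHER - (jest.config.js) : Fix testMatch option issue", // valid
  "GRP-0888 FIX OTHER - Fix testMatch option issue",                    // valid
  "GRP-0988 FIX CORE - (Some change",                                   // malformed, yet accepted
];

for (const s of samples) {
  // Each line starts with "true", including the malformed one.
  console.log(original.test(s), s);
}
```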

I want it to distinguish these 2 cases and validate each one accordingly:

  • If the text after the dash starts with (, continue with that pattern and do all the checks: there must be an opening and a closing parenthesis, and the closing parenthesis must be followed by a space, a colon, another space, and then the rest of the commit description. This should match the 1st pattern.

  • If the text after the - starts with an alphanumeric character, it should match the 2nd pattern.

I have tried using an alternation inside a capturing group, but it still accepts everything. I think I know why: the second alternative, .+, can always match whatever the first one rejects.

/^GRP\-[0-9]+\s(FIX|CHANGE|HOTFIX|FEATURE){1}\s(CORE|SHARED|ADM|CSR|BUS|OTHER){1}\s-\s(\(.+\)\s:\s.+|.+)/
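That guess is correct: the alternatives are tried from left to right, and when \(.+\)\s:\s.+ fails, the engine falls back to .+, which accepts any non-empty tail. A minimal sketch that makes this visible by inspecting capture group 3 of the regex above:

```javascript
// The attempted regex from the question: group 3 is either "(...) : ..." or the catch-all ".+".
const attempt = /^GRP\-[0-9]+\s(FIX|CHANGE|HOTFIX|FEATURE){1}\s(CORE|SHARED|ADM|CSR|BUS|OTHER){1}\s-\s(\(.+\)\s:\s.+|.+)/;

// "(Some change)" has no " : " after the closing parenthesis, so the first
// alternative fails and ".+" captures the whole tail instead.
const m = attempt.exec("GRP-0988 FIX CORE - (Some change)");
console.log(m !== null, m && m[3]); // prints: true (Some change)
```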

UPDATED

These commit messages are invalid and shouldn't pass:

  • GRP-0988 FIX CORE - (Some change)
  • GRP-0988 FIX CORE - (Some change) - Some description
  • GRP-0988 FIX CORE - (
  • GRP-0988 FIX CORE - ()
  • GRP-0988 FIX CORE - (Some change
  • GRP-0988 FIX CORE - Some change)

Answer by Fractalism

This is the part of the regex that follows the dash: (?:(\(.*?\))(?:\s:\s))?(?!\(.*?\)(?:\s-\s)?)(.+)

  • (?:(\(.*?\))(?:\s:\s))? - Optionally match the following two parts; the outer group itself does not capture
    • (\(.*?\)) - Match and capture the part within parentheses. Use the non-greedy quantifier *? to prevent leaking into the rest of the string if there is another ) after the part you want to match.
    • (?:\s:\s) - Match and discard the colon and surrounding spaces
  • (?!\(.*?\)(?:\s-\s)?) - Negative lookahead to ensure it does not match messages such as (Some change) and (Some change) - Some description
    • \(.*?\) - Match text within parentheses
    • (?:\s-\s)? - Optionally match a hyphen surrounded by spaces
  • (.+) - Match and capture the rest of the commit message
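To see what the two capturing groups hold, the after-the-dash part can be tried on its own. In this sketch a ^ anchor is added so the pattern can be tested directly against the text after the dash; the anchor, the sample tails, and the loop are additions for illustration, not part of the answer:

```javascript
// The part of the answer's regex that follows "\s-\s", anchored with ^ for this test.
const tail = /^(?:(\(.*?\))(?:\s:\s))?(?!\(.*?\)(?:\s-\s)?)(.+)/;

const tails = [
  "(jest.config.js) : Fix testMatch option issue", // group 1 = "(jest.config.js)", group 2 = description
  "Fix testMatch option issue",                    // group 1 undefined, group 2 = description
  "(Some change)",                                 // rejected by the negative lookahead
  "(Some change) - Some description",              // rejected by the negative lookahead
];

for (const t of tails) {
  const m = tail.exec(t);
  console.log(t, "->", m ? [m[1], m[2]] : null);
}
```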

let formats = [
  "GRP-0888 FIX OTHER - (jest.config.js) : Fix testMatch option issue",
  "GRP-0888 FIX OTHER - Fix testMatch option issue",
  "GRP-0900 FIX CORE - (Some change) - Some change. If there are (Some text)",
  "GRP-0988 FIX CORE - (Some change)",
  "GRP-0988 FIX CORE - (Some change) - Some description",
]

let regex = /^GRP\-[0-9]+\s(FIX|CHANGE|HOTFIX|FEATURE)\s(CORE|SHARED|ADM|CSR|BUS|OTHER)\s-\s(?:(\(.*?\))(?:\s:\s))?(?!\(.*?\)(?:\s-\s)?)(.+)/

for (let format of formats) {
  console.log(format.match(regex))
}

Answer by The fourth bird

You might use:

^GRP-[0-9]+\s(FIX|CHANGE|HOTFIX|FEATURE)\s(CORE|SHARED|ADM|CSR|BUS|OTHER)\s-\s(?=[^a-zA-Z0-9]*[a-zA-Z0-9])(?:\([^()]*\)\s:\s)?[^()]*$

Explanation

  • ^ Start of string
  • GRP-[0-9]+\s Match GRP-, one or more digits, and a whitespace char
  • (FIX|CHANGE|HOTFIX|FEATURE) Capture one of the alternatives in group 1
  • \s Match a single whitespace char
  • (CORE|SHARED|ADM|CSR|BUS|OTHER) Capture one of the alternatives in group 2
  • \s-\s Match - between 2 whitespace chars
  • (?=[^a-zA-Z0-9]*[a-zA-Z0-9]) Positive lookahead, assert that an alphanumeric char occurs somewhere to the right
  • (?:\([^()]*\)\s:\s)? Optionally match (...), with no parentheses inside, followed by : between 2 whitespace chars
  • [^()]* Match optional chars other than ( or )
  • $ End of string
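The lookahead is easy to overlook, because the rest of the pattern already rejects tails such as ( or (). What it adds is a requirement for at least one ASCII letter or digit after the dash. A sketch comparing the answer's regex with a copy that has the lookahead removed (that copy and the two sample messages are made up for this comparison):

```javascript
// The answer's regex, and a hypothetical copy without the lookahead (for comparison only).
const withLookahead = /^GRP-[0-9]+\s(FIX|CHANGE|HOTFIX|FEATURE)\s(CORE|SHARED|ADM|CSR|BUS|OTHER)\s-\s(?=[^a-zA-Z0-9]*[a-zA-Z0-9])(?:\([^()]*\)\s:\s)?[^()]*$/;
const withoutLookahead = /^GRP-[0-9]+\s(FIX|CHANGE|HOTFIX|FEATURE)\s(CORE|SHARED|ADM|CSR|BUS|OTHER)\s-\s(?:\([^()]*\)\s:\s)?[^()]*$/;

// Tails containing no letter or digit: only the lookahead rejects them.
const samples = [
  "GRP-0988 FIX CORE - ...",
  "GRP-0988 FIX CORE - () : ",
];

for (const s of samples) {
  // Prints true (without lookahead) and false (with lookahead) for both samples.
  console.log(JSON.stringify(s), withoutLookahead.test(s), withLookahead.test(s));
}
```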


const regex = /^GRP-[0-9]+\s(FIX|CHANGE|HOTFIX|FEATURE)\s(CORE|SHARED|ADM|CSR|BUS|OTHER)\s-\s(?=[^a-zA-Z0-9]*[a-zA-Z0-9])(?:\([^()]*\)\s:\s)?[^()]*$/;
[
  "GRP-0888 FIX OTHER - (jest.config.js) : Fix testMatch option issue",
  "GRP-0888 FIX OTHER - Fix testMatch option issue",
  "GRP-0988 FIX CORE - (Some change)",
  "GRP-0988 FIX CORE - (Some change) - Some description",
  "GRP-0988 FIX CORE - (",
  "GRP-0988 FIX CORE - ()",
  "GRP-0988 FIX CORE - (Some change",
  "GRP-0988 FIX CORE - Some change)"
].forEach(s =>
  console.log(`${regex.test(s)} ---> ${s}`)
)

Answer by Peter Seliger

Neither of the two {1} quantifiers, each following a grouped alternation, is necessary at all.
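A quick sketch to illustrate this: a group without a quantifier already matches exactly once, so dropping {1} changes neither which strings match nor what the groups capture. The two shortened patterns below are trimmed from the question's regex for this comparison:

```javascript
// The same prefix pattern with and without the redundant {1} quantifiers.
const withOne = /^GRP-[0-9]+\s(FIX|CHANGE|HOTFIX|FEATURE){1}\s(CORE|SHARED|ADM|CSR|BUS|OTHER){1}\s/;
const withoutOne = /^GRP-[0-9]+\s(FIX|CHANGE|HOTFIX|FEATURE)\s(CORE|SHARED|ADM|CSR|BUS|OTHER)\s/;

const inputs = [
  "GRP-0888 FIX OTHER - Fix testMatch option issue", // accepted by both: FIX / OTHER
  "GRP-0888 FIX FIX OTHER - duplicated type",         // rejected by both
];

for (const s of inputs) {
  const a = withOne.exec(s);
  const b = withoutOne.exec(s);
  // Both either fail together or capture the same type and target.
  console.log(s, a && [a[1], a[2]], b && [b[1], b[2]]);
}
```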

And as for the only 2 patterns that are allowed to follow the OP's opening sequence GRP-<version> <type> <target> - , namely (<fileName>) : <fileChangeMessage> or <changeMessage>, one has to target this alternation precisely. The rest of the line ...

  • ... either is a character sequence enclosed by parentheses, followed by a whitespace-enclosed colon, followed by at least one more character ... (\(.*\)\s\:\s.{1,}) ...

  • ... or is a parentheses-free character sequence running all the way to the end of the line ... ([^()]+$).

Therefore something like ^GRP ... \s-\s((\(.*\)\s\:\s.{1,})|([^()]+$)) is well suited for matching only the allowed lines from the OP's full set of examples, which are ...

GRP-0988 FIX CORE - (Some change)
GRP-0988 FIX CORE - (Some change) - Some description
GRP-0888 FIX OTHER - (jest.config.js) : Fix testMatch option issue
GRP-0988 FIX CORE - (
GRP-0988 FIX CORE - ()
GRP-0888 FIX OTHER - Fix testMatch option issue
GRP-0988 FIX CORE - (Some change
GRP-0988 FIX CORE - Some change)

The shortened regex above, written out in full, looks like this ...

/^GRP-[0-9]+\s(FIX|CHANGE|HOTFIX|FEATURE)\s(CORE|SHARED|ADM|CSR|BUS|OTHER)\s-\s((\(.*\)\s\:\s.{1,})|([^()]+$))/gm
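What makes this stricter than the question's attempt is the second alternative: [^()]+$ has to reach the end of the line without meeting a parenthesis, whereas the question's .+ accepts anything. A sketch comparing the two alternations on their own (tail-only and anchored with ^, which is an assumption made for this illustration):

```javascript
// Tail-only versions of both alternations, anchored with ^ for this illustration.
const questionTail = /^(\(.+\)\s:\s.+|.+)/;           // from the question's attempt
const answerTail = /^((\(.*\)\s\:\s.{1,})|([^()]+$))/; // from this answer

const tails = [
  "(Some change)",              // invalid
  "Some change)",               // invalid
  "Fix testMatch option issue", // valid
];

for (const t of tails) {
  // The question's fallback ".+" accepts every non-empty tail;
  // "[^()]+$" only accepts tails without any parenthesis.
  console.log(JSON.stringify(t), questionTail.test(t), answerTail.test(t));
}
```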

... and in case one also wants to capture the named parts in detail, one can use named capturing groups together with matchAll and map, all based on the following pattern ... ^GRP-(?<version>[0-9]+)\s(?<type>...)\s(?<target>...)\s-\s(?:(?:(?<file>\(.*\))\s\:\s(?<fileChangeMessage>.{1,}))|(?<changeMessage>[^()]+$))

Written out in full, this regex looks like this ...

/^GRP-(?<version>[0-9]+)\s(?<type>FIX|CHANGE|HOTFIX|FEATURE)\s(?<target>CORE|SHARED|ADM|CSR|BUS|OTHER)\s-\s(?:(?:(?<file>\(.*\))\s\:\s(?<fileChangeMessage>.{1,}))|(?<changeMessage>[^()]+$))/gm

Putting both regular expressions into some example code then gives ...

const multilineSampleData =
`GRP-0988 FIX CORE - (Some change)
GRP-0988 FIX CORE - (Some change) - Some description
GRP-0888 FIX OTHER - (jest.config.js) : Fix testMatch option issue
GRP-0988 FIX CORE - (
GRP-0988 FIX CORE - ()
GRP-0888 FIX OTHER - Fix testMatch option issue
GRP-0988 FIX CORE - (Some change
GRP-0988 FIX CORE - Some change)`;

const regXMatch =
  /^GRP-[0-9]+\s(FIX|CHANGE|HOTFIX|FEATURE)\s(CORE|SHARED|ADM|CSR|BUS|OTHER)\s-\s((\(.*\)\s\:\s.{1,})|([^()]+$))/gm;

const regXNamedGroups =
  /^GRP-(?<version>[0-9]+)\s(?<type>FIX|CHANGE|HOTFIX|FEATURE)\s(?<target>CORE|SHARED|ADM|CSR|BUS|OTHER)\s-\s(?:(?:(?<file>\(.*\))\s\:\s(?<fileChangeMessage>.{1,}))|(?<changeMessage>[^()]+$))/gm;

console.log(
  'multiline sample data (just two expected matches) ...\n',
  multilineSampleData
);
console.log(
  'matching lines only ...',
  multilineSampleData
    .match(regXMatch)
);
console.log(
  'existing group properties only of each matching line ...',
  [
    ...multilineSampleData
      .matchAll(regXNamedGroups)
  ]
  .map(({ groups: { file, fileChangeMessage, changeMessage, ...rest } }) => ({
      ...rest,
      ...(file && { file, fileChangeMessage } || { changeMessage }),
    }))    
);
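A commit-msg hook usually validates a single message, so the named groups can also be read from one exec() call instead of matchAll. A sketch, reusing the named-group pattern from this answer with the g and m flags dropped (the sample message is one of the question's valid examples):

```javascript
// Named-group pattern from above, without the g and m flags (one message at a time).
const commitRegex = /^GRP-(?<version>[0-9]+)\s(?<type>FIX|CHANGE|HOTFIX|FEATURE)\s(?<target>CORE|SHARED|ADM|CSR|BUS|OTHER)\s-\s(?:(?:(?<file>\(.*\))\s\:\s(?<fileChangeMessage>.{1,}))|(?<changeMessage>[^()]+$))/;

const match = commitRegex.exec("GRP-0888 FIX OTHER - (jest.config.js) : Fix testMatch option issue");

if (match) {
  // Groups that did not take part in the match (here: changeMessage) are undefined.
  const { version, type, target, file, fileChangeMessage, changeMessage } = match.groups;
  console.log({ version, type, target, file, fileChangeMessage, changeMessage });
} else {
  console.log("invalid commit message");
}
```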