How do I match a character before or after a capturing group in regex?


I have a Python script with a regex pattern that searches for the word employee_id if there is an equals sign immediately before or after.

import re

pattern = r"(=employee_id|employee_id=)"

print(re.search(pattern, "=employee_id").group(1))  # =employee_id
print(re.search(pattern, "employee_id=").group(1))  # employee_id=
print(re.search(pattern, "=employee_id=").group(1))  # =employee_id
print(re.search(pattern, "employee_id"))  # None
print(re.search(pattern, "employee_identity="))  # None

How can I modify my regex pattern to only capture the employee_id part of the string without the equals sign?

# Desired results
print(re.search(pattern, "=employee_id").group(1))  # employee_id
print(re.search(pattern, "employee_id=").group(1))  # employee_id
print(re.search(pattern, "=employee_id=").group(1))  # employee_id
print(re.search(pattern, "employee_id"))  # None
print(re.search(pattern, "employee_identity="))  # None

I attempted to use capture groups, but putting parentheses around employee_id meant my results were split between two capture groups:

pattern = r"=(employee_id)|(employee_id)="
print(re.search(pattern, "employee_id=").group(1))  # None
print(re.search(pattern, "employee_id=").group(2))  # employee_id

Using optional groups would match an employee_id without any equals sign.

(?:=)?(employee_id)(?:=)?

I also do not want to exclude matches where the character is both before and after the word.

There are 3 best solutions below

5
Andrej Kesely On

Try:

(?<==)employee_id|employee_id(?==)

Regex demo.

Or, if you want the match inside a capture group:

((?<==)employee_id|employee_id(?==))

Regex demo.

This matches employee_id only if there is an = immediately before or after it. The lookarounds assert the = without including it in the match.


EDIT: Python example:

import re

pattern = r"(?<==)employee_id|employee_id(?==)"

print(re.search(pattern, "=employee_id").group(0))  # employee_id
print(re.search(pattern, "employee_id=").group(0))  # employee_id
print(re.search(pattern, "=employee_id=").group(0))  # employee_id

Prints:

employee_id
employee_id
employee_id

Or you can put a capturing group around the pattern:

import re

pattern = r"((?<==)employee_id|employee_id(?==))"

print(re.search(pattern, "=employee_id").group(1))  # employee_id
print(re.search(pattern, "employee_id=").group(1))  # employee_id
print(re.search(pattern, "=employee_id=").group(1))  # employee_id

Prints:

employee_id
employee_id
employee_id
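
To check the lookaround pattern against all five inputs from the question, including the two that should return None, a small harness along these lines may help. The helper name find_employee_id is made up for this sketch and is not part of the answer above.

```python
import re

# Lookaround pattern from the answer: "=" must sit directly before or
# directly after employee_id, but is never part of the match itself.
PATTERN = re.compile(r"(?<==)employee_id|employee_id(?==)")

def find_employee_id(text):
    """Hypothetical helper: return the matched text, or None if nothing matches."""
    match = PATTERN.search(text)
    return match.group(0) if match else None

# The five inputs listed in the question.
for sample in ["=employee_id", "employee_id=", "=employee_id=",
               "employee_id", "employee_identity="]:
    print(repr(sample), "->", find_employee_id(sample))
```

Like the pattern in the question, this one still matches the leading employee_id in =employee_identity, since nothing requires the word to end there. If that case matters, adding \b after the first employee_id would be one way to rule it out.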
4
anubhava On

If you want to have only one capture group while making sure = is either before or after the capture group then use:

(?:(?<==)|(?=\w+=))(employee_id)\b

RegEx Demo

RegEx Details:

  • (?:: Non capture group start
    • (?<==): Assert that we have = just before the current position
    • |: OR
    • (?=\w+=): Assert that we have 1+ word characters and = just after the current position
  • ): Non capture group end
  • (employee_id): Match and capture employee_id
  • \b: Word boundary
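
As a rough illustration of how this pattern behaves in Python, the sketch below runs it over the inputs from the question. The helper name capture_employee_id is an assumption made for this example only.

```python
import re

# One capture group; the "=" is asserted by lookarounds, never consumed.
PATTERN = re.compile(r"(?:(?<==)|(?=\w+=))(employee_id)\b")

def capture_employee_id(text):
    """Hypothetical helper: return group 1, or None if the pattern does not match."""
    match = PATTERN.search(text)
    return match.group(1) if match else None

for sample in ["=employee_id", "employee_id=", "=employee_id=",
               "employee_id", "employee_identity="]:
    print(repr(sample), "->", capture_employee_id(sample))
```

The trailing \b is what rejects employee_identity=: the lookahead (?=\w+=) succeeds there, but no word boundary follows employee_id, so the overall match fails.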
0
decorator-factory On

This is probably more complicated than it needs to be, but it is an option.

Python's re supports named groups, so one would hope that this works:

=(?P<employee_id>\d+)(?!=)|(?<!=)(?P<employee_id>\d+)=

Unfortunately it doesn't, even though the groups won't ever collide.

error: redefinition of group name 'employee_id' as group 2; was group 1 at position 37

This does, however, work with the third-party regex package:

>>> import regex
>>> pattern = regex.compile(r"=(?P<employee_id>\d+)(?!=)|(?<!=)(?P<employee_id>\d+)=")
>>> match = pattern.search("And he's like: id=42. So hilarious!")
>>> match
<regex.Match object; span=(17, 20), match='=42'>
>>> match.groupdict()
{'employee_id': '42'}

If you want to use re, you could use a helper function and a slight modification to the pattern:

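import re

def unify_groupdict(groupdict):
    # Merge groups whose names differ only by trailing underscores,
    # keeping the first value that actually matched (is not None).
    result = {}
    for name, match in groupdict.items():
        name = name.rstrip("_")
        if result.get(name) is None:
            result[name] = match
    return result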

###

pattern = re.compile(r"=(?P<employee_id>\d+)(?!=)|(?<!=)(?P<employee_id_>\d+)=")
match = pattern.search("And he's like: id=42. So hilarious!")

print(unify_groupdict(match.groupdict()))
# {'employee_id': '42'}
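
If only the value is needed rather than a dict, another option with plain re might be to rely on Match.lastindex, which holds the index of the last capturing group that matched. This is a different technique from the helper above, and extract_id is a hypothetical name used only in this sketch.

```python
import re

# Same two-alternative pattern, with distinct group names so that re accepts it.
PATTERN = re.compile(r"=(?P<employee_id>\d+)(?!=)|(?<!=)(?P<employee_id_>\d+)=")

def extract_id(text):
    """Hypothetical helper: return the digits from whichever alternative matched."""
    match = PATTERN.search(text)
    if match is None:
        return None
    # Only one of the two groups takes part in any match,
    # so lastindex points at the group that captured something.
    return match.group(match.lastindex)

print(extract_id("And he's like: id=42. So hilarious!"))  # 42
print(extract_id("42=answer"))                            # 42
print(extract_id("no digits here"))                       # None
```

This works here because the two groups sit in different alternatives and are not nested; with nested groups, lastindex would not necessarily be the innermost one.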