How do I match a character before or after a capturing group in regex?

Question

Asked by Stevoisiak on 28 February 2023 at 19:26 · 553 views

I have a Python script with a regex pattern that searches for the word employee_id if there is an equals sign immediately before or after.

import re

pattern = r"(=employee_id|employee_id=)"

print(re.search(pattern, "=employee_id").group(1))  # =employee_id
print(re.search(pattern, "employee_id=").group(1))  # employee_id=
print(re.search(pattern, "=employee_id=").group(1))  # =employee_id
print(re.search(pattern, "employee_id"))  # None
print(re.search(pattern, "employee_identity="))  # None

How can I modify my regex pattern to only capture the employee_id part of the string without the equals sign?

# Desired results
print(re.search(pattern, "=employee_id").group(1))  # employee_id
print(re.search(pattern, "employee_id=").group(1))  # employee_id
print(re.search(pattern, "=employee_id=").group(1))  # employee_id
print(re.search(pattern, "employee_id"))  # None
print(re.search(pattern, "employee_identity="))  # None

I attempted to use capture groups, but putting parentheses around employee_id meant my results were split between two capture groups:

pattern = r"=(employee_id)|(employee_id)="
print(re.search(pattern, "employee_id=").group(1))  # None
print(re.search(pattern, "employee_id=").group(2))  # employee_id

Using optional groups would match an employee_id without any equals sign.

(?:=)?(employee_id)(?:=)?
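
A quick check of why the optional-group version is too permissive. This is only a minimal sketch using the pattern above; the variable names are chosen for illustration:

```python
import re

# Both equals signs are optional, so neither one is actually required.
optional_pattern = r"(?:=)?(employee_id)(?:=)?"

bare = re.search(optional_pattern, "employee_id")
print(bare.group(1))  # employee_id (unwanted: no equals sign anywhere)

# Nothing anchors the end of the word either, so a longer word matches too.
longer = re.search(optional_pattern, "employee_identity=")
print(longer.group(1))  # employee_id (unwanted)
```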

I also do not want to exclude matches where the character is both before and after the word.

There are 3 best solutions below

**Andrej Kesely** · Answer 1 · 2023-02-28T19:31:44.477000

Try:

(?<==)employee_id|employee_id(?==)

Or, if you want the match inside a capture group:

((?<==)employee_id|employee_id(?==))

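Both patterns match employee_id only when an = sits immediately before or after it. The lookbehind (?<==) and the lookahead (?==) check for the = without including it in the match.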

EDIT: Python example:

import re

pattern = r"(?<==)employee_id|employee_id(?==)"

print(re.search(pattern, "=employee_id").group(0))  # employee_id
print(re.search(pattern, "employee_id=").group(0))  # employee_id
print(re.search(pattern, "=employee_id=").group(0))  # employee_id

Prints:

employee_id
employee_id
employee_id

Or you can put a capturing group around the pattern:

import re

pattern = r"((?<==)employee_id|employee_id(?==))"

print(re.search(pattern, "=employee_id").group(1))  # employee_id
print(re.search(pattern, "employee_id=").group(1))  # employee_id
print(re.search(pattern, "=employee_id=").group(1))  # employee_id

Prints:

employee_id
employee_id
employee_id
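
To see how the lookaround pattern behaves on all five inputs from the question, including the two that should not match, here is a small sketch. The helper name find_id is made up for this example and is not part of the answer:

```python
import re

pattern = r"(?<==)employee_id|employee_id(?==)"

def find_id(text):
    """Return the matched text, or None when the pattern does not match."""
    match = re.search(pattern, text)
    return match.group(0) if match else None

for text in ["=employee_id", "employee_id=", "=employee_id=",
             "employee_id", "employee_identity="]:
    print(repr(text), "->", find_id(text))
# The first three inputs give 'employee_id'; the last two give None.
```

One thing to be aware of: the first alternative has no boundary after employee_id, so an input such as =employee_identity would still produce a match. Whether that matters depends on the data being searched.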

**anubhava** · Answer 2 · 2023-02-28T19:44:05.273000

If you want to have only one capture group while making sure = is either before or after the capture group then use:

(?:(?<==)|(?=\w+=))(employee_id)\b

RegEx Details:

- (?: starts a non-capturing group
  - (?<==) asserts that there is an = just before the current position
  - | means OR
  - (?=\w+=) asserts that one or more word characters followed by = come just after the current position
- ) ends the non-capturing group
- (employee_id) matches and captures employee_id
- \b is a word boundary
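
A sketch of this pattern in Python, run against the inputs from the question. The helper name captured is hypothetical and only serves the example:

```python
import re

pattern = r"(?:(?<==)|(?=\w+=))(employee_id)\b"

def captured(text):
    """Return capture group 1, or None when there is no match."""
    match = re.search(pattern, text)
    return match.group(1) if match else None

for text in ["=employee_id", "employee_id=", "=employee_id=",
             "employee_id", "employee_identity="]:
    print(repr(text), "->", captured(text))
# The first three inputs give 'employee_id'; the last two give None.
```

Because of the trailing \b, an input such as =employee_identity does not match here either, since employee_id is followed directly by another word character.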

**decorator-factory** · Answer 3 · 2023-02-28T20:11:49.247000

This is probably more complicated than it needs to be, but it is an option.

Python's re supports named groups, so one would hope that this works:

=(?P<employee_id>\d+)(?!=)|(?<!=)(?P<employee_id>\d+)=

Unfortunately it doesn't, even though the groups won't ever collide.

error: redefinition of group name 'employee_id' as group 2; was group 1 at position 37
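
The failure can be reproduced with a short sketch like the one below. The exact wording of the message may differ between Python versions, so the code only checks that compilation fails; the variable names are illustrative:

```python
import re

duplicate_names = r"=(?P<employee_id>\d+)(?!=)|(?<!=)(?P<employee_id>\d+)="

try:
    re.compile(duplicate_names)
    compiled = True
except re.error as exc:
    # The standard re module rejects a group name that is defined twice,
    # even when the two groups sit in different alternatives.
    compiled = False
    print(f"re rejected the pattern: {exc}")
```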

This does, however, work with the third-party regex package:

>>> import regex
>>> pattern = regex.compile(r"=(?P<employee_id>\d+)(?!=)|(?<!=)(?P<employee_id>\d+)=")
>>> match = pattern.search("And he's like: id=42. So hilarious!")
>>> match
<regex.Match object; span=(17, 20), match='=42'>
>>> match.groupdict()
{'employee_id': '42'}

If you want to use re, you could use a helper function and a slight modification to the pattern:

import re

def unify_groupdict(groupdict):
    # Merge groups whose names differ only by trailing underscores,
    # keeping the first non-None value for each base name.
    result = {}
    for name, match in groupdict.items():
        name = name.rstrip("_")
        if result.get(name) is None:
            result[name] = match
    return result

###

pattern = re.compile(r"=(?P<employee_id>\d+)(?!=)|(?<!=)(?P<employee_id_>\d+)=")
match = pattern.search("And he's like: id=42. So hilarious!")

print(unify_groupdict(match.groupdict()))
# {'employee_id': '42'}
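
The example above only exercises the first alternative. As a rough sketch, the same helper can be tried on an input where the = comes after the digits, so that the second group (employee_id_) is the one that participates. The input string "42=id" is made up for this illustration:

```python
import re

def unify_groupdict(groupdict):
    # Same helper as above: collapse names that differ only by trailing
    # underscores, keeping the first non-None value.
    result = {}
    for name, match in groupdict.items():
        name = name.rstrip("_")
        if result.get(name) is None:
            result[name] = match
    return result

pattern = re.compile(r"=(?P<employee_id>\d+)(?!=)|(?<!=)(?P<employee_id_>\d+)=")

match = pattern.search("42=id")
raw = match.groupdict()
print(raw)                   # {'employee_id': None, 'employee_id_': '42'}
print(unify_groupdict(raw))  # {'employee_id': '42'}
```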
