Best way to use re.sub with a different behavior when first called

I'm trying to perform a number of replacements using re.sub(), except that I want the first replacement to be different. One straightforward approach would be to call re.sub() twice, with count=1 on the first call, but because re.sub() accepts a function as the repl argument, we can do this in a single call:

import re

def repl(matchobj):
    global first_sub
    if first_sub:
        first_sub = False
        print(f"Replacing '{matchobj.group()}' at {matchobj.start()} with ':)'")
        return ":)"
    else:
        print(f"Deleting '{matchobj.group()}' at {matchobj.start()}")
        return ""

text = "hello123 world456"
first_sub = True
text = re.sub(r"\d+", repl, text)

# Output:
#   Replacing '123' at 5 with ':)'
#   Deleting '456' at 14

Unfortunately, this relies on a global variable, which isn't great. Is there a better way to do this?
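
For comparison, the two-call approach mentioned at the top could be sketched roughly as follows. It assumes the first replacement (":)" here) does not itself match the pattern, since otherwise the second pass would touch it again:

```python
import re

text = "hello123 world456"

# First pass: replace only the first match.
text = re.sub(r"\d+", ":)", text, count=1)

# Second pass: delete every remaining match.
# Safe here only because ":)" contains no digits.
text = re.sub(r"\d+", "", text)

print(text)  # hello:) world
```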

There are 2 best solutions below

Answer by no comment (best answer)

With an iterator, inspired by Andrej:

import re

text = "hello123 world456"
text = re.sub(
    r"\d+",
    lambda _, i=iter([":)"]): next(i, ""),
    text
)
print(text)
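
The default argument i is evaluated once, when the lambda is created, so the iterator keeps its state across all matches within that single re.sub() call. If the same behaviour is needed in several places, one option is to wrap it in a small helper that builds a fresh iterator per call. This is only a sketch; the name sub_first_differently and its parameters are made up for illustration:

```python
import re

def sub_first_differently(pattern, first, rest, text):
    """Replace the first match with `first` and every later match with `rest`."""
    # A fresh one-item iterator per call, so no state leaks between calls.
    replacements = iter([first])
    # Once the iterator is exhausted, next() falls back to `rest`.
    return re.sub(pattern, lambda _: next(replacements, rest), text)

print(sub_first_differently(r"\d+", ":)", "", "hello123 world456"))  # hello:) world
print(sub_first_differently(r"\d+", ":)", "", "a1 b2 c3"))           # a:) b c
```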

Or using a dict for the state:

import re

text = "hello123 world456"
text = re.sub(
    r"\d+",
    lambda m, d={0: ":)"}: d.pop(0, ""),
    text
)
print(text)

Or one like yours but with a closure:

import re

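def make_repl():
    # State lives in the enclosing function's scope instead of a module-level global.
    first_sub = True
    def repl(matchobj):
        nonlocal first_sub
        if first_sub:
            first_sub = False
            print(f"Replacing '{matchobj.group()}' at {matchobj.start()} with ':)'")
            return ":)"
        else:
            print(f"Deleting '{matchobj.group()}' at {matchobj.start()}")
            return ""
    return repl

text = "hello123 world456"
# Each call to make_repl() returns a replacer with its own fresh first_sub flag.
text = re.sub(r"\d+", make_repl(), text)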
print(text)

Answer by Andrej Kesely

One possible approach is to use itertools for the task, for example:

import re
from itertools import chain, cycle

text = "hello123 world456 world456 world456"

to_sub = chain([":)"], cycle([""]))

text = re.sub(r"\d+", lambda g: next(to_sub), text)
print(text)

Prints:

hello:) world world world

Or, if you don't want to use a global variable:

text = re.sub(r"\d+", lambda _, i=chain([":)"], cycle([""])): next(i), text)

EDIT: As @no_comment points out in the comments, you can also use itertools.repeat:

chain(repeat(":)", 1), repeat(""))
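
Put together with its imports, the repeat variant might look like the sketch below. The variable names replacements and result are chosen here for illustration:

```python
import re
from itertools import chain, repeat

text = "hello123 world456 world456 world456"

# repeat(":)", 1) yields ":)" exactly once; repeat("") then yields "" forever.
replacements = chain(repeat(":)", 1), repeat(""))

result = re.sub(r"\d+", lambda _: next(replacements), text)
print(result)  # hello:) world world world
```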