Why is the regex quantifier {n,} more greedy than + (in Python)?

147 Views Asked by Anton Ganichev At 24 March 2023 at 19:39

I tried to use regexes for finding max-length sequences formed from repeated doubled letters, like AABB in the string xAAABBBBy.

As described in the official documentation:

The '*', '+', and '?' quantifiers are all greedy; they match as much text as possible.

When I use the quantifier {n,}, I get a full substring, but + returns only parts:

import re

print(re.findall("((AA|BB){3,})", "xAAABBBBy"))
# [('AABBBB', 'BB')]
print(re.findall("((AA|BB)+)", "xAAABBBBy"))
# [('AA', 'AA'), ('BBBB', 'BB')]

Why is {n,} more greedy than +?

Original Q&A

There are 1 best solutions below

Lover of Structure On 24 March 2023 at 20:44

Both quantifiers {3,} and + are greedy in the same way.

First, let's simplify the output a little bit by changing the inner group into a non-capturing one:

import re

print(re.findall("((?:AA|BB){3,})", "xAAABBBBy"))
# ['AABBBB']
print(re.findall("((?:AA|BB)+)", "xAAABBBBy"))
# ['AA', 'BBBB']

The first pattern requires a repetition (with the total number of occurrences being at least 3 – let's call this multiplicity ≥3), so the only possible match starts at the second A:

x A ⟨A A B B B B⟩ y

The second pattern requires only multiplicity ≥1. As the string is scanned left-to-right, the first (left-most) possible greedy match is formed by the first two As. For the remaining string ABBBBy, the first (left-most) possible greedy match is BBBB. After that, only y remains, which can't be matched.

x ⟨A A⟩ A ⟨B B B B⟩ y

Why is the regex quantifier {n,} more greedy than + (in Python)?

There are 1 best solutions below

Related Questions in PYTHON

Related Questions in REGEX

Related Questions in REGEX-GREEDY

Trending Questions

Popular # Hahtags

Popular Questions