Regex captures more than wanted

82 Views Asked by oli_vi_er At 27 September 2023 at 03:07

I want to remove references on Wikipedia with AutoWikiBrowser (.NET regex flavor), an automatic editor that handles regexes, but I am facing a newbie problem with the tags.

For example, I want to remove all references containing example.com, e.g.

<ref>{{cite web|title=Bar|url=https://example.com/bar}}</ref>

I tried the basic regex <ref>.*?example.com.*?</ref> (replaced with nothing), but the match starts at the first <ref> tag encountered and runs all the way to the reference I want, so it also captures everything in between, e.g.:

<ref>{{cite web|title=Foo|url=https://zzz.com/foo}}</ref> blah-blah <ref>{{cite web|title=Bar|url=https://example.com/bar}}</ref>

I tried lookarounds with the tags, but the issue is it is not capturing the tags.

I am sorry to ask such a simple question, but I have been searching for the last hour to no avail. I speak English quite fluently, but not when it comes to technical terms...


There is 1 best solution below

Nick On 27 September 2023 at 03:40 BEST ANSWER

You can use this regex, which will match a <ref> tag that includes example.com before the closing </ref>:

<ref>(?:(?!<\/ref>).)*example\.com.*?<\/ref>

This matches:

<ref> : the characters <ref>
(?:(?!<\/ref>).)* : any number of characters that do not start a closing </ref> tag (using a tempered greedy token)
example\.com : the characters example.com
.*? : a minimal number of characters
<\/ref> : the characters </ref>
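To see the difference on the sample text from the question, here is a minimal sketch. It uses Python's re module rather than .NET, on the assumption that both engines treat these particular constructs the same way, so it is worth re-checking inside AutoWikiBrowser. The variable names are illustrative only.

```python
import re

# Sample wikitext from the question: only the second reference cites example.com.
text = (
    "<ref>{{cite web|title=Foo|url=https://zzz.com/foo}}</ref> blah-blah "
    "<ref>{{cite web|title=Bar|url=https://example.com/bar}}</ref>"
)

# Pattern from the question: the lazy .*? is still allowed to run past
# the first </ref> in order to reach "example.com".
naive = re.compile(r"<ref>.*?example.com.*?</ref>")

# Pattern from this answer: the tempered token refuses to step over a </ref>,
# so a match can only begin at the <ref> that actually contains example.com.
tempered = re.compile(r"<ref>(?:(?!<\/ref>).)*example\.com.*?<\/ref>")

naive_result = naive.sub("", text)
tempered_result = tempered.sub("", text)

print(repr(naive_result))     # both references and the text between them are gone
print(repr(tempered_result))  # only the example.com reference is removed
```

Note that . does not match a line break by default, so a reference that spans several lines would need the single-line option (RegexOptions.Singleline in .NET, re.DOTALL in Python).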

Demo on regex101

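Note that, depending on your regex engine and its regex delimiters, you may not need the \ before the / in </ref>. The .NET flavor has no pattern delimiters, so there the backslash is optional but harmless.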
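As a quick check of the escaping point, the sketch below compares the escaped and unescaped forms. It is written for Python's re module, which, like .NET, has no pattern delimiters; the names are hypothetical and the result should be confirmed in your own engine.

```python
import re

# A single reference, taken from the question.
sample = "<ref>{{cite web|title=Bar|url=https://example.com/bar}}</ref>"

# Same pattern, with and without the backslash before the slash.
escaped = re.compile(r"<ref>(?:(?!<\/ref>).)*example\.com.*?<\/ref>")
unescaped = re.compile(r"<ref>(?:(?!</ref>).)*example\.com.*?</ref>")

escaped_match = escaped.fullmatch(sample)
unescaped_match = unescaped.fullmatch(sample)

# Both forms should match the whole reference.
both_match = escaped_match is not None and unescaped_match is not None
print(both_match)
```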
