How to handle "Regular expression backtrack stack overflow. (U_REGEX_STACK_OVERFLOW)"?

Question


Asked by talocodat on 13 December 2022 at 18:12 · 232 views

I have a text from which I want to extract the first two paragraphs. The paragraphs are separated by empty lines, and a paragraph can itself contain line breaks. I want to extract everything from the beginning of the text up to the second empty line. This is the original text:

Today I meet my friends in Kyiv to celebrate my new permanent residency status in Ukraine.
Then I went to a nice restaurant with them.

Buy me a Beer: https://www.buymeacoffee.com/johnnyfd

Support the GoFundMe: http://gofundme.com/f/send-money-dire...

Follow Me:

The text I want to have is:

Today I meet my friends in Kyiv to celebrate my new permanent residency status in Ukraine.
Then I went to a nice restaurant with them.

Buy me a Beer: https://www.buymeacoffee.com/johnnyfd

I tried to write a regular expression for this, and the following seemed like a possible solution:

(.*|\n)*(?:[[:blank:]]*\n){2,}(.*|\n)*(?:[[:blank:]]*\n){2,}

When I use it in R with stri_extract_all_regex (the call below uses a lazy variant of the pattern), I get the following error:

Error in stri_extract_all_regex(video_desc_orig, "(.*|\n)*?(?:[[:blank:]]*\n){2,}(.*?|\n)*(?:[[:blank:]]*\n){2,}") : 
  Regular expression backtrack stack overflow. (U_REGEX_STACK_OVERFLOW)

This is my first time using regular expressions, and I don't know how to interpret this error. Any help is appreciated ;)

Answers (2)

Bensstats · 13 December 2022 at 18:21

In R you need to double the backslashes (\\) in the pattern.

string <- 'Today I meet my friends in Kyiv to celebrate my new permanent residency status in Ukraine.
Then I went to a nice restaurant with them.

Buy me a Beer: https://www.buymeacoffee.com/johnnyfd

Support the GoFundMe: http://gofundme.com/f/send-money-dire...

Follow Me: '

library(stringr)

string |>
  str_extract('(.*|\\n)*(?:[[:blank:]]*\\n){2,}(.*|\\n)*(?:[[:blank:]]*\\n){2,}') |>
  cat()

# Output
Today I meet my friends in Kyiv to celebrate my new permanent residency status in Ukraine.
Then I went to a nice restaurant with them.

Buy me a Beer: https://www.buymeacoffee.com/johnnyfd
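A short sketch of what the doubling does (assuming the stringi package is installed; the snippets are illustrative and not taken from either answer). R's own parser processes backslashes before the regex engine sees the pattern: '\\n' reaches ICU as the two characters \ and n, which ICU reads as its newline escape, while '\n' is already a literal newline character that ICU matches as itself. Escapes such as \S and \A have no single-backslash form in an R string (R rejects '\S' as an unrecognized escape), so those are the ones that must be doubled.

```r
library(stringi)

# '\\n' in R source is two characters: a backslash followed by 'n'
nchar('\\n')
# '\n' in R source is already one newline character
nchar('\n')

# Both spellings match a newline: ICU reads \n as an escape,
# and a literal newline inside the pattern matches itself
stri_detect_regex('a\nb', 'a\\nb')
stri_detect_regex('a\nb', 'a\nb')

# Regex escapes like \S must be doubled so that ICU receives \S
stri_detect_regex(' x ', '\\S')
```

So for \n specifically the doubling is optional; the accepted answer below attributes the overflow to the pattern's nested quantifiers instead.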

The fourth bird · Accepted answer · 13 December 2022 at 18:46

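Your pattern has nested quantifiers such as (.*|\n)*, which give the regex engine an enormous number of ways to divide the same text. For example, this part first matches all of the text and then has to backtrack, giving characters back one step at a time, to make room for the rest of the pattern. stringi uses the ICU regex engine, which saves these backtrack points on a stack; on a long enough input that stack runs out of room, and ICU reports U_REGEX_STACK_OVERFLOW.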
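To get a feel for how fast the possibilities grow, here is a back-of-the-envelope sketch in base R. It counts the ways a single line of k characters can be cut into consecutive non-empty pieces, which is roughly the choice (.*|\n)* faces on each line when a later part of the pattern fails. count_splits is a hypothetical helper, and ICU does not literally enumerate every split, so treat this as intuition rather than a model of the engine.

```r
# Hypothetical helper: number of ways to cut k characters into
# consecutive non-empty pieces (one piece per iteration of (.*)*)
count_splits <- function(k) {
  if (k == 0) return(1)
  # the first piece takes i characters; the remaining k - i are split recursively
  sum(vapply(seq_len(k), function(i) count_splits(k - i), numeric(1)))
}

# Grows as 2^(k - 1): 1 2 4 8 16 32
sapply(1:6, count_splits)

# Applied to just the first line of the question's text
first_line <- 'Today I meet my friends in Kyiv to celebrate my new permanent residency status in Ukraine.'
2^(nchar(first_line) - 1)
```

Each additional line is an independent choice of the same kind, so the counts multiply across lines.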

Instead, match the two paragraphs line by line, including the two (or more) newlines between them, and require every matched line to contain at least one non-whitespace character:

\A[^\S\n]*\S.*(?:\n[^\S\n]*\S.*)*\n{2,}[^\S\n]*\S.*(?:\n[^\S\n]*\S.*)*

Explanation

\A Start of string
[^\S\n]*\S.* Match a whole line: optional whitespace other than a newline, at least one non-whitespace character, then the rest of the line
(?:\n[^\S\n]*\S.*)* Optionally repeat for each following line that also contains at least one non-whitespace character
\n{2,} Match 2 or more newlines (the empty line between the paragraphs)
[^\S\n]*\S.*(?:\n[^\S\n]*\S.*)* Same as above, matching the lines of the second paragraph
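To check that the pattern really is assembled from the pieces listed above, a sketch that builds it in R and compares it with the literal string used in the example (assuming stringi is installed; the variable names are illustrative). Each piece consumes characters that its neighbours cannot: [^\S\n]* has to stop right before the first non-whitespace character, . does not match a newline by default, and newlines are only consumed by \n. That leaves the engine far fewer ways to divide the text than (.*|\n)* does.

```r
library(stringi)

# One content line: optional whitespace except newline ([^\S\n]),
# at least one non-whitespace character, then the rest of the line
line <- '[^\\S\\n]*\\S.*'
# A paragraph: one content line, then zero or more further content lines
paragraph <- paste0(line, '(?:\\n', line, ')*')
# Start of string, paragraph, two or more newlines, paragraph
pattern <- paste0('\\A', paragraph, '\\n{2,}', paragraph)

# Same string as the pattern passed to stri_extract_all_regex in the example
identical(
  pattern,
  '\\A[^\\S\\n]*\\S.*(?:\\n[^\\S\\n]*\\S.*)*\\n{2,}[^\\S\\n]*\\S.*(?:\\n[^\\S\\n]*\\S.*)*'
)

# A line of blanks is not a content line; a line with text is
stri_detect_regex('   ', paste0('\\A', line, '\\z'))
stri_detect_regex('  x', paste0('\\A', line, '\\z'))
```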

See a regex demo and an R demo.

Example

library(stringi)

string <- 'Today I meet my friends in Kyiv to celebrate my new permanent residency status in Ukraine.
Then I went to a nice restaurant with them.

Buy me a Beer: https://www.buymeacoffee.com/johnnyfd

Support the GoFundMe: http://gofundme.com/f/send-money-dire...

Follow Me: '


# Match the first paragraph, the empty line(s) after it, and the second paragraph
stri_extract_all_regex(
  string,
  '\\A[^\\S\\n]*\\S.*(?:\\n[^\\S\\n]*\\S.*)*\\n{2,}[^\\S\\n]*\\S.*(?:\\n[^\\S\\n]*\\S.*)*'
)

Output

[[1]]
[1] "Today I meet my friends in Kyiv to celebrate my new permanent residency status in Ukraine.\nThen I went to a nice restaurant with them.\n\nBuy me a Beer: https://www.buymeacoffee.com/johnnyfd"
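The same building blocks generalize to the first n paragraphs. The function below is a hypothetical helper, not part of either answer (and assumes stringi is installed): it repeats the "two or more newlines + paragraph" block n - 1 times and uses stri_extract_first_regex, which returns NA when the text has fewer than n paragraphs.

```r
library(stringi)

# Hypothetical helper: extract the first n paragraphs of x
extract_first_paragraphs <- function(x, n = 2) {
  # a non-blank line, and a paragraph made of consecutive non-blank lines
  line <- '[^\\S\\n]*\\S.*'
  paragraph <- paste0(line, '(?:\\n', line, ')*')
  # first paragraph, then (2+ newlines + paragraph) exactly n - 1 times
  rest <- if (n > 1) paste0('(?:\\n{2,}', paragraph, '){', n - 1, '}') else ''
  stri_extract_first_regex(x, paste0('\\A', paragraph, rest))
}

text <- paste0(
  'Today I meet my friends in Kyiv to celebrate my new permanent residency status in Ukraine.\n',
  'Then I went to a nice restaurant with them.\n\n',
  'Buy me a Beer: https://www.buymeacoffee.com/johnnyfd\n\n',
  'Support the GoFundMe: http://gofundme.com/f/send-money-dire...\n\n',
  'Follow Me: '
)

cat(extract_first_paragraphs(text, 2))
extract_first_paragraphs(text, 5)  # NA: the text has only four paragraphs
```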
