Background
I want to match the text between a starting and an ending delimiter, but I also want to capture the other text that lies between those delimited blocks. You can see this example on regexr.com.
Current solution
Input
Text:
"aaa_s_123abc_e_bbbccc_s_456def_e_bbbddd_s_7890_e_wwwddd"
Pattern:
"/(.*?)(_s_.*?_e_)(.*?)/"
Result
0: aaa_s_123abc_e_
1: aaa
2: _s_123abc_e_
3:
------------------
0: bbbccc_s_456def_e_
1: bbbccc
2: _s_456def_e_
3:
------------------
0: bbbddd_s_7890_e_
1: bbbddd
2: _s_7890_e_
3:
------------------
Problem
I am missing the string "wwwddd" at the end.
Question
Why is group 3 empty? How do I get the text after the ending delimiter "_e_"?
Any idea how to update the pattern?
Group 3 is empty because the empty string is a match for it, and there is no pattern to match after that group, so the empty string suffices to have the regex succeed. Be aware that the "?" after ".*" makes it lazy. If you had dropped that last "?", making the ".*" greedy, the third group would contain all remaining characters in that line. That would not be what you want either, because then it captures too much, including all the other "_s_" and "_e_" delimiters.
You can get the text after the ending delimiter by using preg_match_all, and by requiring that the captured text is followed either by "_s_" or by the end of the input ("$").
Drop the third capture group, as you want successive matches to be captured by the first capture group.
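The difference between the lazy and the greedy third group can be checked with a small sketch. It uses Python's re module as a stand-in for PHP's preg_match_all (an assumption made only for illustration; the variable names are hypothetical), and the input is the text from the question.

```python
import re

text = "aaa_s_123abc_e_bbbccc_s_456def_e_bbbddd_s_7890_e_wwwddd"

# Pattern from the question: the trailing lazy group is satisfied by the empty string.
lazy = re.compile(r"(.*?)(_s_.*?_e_)(.*?)")

# Same pattern with the last "?" dropped, so the third group is greedy.
greedy = re.compile(r"(.*?)(_s_.*?_e_)(.*)")

# findall returns one tuple of groups per match.
lazy_matches = lazy.findall(text)
greedy_matches = greedy.findall(text)

for groups in lazy_matches:
    print("lazy:  ", groups)
for groups in greedy_matches:
    print("greedy:", groups)
```

The lazy variant gives three matches, each with an empty third group, and never reaches "wwwddd". The greedy variant gives a single match whose third group swallows everything after the first "_e_".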
Proposed regex:
(.*?)(?:_s_.*?_e_|$)
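As a rough check of the proposed regex, here is a minimal sketch, again using Python's re.findall in place of preg_match_all (an assumption for illustration; variable names are hypothetical). In Python the pattern also produces one extra empty match at the very end of the input, so the sketch filters empty strings out.

```python
import re

text = "aaa_s_123abc_e_bbbccc_s_456def_e_bbbddd_s_7890_e_wwwddd"

# Each match captures text that is followed by a delimited block or by the end of input.
pattern = re.compile(r"(.*?)(?:_s_.*?_e_|$)")

# With a single capture group, findall returns that group's text for every match.
pieces = pattern.findall(text)

# Drop empty captures, such as the zero-length match at the end of the input.
# Note: this would also drop an empty capture between two adjacent delimited blocks.
outside = [piece for piece in pieces if piece]

print(pieces)
print(outside)
```

If the delimited blocks are needed too, turning the non-capturing group "(?:...)" into a capturing one should return them alongside the text in group 1.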