I want to capture tokens in a text in the following pattern:
The First 2 characters are alphabets and necessary, ends with [A-Z] or [A-Z][0-9] this is optional anything can come in between.
example:
AA123123A1
AA123123A
AA123123123
i want to match and capture start with ([A-Z][A-Z]) in group 1, end with [A-Z] or [A-Z][0-9] in group 3 and everything else between then in group2
Example:
AA123123A1 => [AA,123123,A1]
AA123123A. => [AA,123123,A]
AA123123123 => [AA,123123123,'']
the following regex is working in python but not in postgres.
regex='^([A-Za-z]{2})((?:.+)(?=[A-Za-z][0-9]{0,1})|(?:.*))([A-Za-z][0-9]{0,1}){0,1}$'
In Postgressql
select regexp_matches('AA2311121A1',
'^([A-Za-z]{2})((?:.+)(?=[A-Za-z][0-9]{0,1})|(?:.*))(.*)$','x');
result:
{AA,2311121A1,""}
I am trying to explore why positive lookahead behavior is not the same as python, and how to take make positive lookahead in Postgres work in this case.
You can use
See the regex demo and a DB fiddle online:
Details:
^- start of string([A-Za-z]{2})- two ASCII letters(.*?)- Group 2: any zero or more chars as few as possible([A-Za-z][0-9]?)?- Group 3: an optional sequence of an ASCII letter and then an optional digit$- end of string.