The following regex matches a comma-separated list of numbers and allows only unique numbers to be input:
^(?!.*\b(\d+)\b.*\b\1\b)(\d)(,(\d))*$
For example, it allows 1,2,3 but disallows 1,1 and 2,1,1.
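As a quick check of that behaviour in Ruby, here is a minimal sketch. The constant name UNIQUE_LIST and the helper accepted? are made up for illustration and are not part of the referenced answers.

```ruby
# Minimal sketch: the constant and method names are illustrative only.
UNIQUE_LIST = /^(?!.*\b(\d+)\b.*\b\1\b)(\d)(,(\d))*$/

# True when the regex matches the input.
def accepted?(input)
  input.match?(UNIQUE_LIST)
end

["1,2,3", "1,1", "2,1,1"].each do |input|
  puts "#{input.inspect}: #{accepted?(input) ? 'allowed' : 'disallowed'}"
end
```

Of these three inputs, only 1,2,3 should be reported as allowed.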
Can anybody please explain in simple language how this regex works?
What confuses me is the negative lookahead assertion. The explanations available on the web show its syntax as follows:
The syntax is: X(?!Y), it means "search X, but only if not followed by Y".
Ref: https://javascript.info/regexp-lookahead-lookbehind#negative-lookahead
But the regex shown above doesn't follow that syntax. So how does it work?
When it matches 1,2,3 what matching process happens?
When it doesn't match 2,1,1 what matching process happens?
When 1,2,3 or 2,1,1 is tested, is it first matched against the (\d)(,(\d))* part of the regex, with that result then checked against the negative lookahead part (?!.*\b(\d+)\b.*\b\1\b), or does the negative lookahead part run first, followed by the remaining part?
Also, if I remove the .* before and after \b in the negative lookahead part, the regex starts matching 1,1 and 2,1,1 as well. So what is the significance of the .* in the negative lookahead part?
Note: I want to use the regex in my Ruby code, in case that matters.
The regex comes from these answers:
https://stackoverflow.com/a/45946721/936494
https://stackoverflow.com/a/45944821/936494
Thanks.
The negative lookahead
(?!.*\b(\d+)\b.*\b\1\b)
following a beginning-of-string anchor (^), asserts: "It is not the case ((?!...)) that, after skipping zero or more characters other than line terminators (.*), there exists a string of one or more digits (\d+), saved to capture group 1 ((\d+)), that is preceded and followed by a word boundary (\b), and that is followed by zero or more characters other than line terminators (.*), which are in turn followed by the contents of capture group 1 (\1), preceded and followed by a word boundary."
Therefore, the negative lookahead fails for the string 'a 12, *34 *12' (because '12', preceded and followed by a word boundary, is repeated), whereas it succeeds for the string 'a 12, *34 512' (although '12' is repeated, the second '12' is not preceded by a word boundary). If the lookahead fails there can be no match, so the rest of the regex is not evaluated. If it succeeds, the regex engine continues to evaluate the rest of the regex.
Since the regex begins with the start-of-string anchor (^) and a lookahead consumes no characters, the lookahead, if satisfied, won't move the regex's string pointer from the start of the string, in which case the regex effectively becomes
^(\d)(,(\d))*$
This is satisfied by, for example, '1,2,3'. '1' would be saved to capture group 2 (capture group 1 was used in the lookahead), ',2' would be saved to capture group 3, '2' would be saved to capture group 4, ',3' would be saved to capture group 3 (overwriting ',2') and '3' would be saved to capture group 4 (overwriting '2').
Demo
In view of the part of the regex that follows the negative lookahead, we see that the negative lookahead could be tightened as follows [1]:
Demo
1. I've used the positive lookahead (?=,|$) (rather than (?=,|\z)) at the link in order to test the regex against multiple strings.
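The walkthrough of the original regex can also be checked directly in Ruby. The following is only a sketch: the names LOOKAHEAD_ONLY, FULL_REGEX and lookahead_passes? are invented for illustration. It isolates the negative lookahead to test the two example strings, then inspects the capture groups left after matching '1,2,3'.

```ruby
# Sketch only: constant and method names are illustrative assumptions.
LOOKAHEAD_ONLY = /^(?!.*\b(\d+)\b.*\b\1\b)/               # anchor + lookahead
FULL_REGEX     = /^(?!.*\b(\d+)\b.*\b\1\b)(\d)(,(\d))*$/  # the whole regex

# True when the negative lookahead is satisfied at the start of the string,
# i.e. no whole number (delimited by word boundaries) occurs twice.
def lookahead_passes?(str)
  str.match?(LOOKAHEAD_ONLY)
end

# '12' recurs as a whole number here, so the lookahead fails.
puts lookahead_passes?('a 12, *34 *12')
# The trailing '12' is part of '512', so the lookahead succeeds.
puts lookahead_passes?('a 12, *34 512')

# Groups 2 to 4 belong to the part after the lookahead; a repeated group
# keeps only the text from its last iteration.
m = FULL_REGEX.match('1,2,3')
puts "group 2: #{m[2].inspect}"
puts "group 3: #{m[3].inspect}"
puts "group 4: #{m[4].inspect}"

# With a duplicate the lookahead rejects the string before the rest is tried.
puts FULL_REGEX.match('2,1,1').inspect
```

Because the lookahead sits right after ^, it is evaluated before (\d)(,(\d))* is attempted, which is why '2,1,1' yields no match at all rather than a partial one.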