I'm trying to write a regex that can match ONLY those lines where every white-space separated token occurs exactly twice, regardless of order.
For example, the entirety of the following lines should match:
1 1 2 2
100 10 10 100
A B B A
HELLO HELLO
The following lines should NOT match:
hello hello hello
1 1 22
1001
Although I'm able to match individual groups of repetitions in a given line, using the regex (\d+)(?=.*(\1)), I am struggling to match entire lines using ^$. My guess is that when I use the lookahead, this is creating an infinite loop where we constantly look at every token (including the repetitions) and expect a repeat later in the string, although I am not sure how to resolve this. Any ideas?
Thanks!
[EDIT]: To add some details, based on questions from the comments: Obviously it is fairly trivial to implement this as a function in most programming languages. However,I was originally looking to implement this as a regex, because I was trying to match certain records in a database. Thus, this regex was intended to be embedded as a CASE statement in a SQL query, which in my mind would've been a nice way make the selection.
Given the apparent complexity of such a regex, it seems that creating a function is the way to go, and thus pretty much any of the following answers would be fine solutions, circumstances depending.
You could create a function to split the string and count the occurrences of each word in it, returning
trueif all the words have a count of 2:Then you can simply call the function to test your string. For example:
Output:
Demo on dbfiddle.uk