I have 2 columns A & B in a SQL Server table that contain a string aggregated list of codes.
The codes in the lists are separated by a semicolon.
The string aggregated lists can take any length and the codes can be in any order.
What I would like to do is to create a new column C which contains the codes that appear somewhere in the same row of the lists of Column A and Column B.
Example:
| Column_A | Column_B | Column_C |
|---|---|---|
| a;b;c;d;e | c;a;e;i;k | c;a;e |
| d;e;f;g | e;h;i;j;d | e;d |
The example above returns "c", "a" and "e" for the first row of column C, because these codes are present in the same row of both Column A and Column B.
The same for the second row, here "e" and "d" are overlapping in Column A and B and thus returned in Column C.
I have tried something that works, but it does not seem like the best solution in terms of efficiency and performance. Especially because I have many (1m+) rows to check this for and the length of code lists to compare can be very long.
SELECT
STRING_AGG(CAST([value] AS NVARCHAR(MAX)),'; ') AS Overlapping_Code
FROM
(SELECT a.value
FROM MyTable t
CROSS APPLY STRING_SPLIT(t.Column_A, ';') a
INTERSECT
SELECT b.value
FROM MyTable t
CROSS APPLY STRING_SPLIT(t.Column_B, ';') b
) ab
I am looking for a better solution in terms of performance and elegance to compare the string aggregated lists for two columns across (many) rows.
Here is another possible solution: