I'm looking for a (T-)SQL solution to the following question. I have a table with columns
- Key (int, not null),
- Column1 (varchar(30), null), Column2, Column3 (up to Column9 but for brevity I'll stop at 3),
- SortOrder (int, not null)
Values in the Key column may be present in multiple rows. As a result set I need a row for each Key with values in the Column1-Column3 taken from potentially different rows in the initial data, where ColumnX value is not null and ordered by SortOrder. Hopefully it's easier to understand this from the example:
| Key | Column1 | Column2 | Column3 | SortOrder |
|---|---|---|---|---|
| Key1 | C1 | NULL | NULL | 1 |
| Key1 | C10 | C11 | NULL | 2 |
| Key1 | C20 | C21 | C22 | 3 |
| Key2 | NULL | C30 | NULL | 1 |
| Key2 | C40 | C41 | NULL | 2 |
The result set in this case would be
| Key | Column1 | Column2 | Column3 |
|---|---|---|---|
| Key1 | C1 | C11 | C22 |
| Key2 | C40 | C30 | NULL |
There are 100K-200K rows in the table so I'm looking for a good performing solution. Ideally one which reads the table only once but I'm not sure it's even possible (I'll gladly take analytical functions if they help performance or readability in this case). Suggestions for indexing are also welcome though I assume it will be by (Key, SortOrder) anyway.
I have a query which returns the desired result but I'm not quite happy with readability and especially performance:
WITH Keys AS (SELECT Key FROM MyTable GROUP BY Key)
SELECT
Key,
(SELECT TOP(1) A.Column1 FROM MyTable A WHERE A.Column1 IS NOT NULL AND A.Key=Keys.Key ORDER BY SortOrder) AS Column1,
(SELECT TOP(1) A.Column2 FROM MyTable A WHERE A.Column2 IS NOT NULL AND A.Key=Keys.Key ORDER BY SortOrder) AS Column2,
(SELECT TOP(1) A.Column3 FROM MyTable A WHERE A.Column3 IS NOT NULL AND A.Key=Keys.Key ORDER BY SortOrder) AS Column3
FROM Keys
You can use the new IGNORE NULLS option when doing FIRST_VALUE to get that you're looking for.
More old-school alternative is to use
first_value(column1) OVER (partition BY [key] ORDER BY case when column1 is null then 1 else 0 end, sortorder)which manually puts the NULLs last.Finally, a trick i like is to use aggregate together with padding to generate a combined sort + value column, after which the sort part is removed to leave the Value. This method has the upside that it doesn't require any window functions and is just a simple aggregate.
Output: