How to optimize the query so the manual work can be dyanmic and in optimized way

56 Views Asked by At

I have below table like: SQL fiddle

I am able to get this output via XML, but I am not sure how I can get below output properly for larger number of users (approx 0.2M users).

Later I want to get top-3 Names by their counts for each id ,so RANK or OrderBy clauses will come into SQL and not sure how many iteration will it take when data is of large number of users.

Working code that I have tried:

-----------SQL Raw Table Creation------------------------
CREATE TABLE tb
(
    Id INT,
    Name VARCHAR(50) NOT NULL
);

INSERT INTO tb (Id, Name) VALUES (1,  'aa');
INSERT INTO tb (Id, Name) VALUES (1,  'aa');
INSERT INTO tb (Id, Name) VALUES (1,  'aa');
INSERT INTO tb (Id, Name) VALUES (1,  'aa');
INSERT INTO tb (Id, Name) VALUES (1,  'aa');
INSERT INTO tb (Id, Name) VALUES (1,  'bb');
INSERT INTO tb (Id, Name) VALUES (1,  'cc');
INSERT INTO tb (Id, Name) VALUES (1,  'cc');
INSERT INTO tb (Id, Name) VALUES (1,  'dd');
INSERT INTO tb (Id, Name) VALUES (1,  'dd');
INSERT INTO tb (Id, Name) VALUES (1,  'dd');

INSERT INTO tb (Id, Name) VALUES (2,  'aa');
INSERT INTO tb (Id, Name) VALUES (2,  'bb');
INSERT INTO tb (Id, Name) VALUES (2,  'bb');
INSERT INTO tb (Id, Name) VALUES (2,  'ee');
INSERT INTO tb (Id, Name) VALUES (2,  'ee');
INSERT INTO tb (Id, Name) VALUES (2,  'ee');
INSERT INTO tb (Id, Name) VALUES (2,  'ee');

INSERT INTO tb (Id, Name) VALUES (3,  'aa');
INSERT INTO tb (Id, Name) VALUES (3,  'bb');
INSERT INTO tb (Id, Name) VALUES (3,  'cc');
INSERT INTO tb (Id, Name) VALUES (3,  'dd');
INSERT INTO tb (Id, Name) VALUES (3,  'dd');
INSERT INTO tb (Id, Name) VALUES (3,  'dd');




-----------------Want to RANK or get only top 3 rows for each Id when group by Name--------------
select f.* into #t1
from(
    select f.* 
    from(
        select f.* 
        from (
        
        select top 3 id,name,count(name) as total 
        from tb 
        where id = 1
        group by id,name 
        order by id,total desc
        )f
        Union 
    
        select top 3 id,name,count(name) as total 
        from tb 
        where id = 2
        group by id,name 
        order by id,total desc
    )f
    Union 

    select top 3 id,name,count(name) as total 
    from tb 
    where id = 3
    group by id,name 
    order by id,total desc

    ) f

/* Output is moved in temp table #t1 which looks like 
id  name    total
1   aa  5
1   cc  2
1   dd  3
2   aa  1
2   bb  2
2   ee  4
3   bb  1
3   cc  1
3   dd  3 
 */

---------Final Joining for each Top3Names and RespectiveTotal -----

select a.id as ID, a.listStr as Top3Names , b.Total as RespectiveTotal
from
    (
    SELECT  id,STUFF((SELECT  ',' + name
                FROM #t1 EE
                WHERE  EE.id=E.id
           FOR XML PATH('')), 1, 1, '') AS listStr
    FROM  #t1 E
    GROUP BY E.id
    )a
    left Join
    (
    SELECT  id,STUFF((SELECT  ',' + cast(total as Varchar)
                FROM #t1 EE
                WHERE  EE.id=E.id
           FOR XML PATH('')), 1, 1, '') AS Total
    FROM  #t1 E
    GROUP BY E.id
    )b 
    on a.id=b.id

Output:

ID  Top3Names   RespectiveTotal
1   aa,cc,dd    5,2,3
2   aa,bb,ee    1,2,4
3   bb,cc,dd    1,1,3

Here I am using UNION for each ID, which is not correct way of doing. I want an optimized way. Also I am using a temp table to store my results. Is it a good way? Let me know for any correct solution or alternatives so that I can test it on larger.

1

There are 1 best solutions below

2
eshirvana On BEST ANSWER

on my SQL SERVER machine for given sample data, your query stats looks like this:

  • Total Logical Reads: 13
  • Total CPU Time: 00:00:00.007

if you are using SQL SERVER 2017+ you can use STRING_AGG function :

SELECT
    id
    , STRING_AGG(name,',') WITHIN GROUP (order by name asc) Top3Names  
    , STRING_AGG(countx,',')  WITHIN GROUP (order by name asc) RespectiveTotal
FROM (
    SELECT 
      id 
      ,  name 
      , count(*) countx 
      , ROW_NUMBER() over (partition by id order by count(*)  desc) rownumber
    FROM tb
    GROUP BY name, id
    ) result1
WHERE 
    result1.rownumber < 4
GROUP BY id 

Stats are like :

  • Total Logical Reads: 1
  • Total CPU Time: 00:00:00.000

for SQL SERVER 2016- :

select id
    ,  STUFF((
       SELECT ',' + t1.Name
         FROM cte t1
        WHERE t1.id = t2.id 
        and t1.rownumber <  4
        ORDER BY t1.name
          FOR XML PATH('')), 1, LEN(','), '') AS Top3Names

    , STUFF((
       SELECT ',' + cast(t1.countx as varchar(50))
         FROM cte t1
        WHERE t1.id = t2.id --and t1.name = t2.name 
        and t1.rownumber <  4
        ORDER BY t1.name
          FOR XML PATH('')), 1, LEN(','), '') AS RespectiveTotal
from cte t2
group by id

Stats are like :

  • Total Logical Reads: 7
  • Total CPU Time: 00:00:00.006

so regardless of sql server version, It will improve performance, you will get the best performance if you are using sql server 2017 or above using query above.