SQL Query Group Like Values within the same row and concatenate and sort others

72 Views Asked by At

Using SQL Version 15.0.2000.5

I have a table with this data:

StudentScheduleId Monday Tuesday Wednesday Thursday Friday MondayStartTime MondayEndTime TuesdayStartTime TuesdayEndTime WednesdayStartTime WednesdayEndTime ThursdayStartTime ThursdayEndTime FridayStartTime FridayEndTime
15 1 1 1 1 NULL 9:00 11:00 11:00 12:30 9:00 11:00 11:00 12:30 NULL NULL
31 1 NULL NULL 1 0 2:00 3:15 NULL NULL NULL NULL 2:00 3:15 NULL NULL

I want to achieve this format:

StudentScheduleId Schedule StartTime EndTime
15 T/Th 11:00 12:30
15 M/W 9:00 11:00
31 M 9:00 11:00

I was able to do this using this query:

Select s.StudentScheduleId, 
    STRING_AGG(s.[Day], '/') AS Schedule, 
    s.StartTime, 
    s.EndTime
From
(
(Select s.StudentScheduleId, 'M' As [Day], s.MondayStartTime As StartTime, s.MondayEndTime As EndTime
From StudentSchedule s
Where s.Monday = 1)
UNION
(Select s.StudentScheduleId, 'T' As [Day], s.TuesDayStartTime As StartTime, s.TuesdayEndTime As EndTime
From StudentSchedule s
Where s.Tuesday = 1)
UNION
(Select s.StudentScheduleId, 'W' As [Day], s.WednesdayStartTime As StartTime, s.WednesdayEndTime As EndTime
From StudentSchedule s
Where s.Wednesday = 1)
UNION
(Select s.StudentScheduleId, 'Th' As [Day], s.ThursdayStartTime As StartTime, s.ThursdayEndTime As EndTime
From StudentSchedule s
Where s.Thursday = 1)
UNION
(Select s.StudentScheduleId, 'F' As [Day], s.FridayStartTime As StartTime, s.FridayEndTime As EndTime
From StudentSchedule s
Where s.Friday = 1)
) As s
Group By s.StudentScheduleId, s.StartTime, s.EndTime

However, I have one concern and also one issue with the results.

  1. Is there a better more proficient way of doing this considering performance or just being more succinct? I don't expect this table getting to millions records, but I could see it growing into 100's of thousands eventually.
  2. I tested other scenarios using more days of the week having the same start and end time and the schedule column result can come out in any order. (Example: F/M/T/Th/W). I would want this to come out in logical order M/T/W/Th/F. I know about the WITHIN GROUP sorting for STRING_AGG but there's nothing to sort on or I'd need to add a sort column, any ideas?

Thank You for any help!

4

There are 4 best solutions below

0
siggemannen On BEST ANSWER

A slightly shorter solution for you:

SELECT  *
INTO #data
FROM    (
    VALUES  (15, 1, 1, 1, 1, NULL, N'9:00', N'11:00', N'11:00', N'12:30', N'9:00', N'11:00', N'11:00', N'12:30', cast(NULL AS nvarchar(10)), cast(NULL AS nvarchar(10)))
    ,   (31, 1, NULL, NULL, 1, 0, N'2:00', N'3:15', NULL, NULL, NULL, NULL, N'2:00', N'3:15', NULL, NULL)
) t (StudentScheduleId,Monday,Tuesday,Wednesday,Thursday,Friday,MondayStartTime,MondayEndTime,TuesdayStartTime,TuesdayEndTime,WednesdayStartTime,WednesdayEndTime,ThursdayStartTime,ThursdayEndTime,FridayStartTime,FridayEndTime)

SELECT  StudentScheduleId, starttime, endtime, STRING_AGG(shortcode, '/') WITHIN GROUP (ORDER BY daynumber)
FROM    #data d
CROSS APPLY (
    VALUES  (1,'M',monday, mondaystarttime, mondayendtime)
    ,   (2,'T',tuesday, tuesdaystarttime, tuesdayendtime)
    ,   (3,'W',Wednesday, WednesdayStartTime, WednesdayEndTime)
    ,   (4,'Th',Thursday, ThursdayStartTime, ThursdayEndTime)
    ,   (5,'F',Friday, FridayStartTime, FridayEndTime)
    ) v (daynumber, shortcode, day, starttime, endtime)
WHERE   v.day = 1
GROUP BY starttime, endtime,StudentScheduleId
ORDER BY StudentScheduleId, MIN(daynumber)

You can unpivot the days and times into rows which simplifies the aggregation a lot.

0
cmartin On

I figured out how to sort the aggregated schedule using WITHIN GROUP and a case statement, which works for a known set of values. You can find more information here: STRING_AGG

Select s.StudentScheduleId, 
    STRING_AGG(s.[Day], '/') WITHIN GROUP (Order By CASE
                                                      WHEN [Day] = 'M' THEN 1
                                                      WHEN [Day] = 'T' THEN 2
                                                      WHEN [Day] = 'W' THEN 3
                                                      WHEN [Day] = 'Th' THEN 4
                                                      WHEN [Day] = 'F' THEN 5
                                                 END ASC) AS Schedule, 
    s.StartTime, 
    s.EndTime
From
(
(Select s.StudentScheduleId, 'M' As [Day], s.MondayStartTime As StartTime, s.MondayEndTime As EndTime
From StudentSchedule s
Where s.Monday = 1)
UNION
(Select s.StudentScheduleId, 'T' As [Day], s.TuesDayStartTime As StartTime, s.TuesdayEndTime As EndTime
From StudentSchedule s
Where s.Tuesday = 1)
UNION
(Select s.StudentScheduleId, 'W' As [Day], s.WednesdayStartTime As StartTime, s.WednesdayEndTime As EndTime
From StudentSchedule s
Where s.Wednesday = 1)
UNION
(Select s.StudentScheduleId, 'Th' As [Day], s.ThursdayStartTime As StartTime, s.ThursdayEndTime As EndTime
From StudentSchedule s
Where s.Thursday = 1)
UNION
(Select s.StudentScheduleId, 'F' As [Day], s.FridayStartTime As StartTime, s.FridayEndTime As EndTime
From StudentSchedule s
Where s.Friday = 1)
) As s
Group By s.StudentScheduleId, s.StartTime, s.EndTime
0
Søren Kongstad On

I'd do a join on a dataset with weekdays and sort order. That way you only parse you source table once, in stead of once per weekday.

I join on the day which will pivot the table, and add one row per day Then aggregate on schedule and start endtime, and aggregate day names ordered by sortcolumn:

The first CTE is simple your sample data, the CTE named cte, is the pivoted table, which is aggregated in the final step.

WITH tab
AS
(SELECT
        *
    FROM (VALUES
    (15, 1, 1, 1, 1, NULL, '9:00', '11:00', '11:00', '12:30', '9:00', '11:00', '11:00', '12:30', NULL, NULL)
    , (31, 1, NULL, NULL, 1, 0, '2:00', '3:15', 'NULL', 'NULL', 'NULL', 'NULL', '2:00', '3:15', NULL, NULL)
    ) a (StudentScheduleId, Monday, Tuesday, Wednesday, Thursday, Friday, MondayStartTime, MondayEndTime, TuesdayStartTime, TuesdayEndTime, WednesdayStartTime, WednesdayEndTime, ThursdayStartTime, ThursdayEndTime, FridayStartTime, FridayEndTime))
,cte AS(
SELECT
    a.StudentScheduleId
    ,b.DayNm
    ,b.DayNum
   ,CASE
        WHEN DayNm = 'M' THEN a.MondayStartTime
        WHEN DayNm = 'T' THEN a.TuesdayStartTime
        WHEN DayNm = 'W' THEN a.WednesdayStartTime
        WHEN DayNm = 'Th' THEN a.ThursdayStartTime
        WHEN DayNm = 'F' THEN a.FridayStartTime
    END Starttime
   ,CASE
        WHEN b.DayNm = 'M' THEN a.MondayEndTime
        WHEN b.DayNm = 'T' THEN a.TuesdayEndTime
        WHEN b.DayNm = 'W' THEN a.WednesdayEndTime
        WHEN b.DayNm = 'Th' THEN a.ThursdayEndTime
        WHEN b.DayNm = 'F' THEN a.FridayEndTime
    END EndTime
FROM tab a
inner JOIN (VALUES ('M',1),('T',2),('W',3),('Th',4),('F',5))b(DayNm,DayNum)
ON
(   b.DayNm = 'M' AND a.Monday = 1)
OR( b.DayNm = 'T' AND a.Tuesday = 1 )
OR( b.DayNm = 'W' AND a.Wednesday = 1)
OR( b.DayNm = 'Th' AND a.Thursday = 1)
OR( b.DayNm = 'F' AND a.Friday = 1 )
)
SELECT StudentScheduleId
,STRING_AGG(DayNm,',') WITHIN GROUP (ORDER BY daynum)
      ,Starttime
      ,EndTime 
      FROM cte
GROUP BY StudentScheduleId
    ,Starttime
      ,EndTime 
4
Charlieface On

You just need to unpivot using CROSS APPLY (VALUES, then group and aggregate it back up.

This will be more efficient than joins or unions, as it only uses a single scan of the base table.

SELECT
  s.StudentScheduleId,
  STRING_AGG(v.Day, '/') WITHIN GROUP (ORDER BY v.Ordering) AS Schedule,
  v.StartTime,
  v.EndTime
FROM StudentSchedule s
CROSS APPLY (VALUES
  ('M',  1, Monday,    MondayStartTime,    MondayEndTime   ),
  ('T',  2, Tuesday,   TuesdayStartTime,   TuesdayEndTime  ),
  ('W',  3, Wednesday, WednesdayStartTime, WednesdayEndTime),
  ('Th', 4, Thursday,  ThursdayStartTime,  ThursdayEndTime ),
  ('F',  5, Friday,    FridayStartTime,    FridayEndTime   )
) v(Day, Ordering, IsValid, StartTime, EndTime)
WHERE v.IsValid = 1
GROUP BY
  s.StudentScheduleId,
  v.StartTime,
  v.EndTime
ORDER BY
  s.StudentScheduleId,
  MIN(v.Ordering);

db<>fiddle

I hope I don't need to tell you that your data is denormalized: the data should have been stored in separate rows, not columns.