Get sum for each 5 minute time interval

133 Views Asked by At

Problem Description

I have a table (#tmstmp) with 2 columns dt (DATETIME) and payload (INT). Eventually I want to sum payload for each 5 minute interval there is.

Code

Setup

DECLARE @start DATETIME = N'2024-1-1 12:00:00';
DROP TABLE IF EXISTS #tmstmp
                     , #numbers;
CREATE TABLE #tmstmp (
  dt DATETIME PRIMARY KEY
  , payload INT NOT NULL
);

CREATE TABLE #numbers (
  n INT PRIMARY KEY
);
WITH numbers (n) AS (
  SELECT 0 AS n
  UNION ALL
  SELECT n + 1 AS n
    FROM numbers
   WHERE n < 100
)
INSERT
  INTO #numbers
SELECT n
  FROM numbers;

WITH rnd (mins, secs) AS (
  SELECT n2.n AS mins
         , CAST(ABS(CHECKSUM(NEWID())) % 60 AS INT) AS mins
   FROM #numbers AS n1
        , #numbers as n2
  WHERE n1.n < 5
    AND n2.n < 15
), tmstmp (dt) AS (
  SELECT DATEADD(SECOND, secs, DATEADD(MINUTE, mins, @start)) AS dt
    FROM rnd
) 
INSERT  
  INTO #tmstmp
SELECT DISTINCT dt
       , -1 AS payload
  FROM tmstmp
 ORDER BY dt;

UPDATE #tmstmp
   SET payload = CAST(ABS(CHECKSUM(NEWID())) % 10 AS INT);
GO

Non overlapping timeslots are easy

DECLARE @start DATETIME = N'2024-1-1 12:00:00';
DECLARE @slotDuration INT = 5;

WITH agg (slot, sum_payload) AS (
  SELECT DATEDIFF(MINUTE, @start, dt) / @slotDuration AS slot
         , SUM(payload) AS sum_payload
    FROM #tmstmp
   GROUP BY DATEDIFF(MINUTE, @start, dt) / @slotDuration
)
SELECT DATEADD(MINUTE, slot * @slotDuration, @start) AS [from]
       , DATEADD(MINUTE, (slot + 1) * @slotDuration, @start) AS [to]
       , sum_payload
  FROM agg;
from to sum_payload
2024-01-01 12:00:00 2024-01-01 12:05:00 124
2024-01-01 12:05:00 2024-01-01 12:10:00 106
2024-01-01 12:10:00 2024-01-01 12:15:00 95

Ultimate Goal: get running timeslots

I want, however, to have an entry for each interval in the range, that is from 12:00-12:05, 12:01-12:06, 12:02-12:07 etc. until the last timeslot.

I can construct the limits in the whole range before and use that in a JOIN like this:

DECLARE @start DATETIME = N'2024-1-1 12:00:00';
DECLARE @slotDuration INT = 5;
DECLARE @intervals INT = (SELECT DATEDIFF(MINUTE, @start, MAX(dt)) FROM #tmstmp);

WITH ranges ([from], [to], slot) AS (
  SELECT DATEADD(MINUTE, n, @start) AS [from]
         , DATEADD(MINUTE, n + @slotDuration, @start) AS [to]
         , n AS slot
    FROM #numbers
   WHERE n <= @intervals
), tm_mult (slot, [from], [to], dt, payload) AS (
  SELECT slot
         , [from]
         , [to]
         , dt
         , payload
    FROM #tmstmp
   INNER JOIN ranges
      ON [from] <= dt
     AND dt < [to]
)
SELECT MIN([from]) AS [from]
       , MAX([to]) AS [to]
       , SUM(payload) AS sum_payload
  FROM tm_mult
 GROUP BY slot
 ORDER BY slot;
from to sum_payload
2024-01-01 12:00:00 2024-01-01 12:05:00 124
2024-01-01 12:01:00 2024-01-01 12:06:00 120
2024-01-01 12:02:00 2024-01-01 12:07:00 125
... ... ...
2024-01-01 12:14:00 2024-01-01 12:19:00 19

While this works in this toy example I have hundreds of thousands of timestamps in my real data and worst of all I little impact on the indices. My gut feeling tells me that I will create quite some duplication with my inequality JOIN and I was wondering whether this is anyways the canonical way of doing it or whether there is a more SQL-onic way of doing it? (like the pythonistas like to call certain code pythonic if it uses the language inherent concepts rather than trying to solve it with general tools).

1

There are 1 best solutions below

2
Watson On BEST ANSWER

Window functions in sql (WINDOW - microsoft.com / OVER - microsoft.com) are a great asset to add to your SQL toolbelt. Also particularly canonical; windows have been in since SQL Server 2005.

Below is an example:

SELECT
    [From],
    DATEADD(MINUTE, 1, [To]) [To],
    payload
FROM (
    SELECT
        dt,
        MIN(dt) OVER(ORDER BY dt ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) [From],
        dt [To],
        SUM(payload) OVER(ORDER BY dt ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) payload
    FROM (
        SELECT
            DATEADD(MINUTE, DATEDIFF(MINUTE, 0, dt), 0) dt,
            SUM(payload) payload
        FROM #tmstmp
        GROUP BY DATEADD(MINUTE, DATEDIFF(MINUTE, 0, dt), 0)
    ) q
) q
WHERE DATEDIFF(MINUTE, [From], [To]) > 3

I'd like to draw attention to both the 4 PRECEDING and DATEADD(MINUTE, DATEDIFF(MINUTE, 0, dt), 0). As the later practically floors the datetime to the minute, 2024-01-01 12:04:00.000 is inclusive up to 2024-01-01 12:04:59.999, but doesn't include 2024-01-01 12:05:00.000. Hopefully that's the functionality you are looking for.

Here is a fiddle