Rolling data using Average

74 Views Asked by At

Is postgresql can do average for rolling data? I have this case where I need to average the values for 5 days and get all the days filled up while waiting for the actual values so basically if I'm in day 6 today, my formula would be day 1-5 then day 7 would be 2-6 but day 6 values comes from the average today and so on. You can see the values below. I have been searching if sql can do this but can't find any function. I only found recursive but not available in postgres.

SAMPLE TABLE

enter image description here

2

There are 2 best solutions below

1
JohnH On

The key to calculating rolling means with missing value substitution is to maintain a buffer holding the values contributing to the mean. The following query demonstrates calculating a rolling mean using a recursive CTE with an array holding the contributing values:

WITH RECURSIVE
  -- generate sample data as per original post
  t AS (
    SELECT s.n AS day_number, (CASE WHEN s.n <= 5 THEN s.n + 1 END)::float AS day_value
      FROM generate_series(1, 9) s(n)),
  cte AS (
    SELECT t.day_number,
           t.day_value AS original_day_value,
           t.day_value,
           COALESCE(t.day_value, 0) AS rolling_sum,
           CASE WHEN t.day_value IS NULL THEN 0 ELSE 1 END AS sample_count,
           -- array size determines maximum number of samples contributing to rolling mean
           array_fill(NULL::float, ARRAY[4]) || ARRAY[t.day_value] AS samples
      FROM t
      WHERE t.day_number = 1
    UNION ALL
    SELECT t.day_number,
           t.day_value AS original_day_value,
           w.day_value,
           -- maintain rolling sum to eliminate need to repeatedly sum samples
           cte.rolling_sum - COALESCE(samples[cte.sample_count - cardinality(cte.samples) + 1], 0) + COALESCE(w.day_value, 0) AS rolling_sum,
           -- because missing values have substitutions, sample_count can never decrease
           LEAST(cardinality(cte.samples),
                 (cte.sample_count + CASE WHEN w.day_value IS NOT NULL THEN 1 ELSE 0 END)) AS sample_count,
           -- shift out the oldest sample and append the newest
           cte.samples[2:] || w.day_value
      FROM t
      JOIN cte
        ON t.day_number = cte.day_number + 1
      CROSS JOIN LATERAL (
        -- compute the new day value once instead of repeating the calculation in the enclosing query's select list
        SELECT COALESCE(t.day_value, cte.rolling_sum / NULLIF(cte.sample_count, 0)) AS day_value) w)
SELECT cte.*
  FROM cte
  ORDER BY cte.day_number;

I don't have access to a postgreSQL 9.1 database, so I have only run this with PostgreSQL 15.3. There might be some changes needed for the older version; however, the general concepts still apply.

0
Belayer On

You can get what the 5-day running average you asked for with the window version of the avg() function, see here and here with demo here.

select day
     , value
     , case when rn>5
            then p5::numeric(5,2)
            else null
       end "prior 5 day average"
  from (select * 
             , avg(value) over win p5
             , row_number() over() rn
          from sample_table
        window win as (order by day 
                       range between 5 preceding
                                 and 1 preceding
                      )   
       ) sq;

Well at least you get what you asked for, 5 day prior average. You cannot however get what you have in your sample table as days 6 thru 10 do not have a valid value and I am not going to what values you lead to your posed results.

NOTE: Demo actually run with v15. As v9.1 is no longer carried by db<>fiddle, but documentation references are v9.1.