How to calculate maximum 8-h moving (rolling) averages in R?

1.7k Views Asked by At

I am using R. I know calculating moving average is a topic with several answers in this site, but I have some problems that make my question unique.

I have a data frame including 8784 hourly concentrations (366 days * 24 hours) of an air pollutant (Ozone). This data frame includes some NaN values (missing values). The procedure contains following steps:

1- calculating 8-hour moving (rolling) averages of hourly concentrations: i.e. every 8 concentrations should be averaged in this way: average of 1 to 8, average of 2 to 9, average of 3 to 10, etc. This leads to obtaining 24 moving averages for each day (every 24 hours).

2- for each day, I want the maximum of 8-hour moving averages: i.e. among the 24 moving averages, the highest number should be selected. Finally, 366 moving average (366 days) will be selected.

3- A new data frame containing 366 moving averages should be created.

I know there are some packages (openair, zoo, TTR) that do something like this, but are there any ways to write the codes without any packages?

An Exmaple of my data 

     ColName
1    18.76 
2    12.92 
3    8.12 
4    NaN 
5    12.92 
6    3.77 
7    18.76 
8    9.52 
9    94.09 
10    18.76 
11    14.13 
12    8.12 
13    2.04 
14    12.92 
15    9.17 
.
.
.
8783    34.58
8784    64.23 

The name of main data frame is "Hourly". I tried these codes:

Hourly1 <- c(0, cumsum(ifelse(is.nan(Hourly), 0, Hourly))) 
rsum <- (Hourly1[(Hourly1+1):length(Hourly1)] - Hourly1[1:(length(Hourly1) - 8)]) / 8

But when I try the first line, the following error occurs:

Error in is.nan(Hourly) : default method not implemented for type 'list'

UPDATE: I used the following codes, but the maximum of 8-hour averages is not calculated right:

Hourly2<-as.numeric(Hourly$Average)

names(Hourly2) <- rep(seq.Date(as.Date("2017-01-01"), by=1, length.out=366), each=24)

x<-Hourly2
#use cumsum to get the moving average, keep NaNs
cx <- c(0, cumsum(ifelse(is.nan(x), 0, x))) + c(0,x)*0

n <- 8

rsum <- (cx[(n+1):length(cx)] - cx[1:(length(cx) - n)]) / n

res <- data.frame(mov_avg=rsum, days=names(rsum))


#select max from each day, ignoring NaN's
mx <- aggregate(mov_avg~days, data=res, max)

I compared the final results(366 maximum of 8-hour averages, each for 1 day of year) with a standard pre-approved dataset. In some days, the codes calculated averages correctly, but in some other days not! I did not get its logic.

You can find my raw dataset here!

UPDATE 2:

Here is a link to download the final results produced by different methods!

UPDATE3:

The difference between the results was due to the different methods for calculating moving averages. There are three methods for calculating moving averages: left, right, and center. The codes proposed by the guys here follow the "right" method.

2

There are 2 best solutions below

12
Esther On

Here's an example of how to do it with cumsum when you have missing values. I would be careful to consider how they're distributed in your data and how you want to deal with them though.

#create some sample data
set.seed(1)
x <- rnorm(24*366)
names(x) <- rep(seq.Date(as.Date("2017-01-01"), by=1, length.out=366), each=24)
x[sample(100, 1:length(x))] <- NaN #add some missing values

#use cumsum to get the moving average, keep NaNs
cx <- c(0, cumsum(ifelse(is.nan(x), 0, x))) + c(0,x)*0

n <- 8

rsum <- (cx[(n+1):length(cx)] - cx[1:(length(cx) - n)]) / n

res <- data.frame(mov_avg=rsum, days=names(rsum))

#select max from each day, ignoring NaN's
mx <- aggregate(mov_avg~days, data=res, max)

days   mov_avg
1 2017-01-01 0.6404849
2 2017-01-02 0.3456389
3 2017-01-03 0.5998888
4 2017-01-04 0.6635502
5 2017-01-05 0.7244289
6 2017-01-06 0.1715349
0
LightonGlass On

I've been working on exactly that, and found a solution that uses map2()

# create a day of ozone data  

o3day <- data.frame(o3hrly = runif(24, 0.04, 0.1))

# 8hr average function
avg_8hr <- function(.x, .y, o3) {
  # print(.x)
  # print(.y)
  # print(o3)
  o3 %>% slice(.x:.y) %>% summarize(o38hr = mean(o3hrly))
}

max(unlist(map2(.x = 1:17, .y = 8:24, .f = avg_8hr, o3 = o3day)))