I have data with variables industry_sales, year, and industry.
I want to conduct time series regressions for each industry in rolling windows of 5 years, with the dependent variable being industry_sales and the independent variable being year. For example, if data ranges from years 2000-2010, then the rolling windows for each regression and industry group would be 2000-2004, 2001-2005, and so on.
I had an attempt but the code fails:
library(tidyverse)
library(zoo)
# Group the data by industry
data_grouped <- data %>%
group_by(industry)
glimpse(data_grouped)
# Define a function to run a regression
regression_func <- function(x) {
lm(industry_sales ~ year, data = x)
}
# Perform rolling linear regression on each group, with a window of 5 years
results_list <- data_grouped %>%
do({
rollapply(data_grouped$year, width = 5, FUN = regression_func, by = 1)
})
dput() output of the data is as below, any help would be appreciated:
structure(list(year = c(2010, 2011, 2012, 2013, 2014, 2015, 2016,
2017, 2018, 2019, 2020, 2021, 2022, 2010, 2011, 2012, 2013, 2014,
2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2010, 2011, 2012,
2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2010,
2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021,
2022, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019,
2020, 2021, 2022, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017,
2018, 2019, 2020, 2021, 2022, 2010, 2011, 2012, 2013, 2014, 2015,
2016, 2017, 2018, 2019, 2020, 2021, 2022, 2010, 2011, 2012, 2013,
2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022), industry_sales = c(853.218,
1001.856, 987.741, 990.679, 1061.804, 985.904, 1073.43, 1305.779,
1314.722, 1188.122, 1327.245, 1711.143, 577.358, 25731.969, 26484.263,
23085.259, 24842.509, 30563.672, 34947.361, 40066.425, 38293.972,
40484.096, 37160.342, 29109.755, 35227.521, 16522.915, 1072310.255,
1314543.982, 1288932.701, 1245467.184, 1169743.543, 882325.562,
804108.99, 966832.252, 1099655.803, 999685.99, 725094.688, 797072.967,
59906.738, 131274.107, 167635.266, 170003.001, 161892.01, 149722.289,
108169.702, 92950.89, 120837.727, 132130.143, 121670.84, 118206.596,
147732.623, 542.17, 201842.321, 209721.793, 215722.823, 223158.977,
221191.213, 223076.061, 229199.539, 242995.929, 256145.995, 257282.085,
234455.443, 214560.191, 13964.787, 88190.421, 90191.422, 80848.451,
93999.434, 92325.688, 99138.466, 98682.173, 95106.591, 97622.097,
94254.621, 81145.338, 87421.329, 12116.901, 247825.263, 262355,
235423.37, 236697.987, 245287.766, 222221.931, 219657.82, 222990.939,
224672.689, 162633.722, 139853.429, 148534.588, 4463.853, 28177.129,
29298.618, 29251.077, 31696.968, 31636.033, 33064.57, 35745.408,
32034.079, 31978.646, 31661.88, 28572.857, 34190.976, 3400.497
), industry = c("Agriculture, Forestry and Fishing", "Agriculture, Forestry and Fishing",
"Agriculture, Forestry and Fishing", "Agriculture, Forestry and Fishing",
"Agriculture, Forestry and Fishing", "Agriculture, Forestry and Fishing",
"Agriculture, Forestry and Fishing", "Agriculture, Forestry and Fishing",
"Agriculture, Forestry and Fishing", "Agriculture, Forestry and Fishing",
"Agriculture, Forestry and Fishing", "Agriculture, Forestry and Fishing",
"Agriculture, Forestry and Fishing", "Construction", "Construction",
"Construction", "Construction", "Construction", "Construction",
"Construction", "Construction", "Construction", "Construction",
"Construction", "Construction", "Construction", "Manufacturing",
"Manufacturing", "Manufacturing", "Manufacturing", "Manufacturing",
"Manufacturing", "Manufacturing", "Manufacturing", "Manufacturing",
"Manufacturing", "Manufacturing", "Manufacturing", "Manufacturing",
"Mining", "Mining", "Mining", "Mining", "Mining", "Mining", "Mining",
"Mining", "Mining", "Mining", "Mining", "Mining", "Mining", "Retail Trade",
"Retail Trade", "Retail Trade", "Retail Trade", "Retail Trade",
"Retail Trade", "Retail Trade", "Retail Trade", "Retail Trade",
"Retail Trade", "Retail Trade", "Retail Trade", "Retail Trade",
"Services", "Services", "Services", "Services", "Services", "Services",
"Services", "Services", "Services", "Services", "Services", "Services",
"Services", "Transportation, Communications, Electric, Gas and Sanitary service",
"Transportation, Communications, Electric, Gas and Sanitary service",
"Transportation, Communications, Electric, Gas and Sanitary service",
"Transportation, Communications, Electric, Gas and Sanitary service",
"Transportation, Communications, Electric, Gas and Sanitary service",
"Transportation, Communications, Electric, Gas and Sanitary service",
"Transportation, Communications, Electric, Gas and Sanitary service",
"Transportation, Communications, Electric, Gas and Sanitary service",
"Transportation, Communications, Electric, Gas and Sanitary service",
"Transportation, Communications, Electric, Gas and Sanitary service",
"Transportation, Communications, Electric, Gas and Sanitary service",
"Transportation, Communications, Electric, Gas and Sanitary service",
"Transportation, Communications, Electric, Gas and Sanitary service",
"Wholesale Trade", "Wholesale Trade", "Wholesale Trade", "Wholesale Trade",
"Wholesale Trade", "Wholesale Trade", "Wholesale Trade", "Wholesale Trade",
"Wholesale Trade", "Wholesale Trade", "Wholesale Trade", "Wholesale Trade",
"Wholesale Trade")), row.names = c(NA, -104L), class = c("tbl_df",
"tbl", "data.frame"))
There are several problems:
1) We will assume that for each regression we want one row with the industry, last year of the window, intercept, slope and R^2. Modify c(...) to change which statistics you want. For example,
c(year = tail(yr, 1), unlist(broom::glance(fm)))if you wanted a large number of statistics.This groups by
industryand then usesgroup_modifyto create the rows for each industry.giving:
2) If you just wanted a single statistic -- the slope, say -- then it might be more convenient to have a 2d zoo series with one column per industry and one row per year. Note that the slope is shift invariant so we can just use 1:5 as the predictor in each case and we will get the same slope as if we had used year.
giving:
If you wanted this as a data frame use
fortify.zoo(out2)or in long formfortify.zoo(out2, melt = TRUE).