I am trying to loop the step function in R through several Excel sheets of countries (AUD, EUR and SGD). I have merged the sheets into a single dataframe and I am trying to use nest and then step to do stepwise regression for each country. However, I am unable to integrate the loop into the step function.
After looping the step function, I would like to output the coefficients data (including t-stats, p-value etc.) into separate Excel sheets labelled by each country.
Would someone be able to help? I have included my attempt at looping, the step code without loop, and the data in dput.
### Loop with error
dfraw %>% group_by(dfraw, 'Country') %>%
nest() %>%
mutate(both_model = map(data, ~lm(VarY ~ ., data = dfraw))) %>%
step(both_model, direction = "both") %>%
group_by(Country) %>%
summary(both_model)
### Functional Step code
both_model <- lm(VarY ~ ., data = dfSGD)
both_model <- step(both_model, direction = "both")
summary(both_model)
Out <- createWorkbook()
addWorksheet(Out, "SGD")
writeData(Out, sheet = "SGD", x = both_model$coefficients)
saveWorkbook(Out, "output.xlsx")
### dput(dfraw)
structure(list(Country = c("AUD", "AUD", "AUD", "AUD", "AUD",
"AUD", "AUD", "AUD", "AUD", "AUD", "EUR", "EUR", "EUR", "EUR",
"EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "SGD", "SGD", "SGD",
"SGD", "SGD", "SGD", "SGD", "SGD", "SGD", "SGD"), VarY = c(-0.0244845360824741,
-0.0624174372523117, 0.0452624163437831, 0.0530749789385003,
0.00624000000000002, -0.0295754491970107, 0.0327707684745209,
-0.0182452800253847, 0.025048480930834, 0.0425666088601608, 0.0141000829416644,
0.00154489276626668, 0.0621540695036749, 0.00905518537502115,
-0.0209109380291228, 0.0147859922178988, -0.0318677573278799,
-0.0293082203837354, -0.024208903799075, -0.0178405500836277,
-0.0188187608569775, -0.0269326121253097, 0.0527939257325898,
0.0383738835848475, -0.0163586791881248, 0.000606244316459614,
-0.0250029554320841, -0.0177659080352996, -0.00352907144923342,
0.0195835545331209), VarX1 = c(-0.000207994636117426, -0.14021214097687,
0.019987000188707, 0.0905973411305785, 0.0606271485403591, 0.0460107119360711,
0.0197995343444735, -0.0253560472301735, 0.045979341688096, 0.0432113473174549,
-0.000207994636117426, -0.14021214097687, 0.019987000188707,
0.0905973411305785, 0.0606271485403591, 0.0460107119360711, 0.0197995343444735,
-0.0253560472301735, 0.045979341688096, 0.0432113473174549, -0.000207994636117426,
-0.14021214097687, 0.019987000188707, 0.0905973411305785, 0.0606271485403591,
0.0460107119360711, 0.0197995343444735, -0.0253560472301735,
0.045979341688096, 0.0432113473174549), VarX2 = c(-0.04, 0.390000000000001,
-1.015, -0.149999999999999, 0.234999999999999, 0.0650000000000004,
0.0750000000000002, 0.22, -0.0449999999999999, 0.115, -0.115,
-0.241, -0.252, -0.102, -0.0249999999999999, -0.0840000000000001,
-0.47202, 0.28388, -0.21801, -0.2159, 0.43, 1.02, -0.919999999999999,
-0.65, 0.0700000000000003, -0.63, -0.02, 0, -0.44, -0.29), VarX3 = c(0.0199999999999996,
-0.734999999999999, 0.229, 0.220000000000001, 0.0739999999999998,
-0.0760000000000005, 0.00800000000000001, 0.151, -0.04, -0.0739999999999998,
0.146, -0.117999999999999, -0.234, -0.024, 0.266, 0.101999999999999,
0.31286, 0.25693, -0.0321999999999996, 0.236699999999999, -0.0819999999999999,
-0.778, -0.116, 1.075, -0.151, 0.073999999999999, 0.133000000000001,
0.545999999999999, 0.175000000000001, 0.346), VarX4 = c(-0.0602317110633257,
-0.0633604956143422, 0.0775846309806723, -0.0297417548946277,
-0.0720518811692337, -0.0339499828708462, -0.00436185680343282,
-0.0348429263427839, 0.0958382153499828, 0.0404280217883632,
-0.0602317110633257, -0.0633604956143422, 0.0775846309806723,
-0.0297417548946277, -0.0720518811692337, -0.0339499828708462,
-0.00436185680343282, -0.0348429263427839, 0.0958382153499828,
0.0404280217883632, -0.0602317110633257, -0.0633604956143422,
0.0775846309806723, -0.0297417548946277, -0.0720518811692337,
-0.0339499828708462, -0.00436185680343282, -0.0348429263427839,
0.0958382153499828, 0.0404280217883632)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -30L))
Just as you are wrapping the
lm()in amap()call, you need to wrap thestep()part inmap()too. Sincestep()expects the data used inlm()we need to make sure that we also pass the original data to eachstep()call. We can usemap2()to pass thelmobject and the data to thestep()calls. Here is what that can look like:summary()is not that useful for programmatically evaluating model results. You can usebroom::glance()andbroom::tidy()instead:NOTE: There are widely published concerns with using stepwise model selection.