R Loop Over Datasets and Store Model Coefficients


I have 3 data sets and wish to run the same linear model on each of them, storing the coefficient along with its upper and lower confidence limits.

    set.seed(1)
    school1 = data.frame(student = sample(c(1:100), 100, r = T),
                         score = runif(100))
    school2 = data.frame(student = sample(c(1:100), 100, r = T),
                         score = runif(100))
    school3 = data.frame(student = sample(c(1:100), 100, r = T),
                         score = runif(100))
                         
    schools = list('school1', 'school2', 'school3')
    storage <- vector('list', length(schools))
    
    for(i in seq_along(schools)){
      tmpdat <- schools[[i]]
      tmp <- lm(score ~ x1, data = tmpdat)
      storage[[i]] <- summary(tmp)$coef[1]
    }

I wish to create WANT, which stores all of this information along with the name of each dataset:

WANT = data.frame(data = c('school1', 'school2', 'school3'),
                  coef = c(0,0,0),
                  coefLL = c(0,0,0),
                  coefUL=c(0,0,0))

But I am struggling. I can loop over the datasets, but I do not know how to store all the information I need. I also have about 1000 data sets, so the most efficient approach would be best. Thank you so much.

Answer by Ben Bolker (best answer)

There are a few odd things about your setup - you don't have a list of school data sets, you have a list of school names? By "the coefficient" do you mean you're only interested in the slope (throwing away the intercept?) Why do you have a predictor variable x1 in your model when it's not in your data ... ?

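library(broom)
library(tidyverse)
schoolnames <- c('school1', 'school2', 'school3')
## fetch the data frames themselves (not just their names) into a named list
schools <- mget(schoolnames)
res <- vector(length = length(schools), mode = "list")
names(res) <- schoolnames
for(i in seq_along(schools)){
    tmp <- lm(score ~ student, data = schools[[i]])
    ## keep only the slope row and its confidence limits
    ## (the native pipe |> needs R >= 4.1)
    res[[i]] <- (tidy(tmp, conf.int = TRUE)
        |> filter(term == "student")
        |> select(estimate, conf.low, conf.high)
    )
}
## stack the per-school rows; the list names become the "school" column
WANT <- bind_rows(res, .id = "school")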

You could also use purrr::map() for this ...
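A minimal sketch of what the purrr::map() version might look like. It rebuilds example data in the same shape as the question's so that it runs on its own; `make_school()` and `fit_slope()` are hypothetical helper names introduced here for illustration, not part of any package.

```r
library(broom)
library(dplyr)
library(purrr)

## Example data in the same shape as the question's (assumption: each
## data set has columns `student` and `score`).
set.seed(1)
make_school <- function() {
  data.frame(student = sample(1:100, 100, replace = TRUE),
             score = runif(100))
}
schools <- list(school1 = make_school(),
                school2 = make_school(),
                school3 = make_school())

## Hypothetical helper: fit the model and keep only the slope row
## with its confidence limits.
fit_slope <- function(dat) {
  lm(score ~ student, data = dat) |>
    tidy(conf.int = TRUE) |>
    filter(term == "student") |>
    select(estimate, conf.low, conf.high)
}

## map() returns a named list of one-row tibbles; bind_rows() stacks
## them and turns the list names into the "school" column.
WANT <- schools |>
  map(fit_slope) |>
  bind_rows(.id = "school")
```

With existing objects in the workspace, `schools` could instead come from `mget(schoolnames)` as above; the rest stays the same.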

If for some reason you wanted to do this in a lower-tech way, you could:

res <- data.frame(schools = schoolnames, est = rep(NA_real_,3),
                  lwr = rep(NA_real_,3), upr = rep(NA_real_,3))
for(i in seq_along(schools)){
      tmp <- lm(score ~ student, data = schools[[i]])
      ci <- confint(tmp)  ## compute the intervals once per model
      ## use element 2/row 2 to pick out the slope coefficient/CIs
      ## (column 1 of res holds the school name, so results go in columns 2-4)
      res[i,2] <- coef(tmp)[2]
      res[i,3] <- ci[2,1]  ## lower CI in column 1
      res[i,4] <- ci[2,2]  ## upper CI in column 2
}
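
For the roughly 1000 data sets mentioned in the question, one possible base-R variant avoids row-by-row assignment by collecting the results with vapply(). This is only a sketch under the same assumptions as above (columns `student` and `score`, slope of interest); `slope_ci()` is a hypothetical helper name, and the output columns follow the WANT layout from the question.

```r
## Example data in the same shape as the question's; in practice this
## would be the named list of real data sets.
set.seed(1)
schools <- lapply(1:3, function(i) {
  data.frame(student = sample(1:100, 100, replace = TRUE),
             score = runif(100))
})
names(schools) <- paste0("school", 1:3)

## Hypothetical helper: return the slope and its confidence limits
## as a plain numeric vector of length 3.
slope_ci <- function(dat) {
  fit <- lm(score ~ student, data = dat)
  ci <- confint(fit)["student", ]
  c(coef = unname(coef(fit)["student"]),
    coefLL = unname(ci[1]),
    coefUL = unname(ci[2]))
}

## vapply() gives a 3 x n matrix (one column per data set);
## transpose it so each data set becomes a row.
est <- t(vapply(schools, slope_ci, c(coef = 0, coefLL = 0, coefUL = 0)))
WANT <- data.frame(data = rownames(est), est, row.names = NULL)
```

The model fits themselves will likely dominate the run time, so the main gain here is simpler bookkeeping rather than speed.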