I have a dataset with one column for the dependent variable and 9 columns for independent variables. I have to fit logit models in R for every combination of the independent variables.
I have created formulae for this, to be passed to the glm function. However, every time I call glm, it loads the data, which is the same every time, since only the formula changes between iterations.
Is there a way to avoid this and speed up the computation? Can I pass a vector of formulae to glm and load the data only once?
Code:
tempCoeffV <- lapply(formuleVector, function(s) {
  coef(glm(s, data = myData, family = binomial, y = FALSE, model = FALSE))
})
formuleVector is a vector of strings like:
myData[,1]~myData[,2]+myData[,3]+myData[,5]
myData[,1]~myData[,2]+myData[,6]
myData is a data.frame.
In each lapply iteration, myData remains the same: a data.frame with around 100,000 records. formuleVector contains 511 different formulas. Is there a way to speed up this computation?
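For reference, the 511 formulas correspond to the 2^9 - 1 non-empty subsets of the 9 predictors, and a vector like formuleVector can be built with combn. A minimal sketch, assuming the response is named y and the predictors x1 through x9 (the actual column names in myData may differ):

```r
## Hypothetical predictor names; in practice these would be
## names(myData)[2:10] or similar.
predictors <- paste0("x", 1:9)

## For each subset size k, take all size-k combinations of predictors
## and paste them into a formula string.
formuleVector <- unlist(lapply(1:9, function(k) {
  apply(combn(predictors, k), 2, function(v) {
    paste("y ~", paste(v, collapse = " + "))
  })
}))

length(formuleVector)  # 511 formulas in total
```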
Great, you don't have factors; otherwise I would have to call model.matrix and then play with its $assign field, rather than simply using data.matrix. This is how you get your 511 candidates, right?
Now instead of the number of combinations, we need a combination index, which is easy to get from combn. The rest of the story is to write a loop nest and iterate through all combinations. glm.fit is used, as you only care about the coefficients. glm.fit is far more costly than the for loops themselves, so for readability, don't recode them as lapply, for example. In the end, lst is a nested list. Use str(lst) to understand it.