I have a dataset with one column for the dependent variable and 9 columns for independent variables. I have to fit logit models in R for every combination of the independent variables.
I have created formulae for this, to be passed to the glm function. However, every time I call glm, it loads the data, which is the same every time, since only the formula changes between iterations.
Is there a way to avoid this and speed up the computation? Can I pass a vector of formulae to glm and load the data only once?
Code:
tempCoeffV <- lapply(formuleVector, function(s) {
  coef(glm(s, data = myData, family = binomial, y = FALSE, model = FALSE))
})
formuleVector is a vector of strings like:
myData[,1]~myData[,2]+myData[,3]+myData[,5]
myData[,1]~myData[,2]+myData[,6]
myData is a data.frame.
In each lapply iteration, myData remains the same: a data.frame with around 100,000 records. formuleVector contains 511 different formulas. Is there a way to speed up this computation?
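For reference, the 511 formulas correspond to the 2^9 - 1 non-empty subsets of the 9 predictors, and a vector like formuleVector can be built with combn. A minimal sketch, assuming the response is named y and the predictors x1 through x9 (the actual column names in myData may differ):

```r
## Hypothetical predictor names; in practice these would be
## names(myData)[2:10] or similar.
predictors <- paste0("x", 1:9)

## For each subset size k, take all size-k combinations of predictors
## and paste them into a formula string.
formuleVector <- unlist(lapply(1:9, function(k) {
  apply(combn(predictors, k), 2, function(v) {
    paste("y ~", paste(v, collapse = " + "))
  })
}))

length(formuleVector)  # 511 formulas in total
```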
Great, you don't have factors; otherwise I would have to call model.matrix and then play with its $assign field, rather than simply using data.matrix. This is how you get your 511 candidates, right?
Now instead of the number of combinations, we need a combination index, which is easy to get from combn. The rest of the story is to write a loop nest and iterate through all combinations. glm.fit is used, as you only care about the coefficients. glm.fit is far more costly than the for loops themselves, so for readability, don't recode them as lapply, for example. In the end, lst is a nested list. Use str(lst) to understand it.