Logistic Regression on Large-Scale Data


I have the following input data:

head(data1)
               VarA VarB   VarC           VarD VarE  VarG  VarH VarI
2016-06-01 09:30:05 14.2  31228 ABCD IS Equity    1   139   192   23
2016-06-01 09:30:07 14.2  31128 ABCD IS Equity    0     0     0    0
2016-06-01 09:30:09 14.2  36128 ABCD IS Equity    1   138   192   23
2016-06-01 09:30:19 14.2  36028 ABCD IS Equity    0     0     0    0
2016-06-01 09:30:21 14.2  27028 ABCD IS Equity    1   112   190   23
2016-06-01 09:30:37 14.2  26528 ABCD IS Equity    0     0     0    0

VarA is of type POSIXct, VarD is of type chr, and the rest are of type num.

VarE is my dependent variable; VarB, VarC, VarG, VarH and VarI are my explanatory variables. The dataset has 7.4 million rows in total. I want to run a logistic regression. I tried bigglm from the biglm package with the binomial family, but it fails to converge, so I am not getting proper deviance values, and as a result I cannot compute McFadden's R-squared. Can you please suggest an alternative package or approach?
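For reference, McFadden's R-squared is 1 - logLik(model) / logLik(null model). With a 0/1 response the saturated model's log-likelihood is 0, so deviance = -2 * logLik and the ratio can be taken straight from the two deviances of a converged fit. The minimal base-R sketch below shows both forms agreeing on simulated stand-in data; the data frame `sim` and predictor `x` are hypothetical, and this does not reproduce the 7.4-million-row fit.

```r
# Simulated stand-in data (hypothetical; not the asker's data1)
set.seed(1)
n <- 10000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 1.2 * x))
sim <- data.frame(VarE = y, x = x)

# Ordinary logistic regression on the small simulated set
fit <- glm(VarE ~ x, data = sim, family = binomial())

# Deviance form: valid because deviance = -2 * logLik for a 0/1 response
mcfadden <- 1 - fit$deviance / fit$null.deviance

# Log-likelihood form, using an explicit intercept-only (null) model
null_fit <- glm(VarE ~ 1, data = sim, family = binomial())
mcfadden_ll <- 1 - as.numeric(logLik(fit)) / as.numeric(logLik(null_fit))

print(c(deviance_form = mcfadden, loglik_form = mcfadden_ll))
```

So any method that yields the model and null log-likelihoods (or deviances) is enough to get the statistic.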

Thanks in advance.

Answer by mpjdem:

The sgd package lets you fit the model with stochastic gradient descent, processing the data sample by sample.
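To illustrate the idea behind this suggestion (this is not the sgd package's own interface), here is a minimal base-R sketch of plain stochastic gradient descent for logistic regression. The function names `sgd_logistic` and `binom_loglik` are hypothetical, the data are simulated, and the learning rate, epoch count and averaging of the last epoch's iterates are illustrative assumptions. A per-row loop in interpreted R would be far too slow for 7.4 million rows; it only shows the update rule. The fitted coefficients also give the log-likelihood needed for McFadden's R-squared, with the intercept-only log-likelihood computed in closed form from mean(y).

```r
# Hypothetical helper: SGD for logistic regression, one row per update
sgd_logistic <- function(X, y, lr = 0.05, epochs = 5, seed = 42) {
  set.seed(seed)
  X1 <- cbind(1, X)                  # prepend an intercept column
  beta <- numeric(ncol(X1))          # start from all-zero coefficients
  beta_avg <- beta
  t <- 0
  for (ep in seq_len(epochs)) {
    for (i in sample.int(nrow(X1))) {      # visit rows in random order
      xi <- X1[i, ]
      p <- plogis(sum(xi * beta))          # predicted P(y = 1)
      beta <- beta + lr * (y[i] - p) * xi  # step along this row's log-lik gradient
      if (ep == epochs) {                  # average iterates of the final epoch
        t <- t + 1
        beta_avg <- beta_avg + (beta - beta_avg) / t
      }
    }
  }
  beta_avg
}

# Hypothetical helper: Bernoulli log-likelihood at coefficients beta
binom_loglik <- function(X, y, beta) {
  eta <- drop(cbind(1, X) %*% beta)
  sum(y * plogis(eta, log.p = TRUE) + (1 - y) * plogis(-eta, log.p = TRUE))
}

# Usage on simulated data; predictors are standardized because SGD steps
# are sensitive to predictor scale
set.seed(1)
n <- 5000
X <- scale(matrix(rnorm(n * 2), ncol = 2))
y <- rbinom(n, 1, plogis(drop(-0.5 + X %*% c(1, -0.8))))

beta_hat <- sgd_logistic(X, y)
p0 <- mean(y)
ll_null <- sum(y * log(p0) + (1 - y) * log(1 - p0))  # intercept-only model
mcfadden <- 1 - binom_loglik(X, y, beta_hat) / ll_null
print(beta_hat)
print(mcfadden)
```

On the real data, predictors with very different ranges (VarC is in the tens of thousands, VarE's regressors VarG to VarI in the hundreds or less) would normally be standardized the same way before running SGD.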