I have the following input data:
head(data1)
               VarA VarB  VarC           VarD VarE VarG VarH VarI
2016-06-01 09:30:05 14.2 31228 ABCD IS Equity    1  139  192   23
2016-06-01 09:30:07 14.2 31128 ABCD IS Equity    0    0    0    0
2016-06-01 09:30:09 14.2 36128 ABCD IS Equity    1  138  192   23
2016-06-01 09:30:19 14.2 36028 ABCD IS Equity    0    0    0    0
2016-06-01 09:30:21 14.2 27028 ABCD IS Equity    1  112  190   23
2016-06-01 09:30:37 14.2 26528 ABCD IS Equity    0    0    0    0
VarA is of type POSIXct, VarD is of type chr, and the rest are of type num.
VarE is my dependent variable. VarB, VarC, VarG, VarH and VarI are my explanatory variables. The dataset has 7.4 million rows in total. I want to run a logistic regression. I tried bigglm from the biglm package with the binomial family, but it fails to converge, so I am not getting proper deviance values and cannot compute McFadden's R-squared. Can you please suggest an alternative package or approach?
Thanks in advance.
The sgd package will allow you to process the data sample by sample using stochastic gradient descent.
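A minimal sketch of how this could look follows; it is not a tested recipe for your data. It uses simulated stand-in data (the `toy` data frame and its coefficients are made up), and the helper `mcfadden_r2` is a hypothetical name introduced here. It computes McFadden's R-squared directly from the fitted probabilities as 1 - logLik(model) / logLik(null), so no deviance from the fitting routine is needed. The `sgd()` call assumes the formula interface with `model = "glm"` and `model.control = list(family = "binomial")`, and that `fit$coefficients` is ordered like the columns of `model.matrix()`; please check both against the package documentation for your installed version.

```r
# McFadden's pseudo R-squared from observed 0/1 outcomes and fitted probabilities.
# Hypothetical helper, not part of sgd or biglm.
mcfadden_r2 <- function(y, p_hat, eps = 1e-12) {
  # Clamp probabilities so log() never sees exactly 0 or 1.
  p_hat <- pmin(pmax(p_hat, eps), 1 - eps)
  ll_model <- sum(y * log(p_hat) + (1 - y) * log(1 - p_hat))
  # Null model: intercept only, i.e. the overall event rate.
  p_null <- mean(y)
  ll_null <- sum(y * log(p_null) + (1 - y) * log(1 - p_null))
  1 - ll_model / ll_null
}

# Simulated stand-in data (assumption): standardized predictors, binary response.
set.seed(1)
n <- 10000
toy <- data.frame(VarB = rnorm(n), VarC = rnorm(n))
toy$VarE <- rbinom(n, 1, plogis(-0.5 + 1.0 * toy$VarB - 0.7 * toy$VarC))

# Fit only if the sgd package is installed.
if (requireNamespace("sgd", quietly = TRUE)) {
  fit <- sgd::sgd(VarE ~ VarB + VarC, data = toy, model = "glm",
                  model.control = list(family = "binomial"))
  # Rebuild the design matrix to turn coefficients into fitted probabilities.
  X <- model.matrix(VarE ~ VarB + VarC, data = toy)
  p_hat <- plogis(drop(X %*% as.numeric(fit$coefficients)))
  print(mcfadden_r2(toy$VarE, p_hat))
}
```

Your predictors are on very different scales (VarC is in the tens of thousands while VarB is around 14), so standardizing them first, for example with `scale()`, may help any iterative fitter converge.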