I would like to run logistic regression in statsmodels using an l1 penalty (lasso) together with class weights, because my classes are imbalanced. Several posts explain how to implement logistic regression either with an l1 penalty (e.g., here) or with class weights (e.g., How to use weights in a logistic regression), but I can't figure out how to do both at once.
Here is what I've done so far:
```python
# imports
import numpy as np
import statsmodels.api as sm
from sklearn.model_selection import train_test_split

# generate train and test datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=6,
                                                    shuffle=True, stratify=y)

# build an l1 penalized logit model
logit_model_l1 = sm.Logit(y_train, sm.add_constant(X_train))
result_l1 = logit_model_l1.fit_regularized(method='l1')
```
Results:
```
                           Logit Regression Results
==============================================================================
Dep. Variable:                      y   No. Observations:                 1177
Model:                          Logit   Df Residuals:                     1142
Method:                           MLE   Df Model:                           34
Date:                Wed, 31 Jan 2024   Pseudo R-squ.:                  0.7835
Time:                        10:01:40   Log-Likelihood:                -45.466
converged:                       True   LL-Null:                       -209.96
Covariance Type:            nonrobust   LLR p-value:                 5.523e-50
```
```python
# build a class-weighted logit model
logit_model_weighted = sm.GLM(y_train, sm.add_constant(X_train), family=sm.families.Binomial(),
                              freq_weights=np.asarray(y_train))
result_weighted = logit_model_weighted.fit()
# Note: if I change ".fit()" to ".fit_regularized(method='l1')" in the line above,
# I get an error, because 'l1' is not an accepted method.
```
Results:
```
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:                      y   No. Observations:                 1177
Model:                            GLM   Df Residuals:                     1091
Model Family:                Binomial   Df Model:                           34
Link Function:                  Logit   Scale:                          1.0000
Method:                          IRLS   Log-Likelihood:            -1.2001e-09
Date:                Wed, 14 Feb 2024   Deviance:                   2.4032e-09
Time:                        11:30:47   Pearson chi2:                 1.20e-09
No. Iterations:                    26   Pseudo R-squ. (CS):         -2.039e-12
Covariance Type:            nonrobust
```
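The near-zero log-likelihood and deviance above are what one would expect from `freq_weights=np.asarray(y_train)`: with 0/1 labels used as weights, every `y == 0` row gets weight 0, so the model effectively sees only the positive rows and can fit them perfectly. Class weights are usually assigned per class instead. The sketch below uses the `n_samples / (n_classes * class_count)` rule, which is the formula scikit-learn documents for `class_weight='balanced'`; the helper name `balanced_class_weights` is hypothetical. With this rule the weights sum to the number of rows and each class carries the same total weight.

```python
import numpy as np

def balanced_class_weights(y):
    """Hypothetical helper: one weight per row, equal to
    n_samples / (n_classes * class_count) for that row's class."""
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)  # classes come back sorted
    per_class = len(y) / (len(classes) * counts)
    # searchsorted maps each label to its index in the sorted `classes` array
    return per_class[np.searchsorted(classes, y)]

y_example = np.array([0, 0, 0, 1])
w = balanced_class_weights(y_example)
print(w)        # minority row gets 2.0, each majority row gets 2/3
print(w.sum())  # equals the number of rows (up to rounding)
```

These weights could then be passed as `freq_weights=balanced_class_weights(y_train)` to `sm.GLM`. Since statsmodels treats `freq_weights` as frequency (replication) weights, keeping their sum equal to the number of rows avoids inflating or shrinking the effective sample size.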
Does anybody know how to build a model in statsmodels that incorporates both l1 penalization and class weights?
Note that I have already accomplished this in scikit-learn, but I need the additional statistics that are available via statsmodels.