Penalty factors in glmnet with multinomial logit


I'm trying to fit an adaptive lasso for a multinomial logit regression with glmnet. My problem is the following: when I pass a 2x3 penalty matrix as penalty.factor to cv.glmnet, I get the following error:

`Error in glmnet(x, y, weights = weights, offset = offset, lambda = lambda, : the length of penalty.factor does not match the number of variables`

The problem, however, is with the columns (categories) rather than the rows (variables) because if I use only one column (penalty[,1]) of the penalty matrix, it works.

Reproducible code:

library(glmnet)

y <- matrix(round(runif(100,1,3),0))
x <- matrix(rnorm(200),,2)

# Generate Penalties based on ridge regression
 
set.seed(4342)
 
ridge.cv <- cv.glmnet(x,y,alpha=0, family= "multinomial", type.measure = "deviance", nfolds = 10)

best_ridge <- do.call(cbind, coef(ridge.cv, s = ridge.cv$lambda.min))
penalty  <- 1 / abs(as.matrix(best_ridge)[-1,])



# Cross-validation of Lambda

# alpha = 1 selects the lasso penalty (the value was left blank originally)
lasso.cv <- cv.glmnet(x,y,alpha=1, family= "multinomial", type.measure = "deviance", 
    penalty.factor = penalty, nfolds = 10)

How can I use the full penalty matrix? Thank you!

Answer by IRTFM:

best_ridge has three columns (one per class), so you are passing a six-element matrix as the penalty:

> str(penalty)
 num [1:2, 1:3] 2.15e+37 1.12e+38 2.03e+38 9.76e+37 2.41e+37 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:2] "V1" "V2"
  ..$ : chr [1:3] "1" "1" "1"

The error message says that the length of penalty.factor does not match the number of variables (2). So I tried passing only one column of that matrix and got no error:

> lasso.cv <- cv.glmnet(x,y,alpha=, family= "multinomial", type.measure = "deviance", 
+     penalty.factor = penalty[,1], nfolds = 10)
> lasso.cv

Call:  cv.glmnet(x = x, y = y, type.measure = "deviance", nfolds = 10,      alpha = , family = "multinomial", penalty.factor = penalty[,          1]) 

Measure: Multinomial Deviance 

    Lambda Index Measure      SE Nonzero
min 0.1584     1   2.228 0.05557       0
1se 0.1584     1   2.228 0.05557       0
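
The mismatch that the error message complains about can be checked directly. The sketch below uses stand-in objects with the same shapes as in the question (the values themselves are arbitrary and only there for illustration); it assumes, as the error text suggests, that the length of penalty.factor is compared with the number of columns of x.

```r
# Stand-in objects with the same shapes as in the question
# (arbitrary values, for illustration only)
x <- matrix(rnorm(200), ncol = 2)                # 100 rows, 2 variables
penalty <- matrix(runif(6), nrow = 2, ncol = 3)  # 2 variables x 3 classes

ncol(x)               # number of variables: 2
length(penalty)       # the whole matrix has 6 elements, so it does not match
length(penalty[, 1])  # a single column has 2 elements, so it matches
```
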

As for passing all three columns to cv.glmnet at once, I don't think it makes any sense. Look at the values. The second column when I ran it (and it should have been the same for you, since you used set.seed) had the signs reversed for those two variables' penalties. There's no reason to think that a regression-type function would have been designed to handle its parameters in a "vectorized" manner. It would be pretty easy to use lapply to pass the columns in one at a time. You would get a three-element list, with each element being a separate cv.glmnet fit that uses one column of the penalty values.
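
A minimal sketch of the lapply approach follows. It is not taken from the original post: the simulated data are an assumption (y is given some dependence on x so that the ridge coefficients are clearly nonzero, unlike the pure-noise data in the question), and the name fits is made up for illustration. The ridge and penalty steps repeat the question's code.

```r
library(glmnet)

set.seed(4342)

# Assumed toy data: 100 rows, 2 predictors, 3 classes, with some signal in x[, 1]
x <- matrix(rnorm(200), ncol = 2)
y <- cut(x[, 1] + rnorm(100, sd = 0.5),
         breaks = c(-Inf, -0.5, 0.5, Inf), labels = 1:3)

# Ridge fit and adaptive weights, as in the question
ridge.cv <- cv.glmnet(x, y, alpha = 0, family = "multinomial",
                      type.measure = "deviance", nfolds = 10)
best_ridge <- do.call(cbind, coef(ridge.cv, s = ridge.cv$lambda.min))
penalty <- 1 / abs(as.matrix(best_ridge)[-1, ])  # 2 variables x 3 classes

# One cv.glmnet fit per column of the penalty matrix
fits <- lapply(seq_len(ncol(penalty)), function(j) {
  cv.glmnet(x, y, alpha = 1, family = "multinomial",
            type.measure = "deviance", nfolds = 10,
            penalty.factor = penalty[, j])
})

length(fits)  # one fitted object per penalty column
```

Each element of fits is an ordinary cv.glmnet object, so fits[[1]]$lambda.min or coef(fits[[1]], s = "lambda.min") can be inspected as usual. Note that each fit applies one column of penalties to all three classes; this does not give class-specific penalties within a single model.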