Problem in creating a model.matrix of quantitative predictors in R


I need to fit a Lasso regression with the glmnet package, and I am having trouble generating my x model.matrix. My data.frame has 108 observations, a response variable Y, and 24 predictors. Here is an overview:

  CONVENTIONAL_HUmin CONVENTIONAL_HUmean CONVENTIONAL_HUstd CONVENTIONAL_HUmax
1   37.9400539686119    63.4903779286635   11.7592095845857   85.2375439991287
2   23.8400539686119    80.5903779286635   15.0592095845857   125.837543999129
3   19.3035945249441    73.2764716205565   12.8816244173147    130.24141901586
  CONVENTIONAL_HUQ1 CONVENTIONAL_HUQ2 CONVENTIONAL_HUQ3      HISTO_Skewness   HISTO_Kurtosis
1  54.9938390994964  65.4873070322704  72.8863025473031  -0.203420585259268 2.25208159159488
2  70.8938390994964  80.3873070322704  91.4863025473031  -0.117420585259268 2.91208159159488
3  64.4689755423307  73.8666609177099  81.7351818199415 -0.0908104900456161  2.8751327713366
  HISTO_ExcessKurtosis HISTO_Entropy_log10 HISTO_Entropy_log2 HISTO_Energy...Uniformity.
1   -0.751917020142877   0.701345471328916   2.32782599847774          0.219781577333287
2  -0.0887170201428774   0.793345471328916   2.63782599847774          0.184781577333287
3   -0.127231561113029   0.738530858918985   2.45445652190669          0.206887426065656
          GLZLM_SZE        GLZLM_LZE          GLZLM_LGZE       GLZLM_HGZE          GLZLM_SZLGE
1 0.366581916604228 35.7249100350856 8.7285612359045e-05 11497.6407737833 3.22615226279017e-05
2 0.693581916604228 984.424910035086 8.5685612359045e-05 11697.6407737833 5.98615226279017e-05
3 0.622711792823853 1103.10288991619 8.5573088970709e-05 11571.7421733917 5.33303855950858e-05
       GLZLM_SZHGE         GLZLM_LZLGE      GLZLM_LZHGE       GLZLM_GLNU       GLZLM_ZLNU
1 4164.91570215061 0.00314512237564268 405585.990838764 2.66964898745512 2.47759091065361
2 8064.91570215061  0.0835651223756427 11581585.9908388 12.9796489874551 38.5375909106536
3 7295.45317481887  0.0949686480587339 12926109.9421091 15.0930512668698 37.6083347285291
           GLZLM_ZP Y
1 0.219643444043173 1
2 0.112643444043173 0
3 0.104031438564764 0 

My code for the model.matrix

x=model.matrix(Y~.,data=data.det)

It generates a very large model.matrix with 244728 elements! It seems to have duplicated each of the 24 predictors about a hundred times. Here is an overview of the model.matrix:

(Intercept) CONVENTIONAL_HUmin-10.5599460313881
    CONVENTIONAL_HUmin-117.359946031388 CONVENTIONAL_HUmin-13.0599460313881
    CONVENTIONAL_HUmin-154.359946031388 CONVENTIONAL_HUmin-17.6599460313881
    CONVENTIONAL_HUmin-18.3599460313881 CONVENTIONAL_HUmin-2.87994603138811
    CONVENTIONAL_HUmin-21.281710504529 CONVENTIONAL_HUmin-28.3599460313881
    CONVENTIONAL_HUmin-3.44994603138811 CONVENTIONAL_HUmin-3.89640547505594
    CONVENTIONAL_HUmin-67.0599460313881 CONVENTIONAL_HUmin-682.359946031388
    CONVENTIONAL_HUmin-9.08171050452898 CONVENTIONAL_HUmin1.04428949547101
    CONVENTIONAL_HUmin1.63928949547101 CONVENTIONAL_HUmin10.8400539686119
    CONVENTIONAL_HUmin10.968289495471 CONVENTIONAL_HUmin11.5400539686119
    CONVENTIONAL_HUmin11.618289495471 CONVENTIONAL_HUmin11.6400539686119
    CONVENTIONAL_HUmin12.518289495471 CONVENTIONAL_HUmin12.5400539686119
    CONVENTIONAL_HUmin13.4400539686119 CONVENTIONAL_HUmin13.6400539686119
    CONVENTIONAL_HUmin13.7400539686119 CONVENTIONAL_HUmin13.818289495471
    CONVENTIONAL_HUmin14.5400539686119 CONVENTIONAL_HUmin14.6693017607572
    CONVENTIONAL_HUmin14.8400539686119 CONVENTIONAL_HUmin16.9400539686119
    CONVENTIONAL_HUmin17.0400539686119 CONVENTIONAL_HUmin17.618289495471
    CONVENTIONAL_HUmin18.2400539686119 CONVENTIONAL_HUmin18.8400539686119
    CONVENTIONAL_HUmin19.3035945249441 CONVENTIONAL_HUmin20.0400539686119
    CONVENTIONAL_HUmin20.818289495471 CONVENTIONAL_HUmin21.0400539686119
    CONVENTIONAL_HUmin21.118289495471 CONVENTIONAL_HUmin21.3400539686119
    CONVENTIONAL_HUmin21.5400539686119 CONVENTIONAL_HUmin21.9400539686119
...
attr(,"contrasts")$CONVENTIONAL_HUmin
[1] "contr.treatment"

This is not convenient at all, because I end up with far more predictors in the input x for the Lasso regression, which makes chance selection of predictors even more likely.

Do you have any idea what is causing this? Any suggestions for fixing it?


Try this. You want a plain matrix, not a model matrix:

library(glmnet)

# make a matrix of your predictors minus your outcome (column 25 is Y)
x <- as.matrix(data.det[-25])

# put the y column in a vector
y <- data.det$Y

# fit the lasso (alpha = 1) for a binary outcome
fit.lasso <- glmnet(x, y, family = "binomial", alpha = 1)
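
One likely cause, judging from the "contr.treatment" attribute in the question's output: the predictor columns are stored as factors (or character strings) rather than numbers, so model.matrix() creates one dummy column per distinct value. That is an assumption about the asker's data, not something the post confirms. If it holds, as.matrix() on those columns would also return a character matrix, so the columns need converting first. A minimal sketch in base R, where the data frame toy and its values are made up for illustration:

```r
# Toy stand-in for data.det (hypothetical values): numbers stored as factors,
# which is assumed to be what happened when the real data were read in.
toy <- data.frame(
  CONVENTIONAL_HUmin  = c("37.94", "23.84", "19.30", "21.54"),
  CONVENTIONAL_HUmean = c("63.49", "80.59", "73.28", "70.10"),
  Y = c(1, 0, 0, 1),
  stringsAsFactors = TRUE
)

# Diagnosis: factor (or character) predictors are what model.matrix()
# expands into one dummy column per level.
sapply(toy, class)
ncol(model.matrix(Y ~ ., data = toy))  # more columns than predictors

# Fix: convert every predictor back to numeric. Go through as.character(),
# because as.numeric() on a factor returns the internal level codes.
pred <- setdiff(names(toy), "Y")
toy[pred] <- lapply(toy[pred], function(col) as.numeric(as.character(col)))

# Now model.matrix() yields one column per predictor. Drop the intercept
# column, since glmnet fits its own intercept.
x <- model.matrix(Y ~ ., data = toy)[, -1]
y <- toy$Y
dim(x)
```

After the conversion, x and y can be passed to glmnet(x, y, family = "binomial", alpha = 1) as in the code above. If sapply(data.det, class) already reports "numeric" for every predictor, the cause lies elsewhere and this sketch does not apply.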