Standardized regression coefficients


https://www.murraylax.org/rtutorials/multregression_standardized.html#problems-comparing-variables-on-the-same-scale

In the linked article, the author describes a method for assessing the relative importance of predictors in a regression model. A common technique is to standardize the variables by subtracting the mean and dividing by the standard deviation. However, this arithmetic only applies to numeric predictors. What about non-numeric predictors, such as categorical ones? How can they be standardized?
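To make the standardization step concrete, here is a minimal sketch in base R. It assumes the usual definition (sample standard deviation with an n - 1 denominator), which is what scale() uses by default; the variable names z_manual and z_scale are only for illustration.

```r
# Standardize one numeric column by hand and compare with scale().
data(mtcars)

x <- mtcars$disp
z_manual <- (x - mean(x)) / sd(x)   # subtract the mean, divide by the SD
z_scale  <- as.numeric(scale(x))    # scale() returns a one-column matrix

all.equal(z_manual, z_scale)        # the two versions should agree
```

After this transformation the column has mean 0 and standard deviation 1, so a one-unit change means "one standard deviation of disp".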

Below, model is the regular regression and model2 is the same regression with the continuous variables standardized.


    data(mtcars)
    
    # Treat the discrete columns as categorical predictors
    mtcars$cyl <- as.factor(mtcars$cyl)
    mtcars$vs <- as.factor(mtcars$vs)
    mtcars$am <- as.factor(mtcars$am)
    mtcars$gear <- as.factor(mtcars$gear)
    mtcars$carb <- as.factor(mtcars$carb)
    
    # Regular regression on the unstandardized variables
    model <- lm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb, mtcars)
    summary(model)

----
Call:
lm(formula = mpg ~ cyl + disp + hp + drat + wt + qsec + vs + 
    am + gear + carb, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.5087 -1.3584 -0.0948  0.7745  4.6251 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept) 23.87913   20.06582   1.190   0.2525  
cyl6        -2.64870    3.04089  -0.871   0.3975  
cyl8        -0.33616    7.15954  -0.047   0.9632  
disp         0.03555    0.03190   1.114   0.2827  
hp          -0.07051    0.03943  -1.788   0.0939 .
drat         1.18283    2.48348   0.476   0.6407  
wt          -4.52978    2.53875  -1.784   0.0946 .
qsec         0.36784    0.93540   0.393   0.6997  
vs1          1.93085    2.87126   0.672   0.5115  
am1          1.21212    3.21355   0.377   0.7113  
gear4        1.11435    3.79952   0.293   0.7733  
gear5        2.52840    3.73636   0.677   0.5089  
carb2       -0.97935    2.31797  -0.423   0.6787  
carb3        2.99964    4.29355   0.699   0.4955  
carb4        1.09142    4.44962   0.245   0.8096  
carb6        4.47757    6.38406   0.701   0.4938  
carb8        7.25041    8.36057   0.867   0.3995  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.833 on 15 degrees of freedom
Multiple R-squared:  0.8931,    Adjusted R-squared:  0.779 
F-statistic:  7.83 on 16 and 15 DF,  p-value: 0.000124
----

    # Same model, with the continuous predictors standardized via scale()
    model2 <- lm(mpg ~ cyl + scale(disp) + scale(hp) + scale(drat) + scale(wt) + scale(qsec) + vs + am + gear + carb, mtcars)
    summary(model2)

----
Call:
lm(formula = mpg ~ cyl + scale(disp) + scale(hp) + scale(drat) + 
    scale(wt) + scale(qsec) + vs + am + gear + carb, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.5087 -1.3584 -0.0948  0.7745  4.6251 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  17.9842     5.3241   3.378  0.00414 **
cyl6         -2.6487     3.0409  -0.871  0.39747   
cyl8         -0.3362     7.1595  -0.047  0.96317   
scale(disp)   4.4056     3.9535   1.114  0.28267   
scale(hp)    -4.8342     2.7031  -1.788  0.09393 . 
scale(drat)   0.6324     1.3279   0.476  0.64074   
scale(wt)    -4.4322     2.4841  -1.784  0.09462 . 
scale(qsec)   0.6573     1.6715   0.393  0.69967   
vs1           1.9309     2.8713   0.672  0.51151   
am1           1.2121     3.2135   0.377  0.71132   
gear4         1.1144     3.7995   0.293  0.77332   
gear5         2.5284     3.7364   0.677  0.50890   
carb2        -0.9794     2.3180  -0.423  0.67865   
carb3         2.9996     4.2935   0.699  0.49547   
carb4         1.0914     4.4496   0.245  0.80956   
carb6         4.4776     6.3841   0.701  0.49381   
carb8         7.2504     8.3606   0.867  0.39948   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.833 on 15 degrees of freedom
Multiple R-squared:  0.8931,    Adjusted R-squared:  0.779 
F-statistic:  7.83 on 16 and 15 DF,  p-value: 0.000124
----
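The two outputs share the same t values, p-values, and R-squared; only the estimates and standard errors of the scaled terms differ. A short sketch of why, assuming the same models as above refitted on a copy of the data (the names d, raw, std, and rescaled are illustrative): each standardized slope should equal the raw slope multiplied by that predictor's standard deviation.

```r
data(mtcars)
d <- mtcars
factor_cols <- c("cyl", "vs", "am", "gear", "carb")
d[factor_cols] <- lapply(d[factor_cols], factor)

raw <- lm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
          data = d)
std <- lm(mpg ~ cyl + scale(disp) + scale(hp) + scale(drat) + scale(wt) +
            scale(qsec) + vs + am + gear + carb, data = d)

num_cols <- c("disp", "hp", "drat", "wt", "qsec")
sds <- sapply(d[num_cols], sd)           # SD of each continuous predictor

# Raw slope times SD, side by side with the slope from the scaled model
rescaled <- coef(raw)[num_cols] * sds
cbind(rescaled,
      standardized = coef(std)[paste0("scale(", num_cols, ")")])
```

For example, the disp slope of 0.03555 times the standard deviation of disp gives the 4.4056 reported for scale(disp). The dummy-variable coefficients (cyl6, vs1, and so on) are untouched by this rescaling.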

Based on the model2 output, I read the order of importance for the continuous predictors as: disp > qsec > drat > wt > hp

Is this interpretation correct?
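One way to check this, sketched here under the assumption that "importance" means the size of the standardized effect regardless of its sign, is to sort the scaled coefficients by absolute value. The object names d, std, and ranked are illustrative.

```r
data(mtcars)
d <- mtcars
factor_cols <- c("cyl", "vs", "am", "gear", "carb")
d[factor_cols] <- lapply(d[factor_cols], factor)

std <- lm(mpg ~ cyl + scale(disp) + scale(hp) + scale(drat) + scale(wt) +
            scale(qsec) + vs + am + gear + carb, data = d)

b <- coef(std)
b_cont <- b[grepl("^scale\\(", names(b))]   # keep only the scaled terms

# Largest standardized effect first, ignoring the sign
ranked <- sort(abs(b_cont), decreasing = TRUE)
ranked
```

With the estimates printed above, the magnitudes are 4.83 (hp), 4.43 (wt), 4.41 (disp), 0.66 (qsec), and 0.63 (drat). Under this reading the order would be hp > wt > disp > qsec > drat, whereas the ordering in the question sorts the signed values. Note that none of these terms is significant at the 0.05 level in this model, so any ranking is tentative.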

How do we assess the order of importance for the categorical predictors (cyl, vs, am, gear, carb)?

An illustrative example in R, in addition to the explanation, would help me immensely.
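For the categorical predictors, the dummy coefficients (cyl6, carb8, and so on) are contrasts against a reference level, so there is no single slope per factor to standardize. One possible approach, offered as a sketch and not as the linked article's method, is to test each term as a whole with drop1() from base R, which removes one term at a time and reports an F test with the term's full degrees of freedom. The names d, fit, and term_tests are illustrative.

```r
data(mtcars)
d <- mtcars
factor_cols <- c("cyl", "vs", "am", "gear", "carb")
d[factor_cols] <- lapply(d[factor_cols], factor)

fit <- lm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
          data = d)

# One row per term: the F test compares the full model with the model
# that omits that term, so a multi-level factor is judged as a unit.
term_tests <- drop1(fit, test = "F")

# Terms with the largest F value first; the "<none>" row sorts last
term_tests[order(term_tests$`F value`, decreasing = TRUE), ]
```

This puts factors and continuous predictors on a common footing, since each row measures how much the fit worsens when that term is dropped. For terms with one degree of freedom the F value is the square of the t value in summary(fit), and rescaling a continuous predictor does not change it.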
