How to normalize a model.matrix?

2.7k Views Asked by At
# first, create your data.frame
mydf <- data.frame(a = c(1,2,3), b = c(1,2,3), c = c(1,2,3))

# then, create your model.matrix
mym <- model.matrix(as.formula("~ a + b + c"), mydf)

# how can I normalize the model.matrix?

Currently, I have to convert my model.matrix back into a data.frame in order to run my normalization function:

normalize <- function(x) { return ((x - min(x)) / (max(x) - min(x))) }
m.norm <- as.data.frame(lapply(m, normalize))

Is there anyway to avoid this step by simply normalizing the model.matrix?

3

There are 3 best solutions below

7
On BEST ANSWER

You can normalize each column without converting to a data frame with the apply function:

apply(mym, 2, normalize)
#   (Intercept)   a   b   c
# 1         NaN 0.0 0.0 0.0
# 2         NaN 0.5 0.5 0.5
# 3         NaN 1.0 1.0 1.0

You probably actually want to leave the intercept untouched, with something like:

cbind(mym[,1,drop=FALSE], apply(mym[,-1], 2, normalize))
#   (Intercept)   a   b   c
# 1           1 0.0 0.0 0.0
# 2           1 0.5 0.5 0.5
# 3           1 1.0 1.0 1.0
0
On

If you want to "normalize" in one sense you can just use the scale function which centers and sets the std.dev to 1.

> scale( mym )
  (Intercept)  a  b  c
1         NaN -1 -1 -1
2         NaN  0  0  0
3         NaN  1  1  1
attr(,"assign")
[1] 0 1 2 3
attr(,"scaled:center")
(Intercept)           a           b           c 
          1           2           2           2 
attr(,"scaled:scale")
(Intercept)           a           b           c 
          0           1           1           1 
> mym
  (Intercept) a b c
1           1 1 1 1
2           1 2 2 2
3           1 3 3 3
attr(,"assign")
[1] 0 1 2 3

As you can see, it doesn't really make sense to "normalize" all of the model matrix when the "Intercept" term is present. So you could instead do this:

> mym[ , -1 ] <- scale( mym[,-1] )
> mym
  (Intercept)  a  b  c
1           1 -1 -1 -1
2           1  0  0  0
3           1  1  1  1
attr(,"assign")
[1] 0 1 2 3

This is actually the model matrix that would result if your default contrasts option was set to "contr.sum" and the columns were factor type. This only gets accepted as an internal-to-model.matrix operation if the variables to be "normalized" are factors:

> mym <- model.matrix(as.formula("~ a + b + c"), mydf, contrasts.arg=list(a="contr.sum"))
Error in `contrasts<-`(`*tmp*`, value = contrasts.arg[[nn]]) : 
  contrasts apply only to factors
> mydf <- data.frame(a = factor(c(1,2,3)), b = c(1,2,3), c = c(1,2,3))
> mym <- model.matrix(as.formula("~ a + b + c"), mydf, contrasts.arg=list(a="contr.sum"))
> mym
  (Intercept) a1 a2 b c
1           1  1  0 1 1
2           1  0  1 2 2
3           1 -1 -1 3 3
attr(,"assign")
[1] 0 1 1 2 3
attr(,"contrasts")
attr(,"contrasts")$a
[1] "contr.sum"
0
On

Another option is to vectorize this using the very useful matrixStats package (though TBHapply is usually very efficient too on matrices and when applied on columns). This way you get to keep your original data structure too

library(matrixStats)
Max <- colMaxs(mym[, -1]) 
Min <- colMins(mym[, -1])
mym[, -1] <- (mym[, -1] - Min)/(Max - Min)
mym
#   (Intercept)   a   b   c
# 1           1 0.0 0.0 0.0
# 2           1 0.5 0.5 0.5
# 3           1 1.0 1.0 1.0
# attr(,"assign")
# [1] 0 1 2 3