I am interested in factorizing a numeric column into, say, a factor with 3 levels. What I did was to subset the column into 3 ranges of intervals, then try to combine the 3 intervals into a single factor column Z, and finally merge Z back into my original data frame, but this idea is not working. Is there a smarter way to factorize a numeric column into an arbitrary number of levels so that the data frame will not be distorted?
set.seed(0)
df1 <- data.frame(Y =floor(runif(10, min=0, max=10)),
X =floor(runif(10, min=0, max=50)))
str(df1)
'data.frame': 10 obs. of 2 variables:
$ Y: num 8 2 3 5 9 2 8 9 6 6
$ X: num 3 10 8 34 19 38 24 35 49 19
# The intended three factor intervals: X=3, 4<=X<=30, X>30
df1$fac1 <- factor(df1$X == 3, labels=c(0,1))
df1$fac2 <- factor(df1$X >= 4 & df1$X <= 30, labels=c(0,1))
df1$fac3 <- factor(df1$X > 30, labels=c(0,1))
head(df1)
str(df1)
df2 = cbind(df1$Y, df1$X1, df1$X2, df1$X3)
Warning messages:
1: In xtfrm.data.frame(x) : cannot xtfrm data frames
2: In xtfrm.data.frame(x) : cannot xtfrm data frames
3: In xtfrm.data.frame(x) : cannot xtfrm data frames
head(df2,3)
[,1] [,2] [,3] [,4]
[1,] 8 2 2 2
[2,] 2 1 1 1
[3,] 3 2 2 2
However, even if this worked, I suspect it could misalign the rows of my original df1. What I really want is to make X a single factor column with 3 levels, using the given intervals.
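A minimal sketch of that goal using base R's cut(), assuming X is integer-valued (as it is here, since it comes from floor()) and that any value below 3 may share the first level with X = 3 (none occur in this sample). The level labels are illustrative choices, not taken from the question.

```r
set.seed(0)
df1 <- data.frame(Y = floor(runif(10, min = 0, max = 10)),
                  X = floor(runif(10, min = 0, max = 50)))

# Breaks give the intervals (-Inf, 3], (3, 30], (30, Inf]. With right = TRUE
# (the default) each interval includes its upper endpoint, so for integer X
# this means X <= 3, 4 <= X <= 30 and X > 30.
df1$Z <- cut(df1$X,
             breaks = c(-Inf, 3, 30, Inf),
             labels = c("X<=3", "4<=X<=30", "X>30"))

# cut() works element by element, so Z stays aligned with the rows of df1
# and no separate merge step is needed.
table(df1$Z)
str(df1)
```

Because the new column is assigned directly into df1, the row order is never touched, which addresses the concern about distorting the data frame.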
You can use factor().
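One way to do this with factor() is to compute an integer group code for each row first and then label the codes. The sketch below uses base R's findInterval() to get the codes; that helper, the cut points c(4, 31) and the level labels are assumptions on my part, and the cut points are only exact because X is integer-valued here.

```r
set.seed(0)
df1 <- data.frame(Y = floor(runif(10, min = 0, max = 10)),
                  X = floor(runif(10, min = 0, max = 50)))

# findInterval(x, c(4, 31)) counts how many cut points are <= x:
#   x < 4        -> 0
#   4 <= x < 31  -> 1
#   x >= 31      -> 2
# For integer-valued X this matches X <= 3, 4 <= X <= 30 and X > 30.
grp <- findInterval(df1$X, c(4, 31))

# factor() turns the codes into one column with three fixed levels;
# listing all levels explicitly keeps a group even if it happens to be empty.
df1$Z <- factor(grp, levels = 0:2, labels = c("X<=3", "4<=X<=30", "X>30"))

head(df1, 3)
```

Supplying levels = 0:2 rather than letting factor() infer them means the factor always has exactly three levels, in the intended order, regardless of which values appear in a given sample.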