Difficulty converting factors to numeric in R


I have imported a .sav file using the haven package and wanted to keep the value labels for a number of factors (as you can do this when bringing in SPSS data but not with CSV data). This worked fine with no issues.

One of my multi-response variables (profession) is also stored as factors. There are 29 different options available, so there are 29 separate profession variables. I want to make these numeric so that I can create a new variable that sums up the number of professions each individual has. However, when I try to convert one of the profession variables using as.numeric(), it doesn't seem to work (the variable is still a factor).

What can I do to convert this variable so I can treat it as numeric (i.e., consisting of 0s for those who did not select the profession and 1s for those who did)?

Below is the code I used.

[Image: code and output of what I tried in RStudio]

Answered by monte:

For the problem you are trying to solve, counting the professions each person is associated with, you do not need to convert a factor column to numeric. You can get that count from the factor directly. Let me explain with an example:

---
author: r197588
date: 2023-06-27
output:
  reprex::reprex_document:
    advertise: false
title: piny-loon_reprex.R
---

``` r
library(dplyr)
library(magrittr)

df = as.data.frame(datasets::Titanic)

# goal: count the rows for each combination of Sex and Class (e.g. Male, 1st)

str(df)
#> 'data.frame':    32 obs. of  5 variables:
#>  $ Class   : Factor w/ 4 levels "1st","2nd","3rd",..: 1 2 3 4 1 2 3 4 1 2 ...
#>  $ Sex     : Factor w/ 2 levels "Male","Female": 1 1 1 1 2 2 2 2 1 1 ...
#>  $ Age     : Factor w/ 2 levels "Child","Adult": 1 1 1 1 1 1 1 1 2 2 ...
#>  $ Survived: Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
#>  $ Freq    : num  0 0 35 0 0 0 17 0 118 154 ...


# As str() shows, Class and Sex are both factors.
# Count the rows in each Sex/Class group (n() counts rows; it does not sum Freq):

df %>% group_by(Sex, Class) %>% summarize(count=n())
#> `summarise()` has grouped output by 'Sex'. You can override using the `.groups` argument.
#> # A tibble: 8 x 3
#> # Groups:   Sex [2]
#>   Sex    Class count
#>   <fct>  <fct> <int>
#> 1 Male   1st       4
#> 2 Male   2nd       4
#> 3 Male   3rd       4
#> 4 Male   Crew      4
#> 5 Female 1st       4
#> 6 Female 2nd       4
#> 7 Female 3rd       4
#> 8 Female Crew      4
```

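Regarding your second question, why the column is still a factor: that may be because you are not assigning the result back to the column in the data frame. as.numeric() returns a new vector and leaves the original untouched, so you may need to do:

    df$your_column = as.numeric(df$your_column)

Note that as.numeric() on a factor returns the internal level codes (1, 2, ...), not the labels, so a two-level factor becomes 1s and 2s rather than 0s and 1s.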
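
To connect this to the original question, here is a minimal sketch of recoding several profession factors to 0/1 and summing them per person. The data frame, the column names `prof_1` to `prof_3`, and the levels "No"/"Yes" are all assumptions made for illustration; the real data has 29 columns whose names and labels are not shown in the question.

``` r
# Hypothetical stand-in for the imported data: three of the 29 profession
# columns, each a factor with levels "No" and "Yes" (names and levels are assumed).
df <- data.frame(
  prof_1 = factor(c("Yes", "No", "Yes"), levels = c("No", "Yes")),
  prof_2 = factor(c("No", "No", "Yes"), levels = c("No", "Yes")),
  prof_3 = factor(c("Yes", "No", "Yes"), levels = c("No", "Yes"))
)

# as.numeric() on a factor returns the internal level codes (1 and 2 here),
# not 0 and 1.
codes <- as.numeric(df$prof_1)

# Names of the profession columns (hypothetical).
prof_cols <- c("prof_1", "prof_2", "prof_3")

# Recode each column to 0/1 by comparing against the "selected" level,
# and assign the result back so the change is stored in the data frame.
df[prof_cols] <- lapply(df[prof_cols], function(x) as.integer(x == "Yes"))

# Number of professions selected per person.
df$n_professions <- rowSums(df[prof_cols])
```

If the factor levels are the strings "0" and "1" instead of labels, `as.numeric(as.character(x))` recovers the values. Missing values stay `NA` under either approach, so `rowSums(..., na.rm = TRUE)` may be needed depending on how unanswered items should be counted.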
