In R, I have a dataframe with the answers to a questionnaire, and due to technical reasons the data could only be provided as the answer labels, so not the answer codes.
The answers can be Never, Sometimes, Always which corresponds with the answer codes 0, 1, 3, and these codes can be used to calculate a total score.
I've isolated a test dataset (the full data has more columns) and the R syntax I've got so far.
df <- read.table(text = "
DateOfBirth,Sex,Question_01,Question_02,Question_03,Question_04
12-03-2001,M,Sometimes,Sometimes,Never,Never
21-08-1988,F,Always,Never,Always,Sometimes
30-11-1985,F,Always,Always,Sometimes,Never
01-04-2001,M,Sometimes,Sometimes,Never,Never
12-12-1986,M,Always,Sometimes,Always,Never
19-02-1982,F,Always,Always,Never,Sometimes
11-05-1980,M,Sometimes,Sometimes,Never,Never
01-06-2000,F,Always,Sometimes,Always,Always
20-11-1981,F,Never,Never,Never,Sometimes
30-07-1982,M,Never,Never,Sometimes,Never
", header = TRUE, sep = ",", na.strings = "")
# Levels F, M (alphabetically sorted -> 1, 2)
df$Sex_fact <- factor(df$Sex)
lookup_freq <- c(
"0" = "Never",
"1" = "Sometimes",
"3" = "Always"
)
df$Q1 <- factor(df$Question_01, levels = lookup_freq, labels = names(lookup_freq))
df$Q2 <- factor(df$Question_02, levels = lookup_freq, labels = names(lookup_freq))
df$Q3 <- factor(df$Question_03, levels = lookup_freq, labels = names(lookup_freq))
df$Q4 <- factor(df$Question_04, levels = lookup_freq, labels = names(lookup_freq))
df$Total_score <- as.numeric(df$Q1) + as.numeric(df$Q2) + as.numeric(df$Q3) + as.numeric(df$Q4)
print(df)
The Total_score should just be all the answers added up, according to the 0, 1, 3 codes. The result I get is incorrect, and I understand that this is because it uses the numeric values of the factor, which is just 1, 2, 3. But I don't know how to set the levels to the desired codes, using the list I've got in lookup_freq.
DateOfBirth Sex Question_01 Question_02 Question_03 Question_04 Sex_fact Q1 Q2 Q3 Q4 Total_score
1 12-03-2001 M Sometimes Sometimes Never Never M 1 1 0 0 6
2 21-08-1988 F Always Never Always Sometimes F 3 0 3 1 9
3 30-11-1985 F Always Always Sometimes Never F 3 3 1 0 9
4 01-04-2001 M Sometimes Sometimes Never Never M 1 1 0 0 6
5 12-12-1986 M Always Sometimes Always Never M 3 1 3 0 9
6 19-02-1982 F Always Always Never Sometimes F 3 3 0 1 9
7 11-05-1980 M Sometimes Sometimes Never Never M 1 1 0 0 6
8 01-06-2000 F Always Sometimes Always Always F 3 1 3 3 11
9 20-11-1981 F Never Never Never Sometimes F 0 0 0 1 5
10 30-07-1982 M Never Never Sometimes Never M 0 0 1 0 5
So my question is, is it possible to create a Factor column with the labels Never, Sometimes, Always but with the underlying levels as 0, 1, 3 ? Or is it better to just create new columns and replace the character label values with the integer score values?
Or what is the best way to use a lookup variable, similar to lookup_freq? Because I've got more questionnaires with similarly coded answer columns, sometimes with 4 or 5 answer possibilities.
Option 1: extra level
A hack, but you can create factors such that they resolve to 1, 2, and 4 (instead of 0, 1, and 3) and subtract 1 for each column you're reference.
Option 2: lookup dictionary