I could use some help with a tidyverse solution to this question.
I'm working with a large dataset that has 20+ binary cancer outcomes (cancer_{cancertype}) and their corresponding ages ({cancertype}_age). Some individuals are missing cancer phenotype information, and for each cancer type I would like to set the age variable to NA whenever that phenotype is missing. I've been trying to implement this with mutate(across()), but am having trouble specifying the appropriate arguments.
# load tidyverse lib
library(tidyverse)
# Set seed for reproducibility
set.seed(42)
# generate dataframe
cancer_ds <- data.frame(
  id = 1000:1009,
  cancer_a = rep(0:1, length = 10),
  cancer_b = c(rep(0, 3), NA, NA, 1, NA, rep(1, 3)),
  cancer_c = c(rep(0:1, each = 2, len = 6), rep(NA, 4)),
  a_age = sample(30:60, 10, FALSE),
  b_age = sample(30:60, 10, FALSE),
  c_age = sample(30:60, 10, FALSE)
)
cancer_ds
cancer_list <- paste("cancer", letters[1:3], sep = "_")
cancer_list
# attempted code
out_ds <- cancer_ds %>%
  mutate(across(ends_with("age"), ~replace(is.na(cancer_list))))
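# expected output dataset: b_age / c_age set to NA wherever cancer_b / cancer_c is NA
# (testing is.na() directly avoids hard-coding ages that depend on the seed and RNG version)
out_ds_exp <- cancer_ds %>%
  mutate(b_age = ifelse(is.na(cancer_b), NA, b_age),
         c_age = ifelse(is.na(cancer_c), NA, c_age))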
out_ds_exp
Any help is appreciated! Thanks.
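The attempt above fails for two reasons: replace() needs three arguments (x, list, values), and is.na(cancer_list) tests the character vector of column names (which contains no NAs) rather than the phenotype columns themselves. A minimal sketch that stays within mutate(across()), assuming dplyr >= 1.0.0 (for cur_column()), derives the matching phenotype name from each age column and looks that column up with base R's get(). The helper variable pheno is illustrative only.

```r
library(dplyr)

set.seed(42)
cancer_ds <- data.frame(
  id       = 1000:1009,
  cancer_a = rep(0:1, length.out = 10),
  cancer_b = c(rep(0, 3), NA, NA, 1, NA, rep(1, 3)),
  cancer_c = c(rep(0:1, each = 2, length.out = 6), rep(NA, 4)),
  a_age    = sample(30:60, 10, FALSE),
  b_age    = sample(30:60, 10, FALSE),
  c_age    = sample(30:60, 10, FALSE)
)

out_ds <- cancer_ds %>%
  mutate(across(
    ends_with("_age"),
    ~ {
      # cur_column() is the age column being modified, e.g. "b_age" -> "cancer_b"
      pheno <- paste0("cancer_", sub("_age$", "", cur_column()))
      # blank out the age wherever the matching phenotype is missing
      replace(.x, is.na(get(pheno)), NA)
    }
  ))

out_ds
```

This assumes every {cancertype}_age column has a matching cancer_{cancertype} column; if one is missing, get() will throw an error rather than silently skipping the column.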
Here is an option.
Explanation: We first fix the inconsistent column names using rename_with: you have both "<what>_<group>" (e.g. "cancer_a") and "<group>_<what>" (e.g. "a_age"). Then it's a simple matter of reshaping multiple paired columns from wide to long. We can then replace age values with NAs if cancer is NA before reshaping back from long to wide.
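A sketch of those steps, assuming dplyr >= 1.0.0 (rename_with()) and tidyr >= 1.0.0 (pivot_longer() / pivot_wider()). The regular expressions and the intermediate column name type are my own choices for illustration and may differ from how the original answer wrote it.

```r
library(dplyr)
library(tidyr)

set.seed(42)
cancer_ds <- data.frame(
  id       = 1000:1009,
  cancer_a = rep(0:1, length.out = 10),
  cancer_b = c(rep(0, 3), NA, NA, 1, NA, rep(1, 3)),
  cancer_c = c(rep(0:1, each = 2, length.out = 6), rep(NA, 4)),
  a_age    = sample(30:60, 10, FALSE),
  b_age    = sample(30:60, 10, FALSE),
  c_age    = sample(30:60, 10, FALSE)
)

out_ds <- cancer_ds %>%
  # make names consistent: "cancer_a" -> "a_cancer", matching "a_age"
  rename_with(~ sub("^cancer_(.+)$", "\\1_cancer", .x), starts_with("cancer_")) %>%
  # wide -> long: one row per id and cancer type, with columns `cancer` and `age`
  pivot_longer(-id, names_to = c("type", ".value"), names_sep = "_") %>%
  # set the age to NA wherever the phenotype is missing
  mutate(age = replace(age, is.na(cancer), NA)) %>%
  # long -> wide, producing a_cancer, b_cancer, c_cancer, a_age, b_age, c_age
  pivot_wider(names_from = type, values_from = c(cancer, age),
              names_glue = "{type}_{.value}") %>%
  # restore the original "cancer_a" style names
  rename_with(~ sub("^(.+)_cancer$", "cancer_\\1", .x), ends_with("_cancer"))

out_ds
```

Two side effects of reshaping are worth knowing: the result is a tibble, and because the phenotype columns are stacked into one column, they come back with a common type (double here, since cancer_b is double). Also, names_sep = "_" splits on every underscore, so cancer types whose names contain underscores would need names_pattern (e.g. "^(.+)_(cancer|age)$") instead.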