Set specific values of a group of variables to NA based on another group of variables

Question


Asked by ceruleanclouds on 21 July 2022 at 23:57

I could use some help with a tidyverse solution to this question.

I'm working with a large dataset that has 20+ binary cancer outcomes (cancer_{cancertype}) and the corresponding ages ({cancertype}_age). Some individuals are missing cancer phenotype information; for each cancer type, I would like to set the age variable to NA wherever that cancer phenotype is missing. I've been trying to use mutate(across()), but I'm having trouble specifying the appropriate arguments.

```r
# load the tidyverse
library(tidyverse)

# Set seed for reproducibility
set.seed(42)

# generate dataframe
cancer_ds <- data.frame(id = 1000:1009,
           cancer_a = rep(0:1, length.out = 10), 
           cancer_b = c(rep(0, 3), NA, NA, 1, NA, rep(1, 3)), 
           cancer_c = c(rep(0:1, each = 2, length.out = 6), rep(NA, 4)), 
           a_age = sample(30:60, 10, FALSE), 
           b_age = sample(30:60, 10, FALSE), 
           c_age = sample(30:60, 10, FALSE)
           ) 

cancer_ds

cancer_list <- paste0("cancer_", letters[1:3])

cancer_list
```

```r
# attempted code
out_ds <- cancer_ds %>% 
          mutate(across(ends_with("age"), ~replace(is.na(cancer_list))))
```
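The attempt above cannot work as written: replace() is called without the vector to modify, and cancer_list holds column *names*, so is.na(cancer_list) is just c(FALSE, FALSE, FALSE) rather than a per-row test. A minimal sketch that keeps the wide layout pairs each age column with its cancer column by name and updates one pair at a time with purrr::reduce(). It assumes every age column is named `<type>_age` and has a matching `cancer_<type>` column; the helper name `types` is illustrative only.

```r
library(dplyr)
library(purrr)

# Same toy data as in the question
set.seed(42)
cancer_ds <- data.frame(id = 1000:1009,
                        cancer_a = rep(0:1, length.out = 10),
                        cancer_b = c(rep(0, 3), NA, NA, 1, NA, rep(1, 3)),
                        cancer_c = c(rep(0:1, each = 2, length.out = 6), rep(NA, 4)),
                        a_age = sample(30:60, 10, FALSE),
                        b_age = sample(30:60, 10, FALSE),
                        c_age = sample(30:60, 10, FALSE))

# Cancer types inferred from the "<type>_age" column names (here "a", "b", "c")
types <- sub("_age$", "", grep("_age$", names(cancer_ds), value = TRUE))

# Fold over the types: each step blanks one age column wherever its
# matching cancer column is NA, leaving all other columns untouched
out_ds <- reduce(types, function(df, type) {
  age_col    <- paste0(type, "_age")
  cancer_col <- paste0("cancer_", type)
  mutate(df, !!age_col := replace(.data[[age_col]], is.na(.data[[cancer_col]]), NA))
}, .init = cancer_ds)

out_ds
```

Because the column pairs are derived from the names, the same code covers 20+ cancer types without listing them, and the original column names and order are preserved.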

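```r
# expected output dataset: the listed ages are the b_age / c_age values
# in the rows where cancer_b / cancer_c is NA (with set.seed(42))
out_ds_exp <- cancer_ds %>% 
          mutate(b_age = ifelse(b_age %in% c(43, 49, 47), NA, b_age), 
                 c_age = ifelse(c_age %in% c(49, 31, 37, 32), NA, c_age))

out_ds_exp
```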

Any help is appreciated! Thanks.


**Maurits Evers** · Accepted Answer · 2022-07-22 00:08:59

Here is an option.

```r
cancer_ds %>%
    rename_with(~ str_replace_all(.x, "([a-z])_([a-z]{2,})", "\\2_\\1")) %>%
    pivot_longer(-id, names_to = c(".value", "grp"), names_sep = "_") %>%
    mutate(age = if_else(is.na(cancer), NA_integer_, age)) %>%
    pivot_wider(names_from = grp, values_from = c(cancer, age))
## A tibble: 10 x 7
#      id cancer_a cancer_b cancer_c age_a age_b age_c
#   <int>    <dbl>    <dbl>    <dbl> <int> <int> <int>
# 1  1000        0        0        0    46    33    54
# 2  1001        1        0        0    34    54    56
# 3  1002        0        0        1    30    34    33
# 4  1003        1       NA        1    54    NA    34
# 5  1004        0       NA        0    39    NA    42
# 6  1005        1        1        0    33    55    57
# 7  1006        0       NA       NA    47    NA    NA
# 8  1007        1        1       NA    60    44    NA
# 9  1008        0        1       NA    44    32    NA
#10  1009        1        1       NA    36    38    NA
```

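Explanation: We first make the column names consistent with rename_with(): the data have both "<what>_<group>" names (e.g. "cancer_a") and "<group>_<what>" names (e.g. "a_age"), and the regex swaps the latter into "age_a". With consistent names, pivot_longer() and its ".value" sentinel reshape the paired columns from wide to long, giving one row per id and group with a `cancer` and an `age` column. We then set age to NA wherever cancer is NA and reshape back from long to wide with pivot_wider(). Note that the age columns come back named age_a, age_b, age_c rather than a_age, b_age, c_age.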
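The regex in rename_with() above relies on the single-letter group codes of the toy data. With a longer type name such as a hypothetical `breast_age`, the pattern "([a-z])_([a-z]{2,})" matches only "t_age" and produces "breasage_t". A sketch of the same reshape, assuming only the `<type>_age` / `cancer_<type>` naming described in the question, anchors the patterns at the ends of the names, uses `names_pattern` instead of `names_sep` (so type names may themselves contain underscores), and restores the original `<type>_age` names at the end:

```r
library(dplyr)
library(tidyr)

# Same toy data as in the question
set.seed(42)
cancer_ds <- data.frame(id = 1000:1009,
                        cancer_a = rep(0:1, length.out = 10),
                        cancer_b = c(rep(0, 3), NA, NA, 1, NA, rep(1, 3)),
                        cancer_c = c(rep(0:1, each = 2, length.out = 6), rep(NA, 4)),
                        a_age = sample(30:60, 10, FALSE),
                        b_age = sample(30:60, 10, FALSE),
                        c_age = sample(30:60, 10, FALSE))

out_ds <- cancer_ds %>%
  # "<type>_age" -> "age_<type>"; anchored so multi-letter types stay intact
  rename_with(~ sub("^(.+)_age$", "age_\\1", .x), ends_with("_age")) %>%
  # Split "cancer_<type>" / "age_<type>" into a .value part and a type part
  pivot_longer(-id, names_to = c(".value", "type"),
               names_pattern = "^(cancer|age)_(.+)$") %>%
  # Blank the age wherever the phenotype is missing
  mutate(age = replace(age, is.na(cancer), NA)) %>%
  pivot_wider(names_from = type, values_from = c(cancer, age)) %>%
  # "age_<type>" -> "<type>_age" to match the input naming
  rename_with(~ sub("^age_(.+)$", "\\1_age", .x), starts_with("age_"))

out_ds
```

The result is a tibble with the same column names and order as `cancer_ds`; the cancer columns come back as doubles because pivot_longer() combines the integer and double cancer columns into one.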
