I have these tibbles:
library(tidyverse)
data <- tribble(
~code, ~v1, ~v2, ~v3,
'ENSG00000141510.14', 10, 20, 30,
'ENSG00000133703.10', 15, 90, 50,
'ENSG00000187848.11', 18, 17, 16,
'ENSG00000133703.10', 55, 47, 22
)
code_info <- tribble(
~code, ~name,
'ENSG00000141510', 'p53',
'ENSG00000133703', 'KRAS',
'ENSG00000187848', 'P2XR2'
)
I want to create a new column data$name by comparing the columns data$code and code_info$code. In this sense, the code_info tibble kinda works as a key:value pair for the strings in data$code.
Each row in data will have its name column equal to the name column in code_info whose string in code_info$code is a partial string of the one in data$code.
In this example, the expected result would be:
# A tibble: 4 x 5
code v1 v2 v3 name
<chr> <dbl> <dbl> <dbl> <chr>
1 ENSG00000141510.14 10 20 30 p53
2 ENSG00000133703.10 15 90 50 KRAS
3 ENSG00000187848.11 18 17 16 P2XR2
4 ENSG00000133703.10 55 47 22 KRAS
So far I've been doing this by manually hardcoding a bunch of sequential str_replace, in a mutate, like:
data %>%
mutate(
name = str_replace(code, "ENSG00000141510.14", "p53"),
name = str_replace(name, "ENSG00000133703.10", "KRAS"),
name = str_replace(name, "ENSG00000187848.11", "P2XR2")
)
# output:
# # A tibble: 4 x 5
# code v1 v2 v3 name
# <chr> <dbl> <dbl> <dbl> <chr>
# 1 ENSG00000141510.14 10 20 30 p53
# 2 ENSG00000133703.10 15 90 50 KRAS
# 3 ENSG00000187848.11 18 17 16 P2XR2
# 4 ENSG00000133703.10 55 47 22 KRAS
Which does the trick but obviously isn't very scalable. Is there a way to incorporate the information in code_info, so that this is done automatically, without hard-coding the replacement values?
Try this sloution with
left_join()andseparate()in order to create a common variable between datasets:Output: