Assigning full names to acronyms


Currently I have a data frame which has the acronyms of unique cancer types (hotspot_mockup), like so:

Cancer Gene
AASTR IDH1
ACRM NRAS

In another data frame (new_hotspot_cancers), I have these 184 unique acronyms and their corresponding full names. This is in the form:

Acronym  Full Name
AASTR    Anaplastic Astrocytoma
ACRM     Acral Melanoma

I want to replace the acronyms in the first data frame with the corresponding full names from the second data frame (assuming, of course, that the acronym exists in the second data frame). Overall, I want the result to look like:

Cancer                  Gene
Anaplastic Astrocytoma  IDH1
Acral Melanoma          NRAS

I was thinking of some kind of "for" loop, but I know this is frowned upon in R. As always, any guidance would be greatly appreciated!


There are 2 best solutions below

Phil On BEST ANSWER

> I was thinking of some kind of "for" loop, but I know this is frowned upon in R.

It's not that it's frowned upon; it's that people with experience in other programming languages tend to reach for for loops in R when they are not needed, either because R is vectorized by default or because functions like lapply() or map() from the purrr package do the job of a for loop more concisely.
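To illustrate the point with the question's data, here is a minimal base R sketch that does the lookup once with a for loop and once with the vectorized match(). The names hotspot and lookup stand in for the question's data frames, and the third row ("XYZ", "GENE_X") is a made-up placeholder added only to show what happens when an acronym has no match.

```r
# Toy data mirroring the question; "XYZ"/"GENE_X" is a hypothetical unmatched row
hotspot <- data.frame(Cancer = c("AASTR", "ACRM", "XYZ"),
                      Gene = c("IDH1", "NRAS", "GENE_X"),
                      stringsAsFactors = FALSE)
lookup <- data.frame(Acronym = c("AASTR", "ACRM"),
                     Full_Name = c("Anaplastic Astrocytoma", "Acral Melanoma"),
                     stringsAsFactors = FALSE)

# Loop version: works, but handles one element at a time
looped <- hotspot$Cancer
for (i in seq_along(looped)) {
  hit <- lookup$Full_Name[lookup$Acronym == looped[i]]
  if (length(hit) == 1) looped[i] <- hit  # leave the acronym if nothing matches
}

# Vectorized version: match() finds every position in a single call
idx <- match(hotspot$Cancer, lookup$Acronym)
vectorized <- ifelse(is.na(idx), hotspot$Cancer, lookup$Full_Name[idx])

identical(looped, vectorized)  # TRUE
```

Both versions give the same result; the vectorized one simply has no explicit loop or index bookkeeping.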

In this case, you can just do a left_join(), from the dplyr package.

df1 <- data.frame(Cancer = c("AASTR", "ACRM"), Gene = c("IDH1", "NRAS"))
df2 <- data.frame(Acronym = c("AASTR", "ACRM"), Full_Name = c("Anaplastic Astrocytoma", "Acral Melanoma"))

dplyr::left_join(df1, df2, by = c("Cancer" = "Acronym"))

  Cancer Gene              Full_Name
1  AASTR IDH1 Anaplastic Astrocytoma
2   ACRM NRAS         Acral Melanoma
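
Note that the join above adds Full_Name as an extra column rather than replacing Cancer. One possible way to get the layout asked for in the question is sketched below, assuming dplyr is installed and that both key columns are character vectors; result is just an illustrative name.

```r
library(dplyr)

df1 <- data.frame(Cancer = c("AASTR", "ACRM"),
                  Gene = c("IDH1", "NRAS"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Acronym = c("AASTR", "ACRM"),
                  Full_Name = c("Anaplastic Astrocytoma", "Acral Melanoma"),
                  stringsAsFactors = FALSE)

result <- left_join(df1, df2, by = c("Cancer" = "Acronym")) %>%
  # Take the full name where the join found one, otherwise keep the acronym
  mutate(Cancer = coalesce(Full_Name, Cancer)) %>%
  # Drop the helper column so only the original two columns remain
  select(Cancer, Gene)

result
```

Rows whose acronym has no match in df2 keep the acronym, because coalesce() falls back to its second argument when the first is NA.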

Ibetthat On

You can just do a right outer join with merge(). By default the function joins on the columns that share a name, so make sure 'Cancer' in df1 and 'Acronym' in df2 have the same name (or pass by.x and by.y instead).

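# Rename the key column in df2 so both data frames share the name 'Cancer'
colnames(df2)[1] <- 'Cancer'
# all.y = TRUE keeps every row of df2 (right outer join)
df.new <- merge(x = df1, y = df2, by = "Cancer", all.y = TRUE)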

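This gives you a new data frame with the acronyms, the full names and the genes, which you can filter afterwards. Because all.y = TRUE keeps every row of df2, acronyms that never occur in df1 show up with NA in Gene.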
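
If only the rows of df1 are wanted, a variation is to keep all of df1 instead (all.x = TRUE) and name the key columns explicitly, which avoids both the renaming and the later filtering. This is a sketch under the assumption that the columns are plain character vectors; merged is an illustrative name.

```r
df1 <- data.frame(Cancer = c("AASTR", "ACRM"),
                  Gene = c("IDH1", "NRAS"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Acronym = c("AASTR", "ACRM"),
                  Full_Name = c("Anaplastic Astrocytoma", "Acral Melanoma"),
                  stringsAsFactors = FALSE)

# by.x / by.y name the key in each data frame, so no renaming is needed;
# all.x = TRUE keeps every row of df1 (a left join).
# Note that merge() sorts the result by the key column by default.
merged <- merge(df1, df2, by.x = "Cancer", by.y = "Acronym", all.x = TRUE)

# Use the full name where one was found, otherwise keep the acronym
merged$Cancer <- ifelse(is.na(merged$Full_Name), merged$Cancer, merged$Full_Name)
merged$Full_Name <- NULL  # drop the helper column

merged
```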