Extracting two elements using grepl()


I have a dataset called "data" that looks like this:

data

I am trying to create a new variable, called "Group" which codes elements in the "FileName" variable as the following:

  • anything with element HC will be labelled as "HC PBMC"
  • anything with elements SF and PBMC will be labelled as "AS PBMC"
  • anything with elements SF and SFMC will be labelled as "AS SFMC"

In order to do this, I wrote this code:

data$Group<- ifelse(grepl("HC",data$FileName),"HC",
                    ifelse(grepl("SF & PBMC",data$FileName),"AS PBMC",
                           "AS SFMC"))

However, anything with elements SF and PBMC did not code as "AS PBMC" correctly. Instead it just skipped that condition and labelled it as "AS SFMC". Please see below:

data

Any help would be most welcome!


1
dufei On

First, note that "&" does not mean a logical "and" inside a regex: grepl("SF & PBMC", ...) searches for the literal text "SF & PBMC", which your file names do not contain, so every non-HC row falls through to "AS SFMC". You could certainly achieve what you want with a more sophisticated regex, but wouldn't it be more transparent to first extract the components you use for naming the groups and then assign the cases in a second step?

library(tidyverse)

df <- tibble(
  FileName = c("HC1788 PBMC", "SF71 PBMC", "SF70_2 SFMC")
)

df |> 
  # extract components
  mutate(
    A = str_extract(FileName, "^HC|^SF"),
    B = str_extract(FileName, "PBMC$|SFMC$")
  ) |> 
  # assign groups
  mutate(Group = case_when(
    A == "HC" ~ "HC PBMC",
    A == "SF" & B == "PBMC" ~ "AS PBMC",
    A == "SF" & B == "SFMC" ~ "AS SFMC"
  ))
#> # A tibble: 3 × 4
#>   FileName    A     B     Group  
#>   <chr>       <chr> <chr> <chr>  
#> 1 HC1788 PBMC HC    PBMC  HC PBMC
#> 2 SF71 PBMC   SF    PBMC  AS PBMC
#> 3 SF70_2 SFMC SF    SFMC  AS SFMC

Created on 2023-10-12 with reprex v2.0.2
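
To see why the original pattern never matched, a short base-R check may help. It reuses the three example file names from above. The lookahead pattern is only one possible way of writing "contains both SF and PBMC" as a single regex; it is a sketch of the "sophisticated regex" route, not a recommendation over the two-step approach.

```r
# Example file names taken from the tibble above
files <- c("HC1788 PBMC", "SF71 PBMC", "SF70_2 SFMC")

# "&" is an ordinary character in a regex, so this looks for the
# literal text "SF & PBMC", which none of the names contain
literal_match <- grepl("SF & PBMC", files)

# Combining two grepl() calls with R's vectorised & gives a real "and"
both_match <- grepl("SF", files) & grepl("PBMC", files)

# Single-regex alternative using lookaheads (needs perl = TRUE)
lookahead_match <- grepl("^(?=.*SF)(?=.*PBMC)", files, perl = TRUE)

print(literal_match)
print(both_match)
print(lookahead_match)
```

Only the second file name satisfies both conditions, so both_match and lookahead_match flag that row alone, while literal_match is FALSE everywhere.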

2
Ludwig On

A simple solution based on what you tried would be:

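# HC rows first; the label "HC" is kept from your code
# (use "HC PBMC" here to match the labels listed in the question)
data$Group <- ifelse(
  grepl("HC", data$FileName),
  "HC",
  ifelse(
    # each condition gets its own grepl(); & combines the logical vectors
    grepl("SF", data$FileName) & grepl("PBMC", data$FileName),
    "AS PBMC",
    "AS SFMC"
  )
)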
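
As a quick check, the same logic can be run on a small made-up data frame. The file names are borrowed from the other answer and are only an assumption about what the real data looks like; the labels follow the list in the question, so the first one is "HC PBMC" rather than "HC". The helper vectors is_hc, is_sf and is_pbmc are names introduced here for readability.

```r
# Hypothetical sample data, reusing the file names from the other answer
data <- data.frame(
  FileName = c("HC1788 PBMC", "SF71 PBMC", "SF70_2 SFMC"),
  stringsAsFactors = FALSE
)

# One logical vector per element of interest
is_hc   <- grepl("HC", data$FileName)
is_sf   <- grepl("SF", data$FileName)
is_pbmc <- grepl("PBMC", data$FileName)

# HC is tested first; the remaining rows are split on SF & PBMC
data$Group <- ifelse(is_hc, "HC PBMC",
                     ifelse(is_sf & is_pbmc, "AS PBMC", "AS SFMC"))

print(data)
```

Note that "SF" also occurs inside "SFMC", so the order of the tests matters: checking for HC first keeps an HC sample from being picked up by the SF test.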