Grepl values between comma and append as new column in R

Question

Asked by Gabriel G. on 16 February 2024 at 19:36

I have the following dataframe:

df=read.table(text="A     
hwqnewqn,ENS.1kmsdf,jmewhqjwjenj
jjqweq3w,eqwejnqwe,ENS.gkhkgfdlsl.jkmwejre
ENSAAAAAAAAAAAA,bbbbbbbb,cccccccc", header=TRUE)

Let's say I need a new column B that contains only the comma-separated field that matches ENS.

So end result should be:

df=read.table(text="B     
ENS.1kmsdf
ENS.gkhkgfdlsl.jkmwejre
ENSAAAAAAAAAAAA", header=TRUE)

Is there any approach for that?


**jpsmith** · Answer 1 · 2024-02-16T19:43:16.980000

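In base R, you could split each string on commas with strsplit, then use vapply to keep the piece that grepl flags as containing "ENS". This assumes every row has exactly one matching piece; vapply raises an error otherwise: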

df$B <- vapply(strsplit(df$A, ","), \(x) x[grepl("ENS", x)], character(1))

#                                            A                       B
# 1           hwqnewqn,ENS.1kmsdf,jmewhqjwjenj              ENS.1kmsdf
# 2 jjqweq3w,eqwejnqwe,ENS.gkhkgfdlsl.jkmwejre ENS.gkhkgfdlsl.jkmwejre
# 3          ENSAAAAAAAAAAAA,bbbbbbbb,cccccccc         ENSAAAAAAAAAAAA
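
If some rows might contain no ENS field, or more than one, a more forgiving variant could look like the sketch below. The helper name first_ens and the two extra sample rows are made up for illustration, and taking the first match is just one possible policy:

```r
# Sample data: the second row has two ENS fields, the third has none.
df <- data.frame(A = c(
  "hwqnewqn,ENS.1kmsdf,jmewhqjwjenj",
  "ENSAAAAAAAAAAAA,bbbbbbbb,ENS.second",
  "no,match,here"
), stringsAsFactors = FALSE)

# Hypothetical helper: return the first field containing "ENS",
# or NA when no field matches.
first_ens <- function(fields) {
  hits <- fields[grepl("ENS", fields, fixed = TRUE)]
  if (length(hits) > 0) hits[1] else NA_character_
}

# Split on literal commas, then reduce each row to a single string.
df$B <- vapply(strsplit(df$A, ",", fixed = TRUE), first_ens, character(1))
```

Because first_ens always returns exactly one string, the character(1) template given to vapply is satisfied for every row.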

**B. Christian Kamgang** · Answer 2 · 2024-02-16T19:53:39.797000

You can use gsub with a capture group that keeps only the run starting at ENS and ending before the next comma:

df$B <- gsub(".*(ENS[^,]*).*", "\\1", df$A)

#                                            A                       B
# 1           hwqnewqn,ENS.1kmsdf,jmewhqjwjenj              ENS.1kmsdf
# 2 jjqweq3w,eqwejnqwe,ENS.gkhkgfdlsl.jkmwejre ENS.gkhkgfdlsl.jkmwejre
# 3          ENSAAAAAAAAAAAA,bbbbbbbb,cccccccc         ENSAAAAAAAAAAAA
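
One thing to watch with the gsub pattern: a row with no ENS at all comes back unchanged rather than as NA. If that matters, regexpr plus regmatches is a possible alternative. This is only a sketch, and the third sample row is invented to show the no-match case:

```r
x <- c("hwqnewqn,ENS.1kmsdf,jmewhqjwjenj",
       "jjqweq3w,eqwejnqwe,ENS.gkhkgfdlsl.jkmwejre",
       "no,match,here")

# gsub leaves a non-matching string as it is.
unchanged <- gsub(".*(ENS[^,]*).*", "\\1", x[3])

# regexpr returns the start of the first "ENS..." run, or -1 if none.
m <- regexpr("ENS[^,]*", x)
found <- m != -1L

# regmatches returns only the matched pieces, so fill them in by position.
B <- rep(NA_character_, length(x))
B[found] <- regmatches(x, m)
```

Here B holds the ENS field for the first two rows and NA for the third, while unchanged is still the full input string.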

**r2evans** · Answer 3 · 2024-02-16T20:43:51.423000

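Your data looks like CSV, so assuming you are reading it in from a file, you can skip the first line (with the lone A) and parse the rest as comma-separated columns. Note that the example keeps the second column by position; it does not filter on ENS: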

read.table(text="A     
hwqnewqn,ENS.1kmsdf,jmewhqjwjenj
jjqweq3w,eqwejnqwe,ENS.gkhkgfdlsl.jkmwejre
ENSAAAAAAAAAAAA,bbbbbbbb,cccccccc", header=FALSE, sep=",", skip=1)[,2,drop=FALSE] |>
  setNames("B")
#            B
# 1 ENS.1kmsdf
# 2  eqwejnqwe
# 3   bbbbbbbb

Or, if you've already read it in, you can use read.csv to parse the remaining text:

read.csv(text = paste(df$A, collapse="\n"), header = FALSE)[,2,drop=FALSE] |>
  setNames("B")
#            B
# 1 ENS.1kmsdf
# 2  eqwejnqwe
# 3   bbbbbbbb

One reason to prefer read.csv or the like is quoted fields: when a field contains a comma inside quotes, a simple string split or regex may not split the text correctly.
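
To make the quoting point concrete, here is a small sketch. The quoted field "ENS.1kmsdf,v2" is invented for the demo; it parses the text with read.csv and then picks the ENS field per row with grepl, which is one way to combine this approach with the original question:

```r
x <- c('hwqnewqn,"ENS.1kmsdf,v2",jmewhqjwjenj',
       'ENSAAAAAAAAAAAA,bbbbbbbb,cccccccc')

# A plain split ignores the quotes and cuts the quoted field in two.
naive <- strsplit(x[1], ",", fixed = TRUE)[[1]]

# read.csv respects the quotes, giving three columns per row.
fields <- read.csv(text = paste(x, collapse = "\n"), header = FALSE,
                   stringsAsFactors = FALSE)

# For each row, keep the first field containing "ENS" (NA if none).
B <- vapply(seq_len(nrow(fields)), function(i) {
  row <- unlist(fields[i, ], use.names = FALSE)
  hits <- row[grepl("ENS", row, fixed = TRUE)]
  if (length(hits) > 0) hits[1] else NA_character_
}, character(1))
```

The naive split yields four pieces for the first row, while the read.csv route keeps "ENS.1kmsdf,v2" together as a single field.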
