How to use grepl() to select specific strings in a list of dataframes?


In a list of data frames, I need to select the variable named "id" and those whose names include "Duo", so each data frame in the output will have two variables.

data <- list(foo = structure(list(bodyPart = c("leg", "arm", "knee"), 
side = c("LEFT", "RIGHT", "LEFT"), device = c("LLI", "LSM", 
"GHT"), `Duo:length` = c(12, 476, 7), id = c("AA", "BB", "CC"), 
mwID = c("a12", "k87", "j98")), class = "data.frame", row.names = c(NA, 
-3L)), bar = structure(list(bodyPart = c("ankel", "ear", "knee"
), `Duo:side` = c("LEFT", "LEFT", "LEFT"), device = c("GOM", "LSM", 
"YYY"), id = c("ZZ", "DD", "FF"), tqID = c("kj8", "ll23", "sc26"
)), class = "data.frame", row.names = c(NA, -3L)))

Desired output:

output <- list(foo = structure(list(`Duo:length` = c(12, 476, 7), id = c("AA", "BB", "CC")), 
class = "data.frame", row.names = c(NA, -3L)), 
bar = structure(list(`Duo:side` = c("LEFT", "LEFT", "LEFT"), id = c("ZZ", "DD", "FF")), 
class = "data.frame", row.names = c(NA, -3L)))

The code below returns only the id columns; I am not sure why it does not pick up the columns whose names include Duo.

lapply(data, function(cr) cr %>% dplyr::select(id, where(~any(grepl("Duo", names(.))))))

I would really appreciate your advice.

Ronak Shah (best answer)

Using dplyr syntax:

library(dplyr)

purrr::map(data, ~.x %>% select(id, contains("Duo")))

#$foo
#  id Duo:length
#1 AA         12
#2 BB        476
#3 CC          7

#$bar
#  id Duo:side
#1 ZZ     LEFT
#2 DD     LEFT
#3 FF     LEFT

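Or using a regular expression (note that ^id also matches any name that merely starts with "id", and matches() ignores case by default):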

purrr::map(data, ~.x %>% select(matches("^(id|Duo)")))
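
As a possible explanation of why the attempt in the question returns only id, here is a minimal sketch. It relies on where() applying its predicate to each column's values rather than to its name. The data frame df is a made-up stand-in for one element of the list, not the asker's data.

```r
library(dplyr)

# Hypothetical stand-in for one data frame from the list.
df <- data.frame(
  bodyPart = c("leg", "arm"),
  `Duo:length` = c(12, 476),
  id = c("AA", "BB"),
  check.names = FALSE
)

# where() calls the predicate once per column and passes the column itself.
# A plain vector has no names, so names(.) is NULL, grepl() returns
# logical(0), and any(logical(0)) is FALSE: no column is picked by where().
attempt_cols <- names(select(df, id, where(~ any(grepl("Duo", names(.))))))
print(attempt_cols)  # "id"

# Name-based helpers such as contains() inspect the column names instead.
helper_cols <- names(select(df, id, contains("Duo")))
print(helper_cols)   # "id" "Duo:length"

# A stricter pattern: exactly "id", or anything starting with "Duo",
# with case-sensitive matching switched on explicitly.
strict_cols <- names(select(df, matches("^id$|^Duo", ignore.case = FALSE)))
```

With matches(), the columns come back in their original order rather than with id first, which matters only if column order does.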

Friede

In base R (4.1 or later, for the \(x) lambda shorthand), you can do:

> lapply(data, \(x) x[, grepl("^id$|^Duo", colnames(x))])
$foo
  Duo:length id
1         12 AA
2        476 BB
3          7 CC

$bar
  Duo:side id
1     LEFT ZZ
2     LEFT DD
3     LEFT FF
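
One caveat with the base R subsetting above: if only one column matches, x[, cols] collapses to a plain vector. The sketch below wraps the same grepl() pattern in a helper and adds drop = FALSE to guard against that. The helper name keep_id_duo and the example list are illustrative assumptions, not part of the original answer.

```r
# Hypothetical helper: keep the column named exactly "id" plus every
# column whose name starts with "Duo".
keep_id_duo <- function(df) {
  keep <- grepl("^id$|^Duo", colnames(df))
  # drop = FALSE keeps the result a data frame even for a single column.
  df[, keep, drop = FALSE]
}

# Made-up example list; baz has no "Duo" column at all.
example <- list(
  foo = data.frame(
    `Duo:length` = c(12, 476),
    id = c("AA", "BB"),
    mwID = c("a12", "k87"),
    check.names = FALSE
  ),
  baz = data.frame(id = c("ZZ", "DD"), device = c("GOM", "LSM"))
)

result <- lapply(example, keep_id_duo)
str(result)
```

Without drop = FALSE, result$baz would be a character vector rather than a one-column data frame.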