how to intersect elements of one column along with group_by in R

743 Views Asked by AnilGoyal At 05 March 2021 at 07:45

Lets assume, my data is like

  group_id    col1
1        1     A,B
2        1     B,C
3        2     A,C
4        2     B,D
5        3     A,D
6        3 A,B,C,D

I would like to summarise/mutate col1 where its elements are intersected within same group (over group_id). My required output is like (if summarising)

  group_id col1
1        1    B
2        2 <NA>
3        3  A,D

or like this (if mutating)

  group_id col1
1        1    B
2        1    B
3        2 <NA>
4        2 <NA>
5        3  A,D
6        3  A,D

I can easily create a union by using function toString but scratching my head for how to have common elements in the output. Basically intersect requires at least two arguments and is therefore not working here.

dput(df) is as under

df <-  structure(list(group_id = c(1L, 1L, 2L, 2L, 3L, 3L), col1 = c("A,B", 
"B,C", "A,C", "B,D", "A,D", "A,B,C,D")), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6"))

Original Q&A

There are 3 best solutions below

Ronak Shah On 05 March 2021 at 07:52 BEST ANSWER

You can split col1 on comma and use Reduce + intersect to get common values in each group_id.

library(dplyr)
df %>%
  group_by(group_id) %>%
  summarise(col1 = toString(Reduce(intersect, strsplit(col1, ','))))

#  group_id col1  
#*    <int> <chr> 
#1        1 "B"   
#2        2 ""    
#3        3 "A, D"

Karthik S On 05 March 2021 at 07:57

Would this work:

library(dplyr)
library(tidyr)
df %>% separate_rows(col1) %>% 
   group_by(group_id, col1) %>% filter(n()>1) %>% 
   distinct() %>% group_by(group_id) %>% summarise(col1 = toString(col1)) %>% 
   right_join(df %>% select(group_id) %>% distinct()) %>% 
   arrange(group_id)
`summarise()` ungrouping output (override with `.groups` argument)
Joining, by = "group_id"
# A tibble: 3 x 2
  group_id col1 
     <int> <chr>
1        1 B    
2        2 NA   
3        3 A, D

tmfmnk On 05 March 2021 at 07:58

One option using dplyr and tidyr could be:

df %>%
 separate_rows(col1) %>%
 count(group_id, col1) %>%
 group_by(group_id) %>%
 summarise(col1 = if_else(all(n == 1), NA_character_, paste(col1[n == 2], collapse = ",")))

  group_id col1 
     <int> <chr>
1        1 B    
2        2 <NA> 
3        3 A,D

how to intersect elements of one column along with group_by in R

There are 3 best solutions below

Related Questions in R

Related Questions in DPLYR

Related Questions in SET-INTERSECTION

Related Questions in SET-OPERATIONS

Trending Questions

Popular # Hahtags

Popular Questions