Conditional dplyr::summarize() of a data.frame in R

82 Views Asked by At

In my DATA below, I wonder how to summarize() the number of 6 different Ethnicities (Hispanic, AmIndian, Asian, White, Pacific, AsiaPacific) chosen ("Y") when Ethinc_overall!="B"?

library(tidyverse)

DATA <- read.table(h=TRUE,text=
"EL_Type Language Black Hispanic AmIndian Asian White Pacific AsiaPacific Ethinc_overall
 Current  English Black        Y    N     N     N       N           N              H
 Current  English Black        N    N     N     N       N           N              B
 Current  English Black        Y    N     N     N       N           N              H
 Current  English Black        N    N     N     N       N           N              B
 Current  English Black        N    N     N     N       N           N              B
 Current  English Black        N    N     N     N       N           N              B
 Current  English Black        N    N     Y     N       N           Y              M
 Current  English Black        N    N     N     N       N           N              B
 Current  English Black        N    N     N     N       N           N              B
 Current  English Black        Y    Y     N     Y       N           N              H
 Current  English Black        Y    Y     N     N       N           N              H
 Current  English Black        N    N     N     N       N           N              B
 Current  English Black        N    N     N     N       N           N              B
 Current  English Black        N    N     N     N       N           N              B
 Current  English Black        Y    Y     N     Y       N           N              H
 Current  English Black        Y    Y     N     Y       N           N              H
 Current  English Black        N    N     N     N       N           N              B
 Current  English Black        N    N     N     N       N           N              B
 Current  English Black        N    N     N     Y       N           N              M
 Current  English Black        N    Y     N     N       N           N              M
 Current  English Black        N    N     N     N       N           N              B
 Current  English Black        N    N     Y     N       N           Y              M
 Current  English Black        N    N     N     N       N           N              B
 Current  English Black        N    N     N     N       N           N              B
 Current  English Black        N    N     N     N       N           N              B
 Current  English Black        N    N     N     N       N           N              B
 Current  English Black        N    N     N     N       N           N              B
 Current  English Black        Y    Y     N     N       N           N              H
 Current  English Black        Y    N     N     N       N           N              H
 Current  English Black        Y    N     N     N       N           N              H
 Current  English Black        Y    N     N     Y       N           N              H
 Current  English Black        Y    Y     N     N       Y           Y              H
 Current  English Black        Y    Y     N     N       Y           Y              H
 Current  English Black        Y    N     N     N       N           N              H
 Current  English Black        N    N     N     N       N           N              B
 Current  English Black        N    N     N     N       N           N              B
 Current  English Black        Y    Y     Y     Y       Y           Y              H
 Current  English Black        N    N     N     N       N           N              B
 Current  English Black        N    N     N     N       N           N              B
 Current  English Black        Y    N     N     Y       N           N              H
 Current  English Black        Y    N     N     Y       N           N              H
 Current  English Black        N    N     N     N       N           N              B
 Current  English Black        N    N     N     N       N           N              B
 Current  English Black        N    N     N     N       N           N              B
 Current  English Black        N    N     N     N       N           N              B
 Current  English Black        Y    N     N     N       N           N              H
 Current  English Black        N    N     N     N       N           N              B
 Current  English Black        N    N     N     N       N           N              B
 Current  English Black        N    N     N     N       N           N              B
 Current  English Black        N    N     N     N       N           N              B
 Current  English Black        Y    N     N     Y       N           N              H
 Current  English Black        N    N     N     N       N           N              B
 Current  English Black        N    N     N     N       N           N              B
 Current  English Black        Y    N     N     Y       N           N              H
 Current  English Black        N    N     N     N       N           N              B
 Current  English Black        Y    N     N     N       N           N              H
 Current  English Black        N    N     N     N       N           N              B
 Current  English Black        N    N     N     N       N           N              B
 Current  English Black        Y    N     N     N       N           N              H
 Current  English Black        Y    N     N     N       N           N              H
 Current  English Black        Y    N     N     N       N           N              H
 Current  English Black        N    N     N     Y       N           N              M
 Current  English Black        N    N     N     N       N           N              B")
3

There are 3 best solutions below

0
M-- On BEST ANSWER

We can summarize across the wanted columns by group, and then convert the results from wide to long format:

DATA %>% 
  filter(Ethinc_overall != "B") %>% 
  summarise(across(Hispanic:AsiaPacific, 
                   list(Y = ~ sum(. == "Y"))), .by = Ethinc_overall) %>% 
  pivot_longer(-Ethinc_overall, values_to = "count") %>% 
  separate(name, into = c("ethnicity", "Y")) %>% 
  # filter(count > 0) %>% 
  select(-Y)

#> # A tibble: 12 x 3
#>    Ethinc_overall ethnicity   count
#>    <chr>          <chr>       <int>
#>  1 H              Hispanic       23
#>  2 H              AmIndian        8
#>  3 H              Asian           1
#>  4 H              White           9
#>  5 H              Pacific         3
#>  6 H              AsiaPacific     3
#>  7 M              Hispanic        0
#>  8 M              AmIndian        1
#>  9 M              Asian           2
#> 10 M              White           2
#> 11 M              Pacific         0
#> 12 M              AsiaPacific     2

or better, start with pivot_longer and then summarise:

DATA %>% 
  filter(Ethinc_overall != "B") %>% 
  pivot_longer(Hispanic:AsiaPacific, names_to = "ethnicity") %>% 
  summarise(count = sum(value =="Y"), .by = c(Ethinc_overall, ethnicity))

#> # A tibble: 12 x 3
#>    Ethinc_overall ethnicity   count
#>    <chr>          <chr>       <int>
#>  1 H              Hispanic       23
#>  2 H              AmIndian        8
#>  3 H              Asian           1
#>  4 H              White           9
#>  5 H              Pacific         3
#>  6 H              AsiaPacific     3
#>  7 M              Hispanic        0
#>  8 M              AmIndian        1
#>  9 M              Asian           2
#> 10 M              White           2
#> 11 M              Pacific         0
#> 12 M              AsiaPacific     2

Created on 2024-02-09 with reprex v2.0.2

1
Wojciech Artichowicz On

You can use the folowing code to count only occurrences of "Y":

DATA %>% filter(Ethinc_overall!="B") %>% 
    summarise_at(c('Hispanic', 'AmIndian', 'Asian', 'White', 'Pacific','AsiaPacific'),
                 function(x){sum(x=="Y")})

Explanation:

  • filter(Ethinc_overall!="B") - discards all rows for which value in column Ethinc_overall is equal to B
  • summarise_at(where,what) - performs some operation (what) in given columns (where)
  • function(x){sum(x=="Y")} - it is a function that checks if argument is equal to "Y" and sums all TRUE values in the column. You can pretty much change the content of the function as you like.

The outcome is:

Hispanic    AmIndian    Asian   White   Pacific AsiaPacific
  <int>       <int>     <int>   <int>     <int>     <int>
    23          9         3       11        3          5
0
xx0020 On

I am not sure if I understand your question correctly, but maybe this does what you're trying to do (you can replace summarise by mutate depending on the resulting object you prefer):

DATA |> 
  dplyr::summarise(
    across(
      .cols = c(Hispanic, AmIndian, Asian, White, Pacific, AsiaPacific),
      .fns = list(
        counted =  ~length(which(. == "Y" & Ethinc_overall != "B"))
      ),
      .names = "{.col}_{.fn}"
    ))

## this gives:
##  Hispanic_counted AmIndian_counted Asian_counted White_counted Pacific_counted AsiaPacific_counted
##1               23                9             3            11               3                   5