R - getting count of maximum-sized sub-group when summarising at prior group_by level


What I am trying to do is along the lines of the following:

library(tidyverse)

starwars %>% 
  filter(!is.na(gender)) %>% 
  group_by(gender) %>% 
  summarise(total_count = n(), max_species_count_per_gender = max(count(species)))

Basically, in addition to getting the total count per gender after a single group_by and reporting it in a summary column, I am also trying to extract, for each gender, the size of its largest subgroup for a given trait (in this case, species). Obviously, the above does not work; it returns the error message:

Caused by error in `UseMethod()`:
! no applicable method for 'count' applied to an object of class "character"

What I am trying to end up with is something along the lines of:

# A tibble: 2 × 3
  gender    total_count     max_species_count_per_gender
  <chr>           <int>                            <int>
1 feminine           17                   some_smaller_x
2 masculine          66                   some_smaller_y

Is this something I can approach as part of a summarise action, or will I need to do something else? Thank you for your help.
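For reference, a single summarise() call can work if the per-species counts come from a vector function such as base R's table(). count() is a data-frame verb, and inside summarise() the species column is a plain character vector, which is what the "no applicable method ... class \"character\"" error is reporting. A minimal sketch, assuming dplyr is installed; useNA = "ifany" is an assumption made here so that characters with a missing species form their own group (as they do in the answers below), and the object name max_by_gender is illustrative only:

```r
library(dplyr)

# count() expects a data frame; inside summarise() each column is a plain
# vector, so base R's table() is used to tabulate species within each gender.
max_by_gender <- starwars %>%
  filter(!is.na(gender)) %>%
  group_by(gender) %>%
  summarise(
    total_count = n(),
    # table() drops NA by default; useNA = "ifany" keeps missing species as a group
    max_species_count_per_gender = max(table(species, useNA = "ifany"))
  )

max_by_gender
```

max() on a table of counts returns a single integer per group, so the new column has the same integer type as total_count.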


Answer by Seth (best answer)

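You could summarize twice: first count each species/gender combination, then sum those counts and take their maximum within each gender. Using .by is an alternative to group_by() and ungroup(); either would work.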

library(tidyverse)

starwars %>%
  filter(!is.na(gender)) %>%
  summarize(
    sub_count = n(),
    .by = c(species, gender)
  ) %>%
  summarize(
    total_count = sum(sub_count),
    max_species_count = max(sub_count),
    .by = gender
  )
#> # A tibble: 2 × 3
#>   gender    total_count max_species_count
#>   <chr>           <int>             <int>
#> 1 masculine          66                26
#> 2 feminine           17                 9

Created on 2024-02-29 with reprex v2.0.2
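Since either form is said to work, here is a sketch of the same two-step summary written with group_by() and ungroup() instead of .by. It assumes dplyr is installed; by_gender_grouped is an illustrative name only:

```r
library(dplyr)

# Same two-step idea as the .by version, using explicit grouping instead.
by_gender_grouped <- starwars %>%
  filter(!is.na(gender)) %>%
  group_by(gender, species) %>%
  # drop_last removes the species level, leaving the result grouped by gender
  summarise(sub_count = n(), .groups = "drop_last") %>%
  summarise(
    total_count = sum(sub_count),
    max_species_count = max(sub_count)
  ) %>%
  ungroup()  # already ungrouped here; kept to mirror the group_by()/ungroup() pattern

by_gender_grouped
```

The counts should match the reprex above, but because group_by() sorts its keys while .by keeps groups in order of first appearance, the rows should come out as feminine, masculine rather than in the order shown there.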

Answer by jpsmith

You could also try mutate, reframe, and slice:

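starwars %>% 
  filter(!is.na(gender)) %>% 
  mutate(total_count = n(), 
         .by = gender) %>%
  reframe(total_count = total_count,
          max_species_count_per_gender = n(), 
          .by = c(species, gender)) %>%
  # sort the largest species groups first so that slice(1) keeps the maximum
  # per gender instead of whichever species happens to appear first in the data
  arrange(desc(max_species_count_per_gender)) %>%
  slice(1, .by = gender) %>% 
  select(-species)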

#   gender    total_count max_species_count_per_gender
#   <chr>           <int>                        <int>
# 1 masculine          66                           26
# 2 feminine           17                            9
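Since the original attempt reached for count(), note that count() does work here when it is applied to the data frame rather than to the species column: it returns one row per gender/species pair with a count column, which a second summarise can then reduce per gender. A minimal sketch, assuming dplyr 1.1.0 or later (for .by); the object name species_counts_by_gender is illustrative only:

```r
library(dplyr)

species_counts_by_gender <- starwars %>%
  filter(!is.na(gender)) %>%
  # count() on the data frame: one row per gender/species pair, counts in sub_count
  count(gender, species, name = "sub_count") %>%
  # reduce the per-species counts to a total and a maximum for each gender
  summarise(
    total_count = sum(sub_count),
    max_species_count = max(sub_count),
    .by = gender
  )

species_counts_by_gender
```

This is essentially the first answer's approach with count() doing the first summarize step.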