Group by and sum/count of columns issues

Question

Group by and sum/count of columns issues

71 Views Asked by alecandia84 At 22 February 2024 at 21:39

I am trying to to group_by program names and count the total column amounts per program. There are NA's in the mix.

Taking a data frame like this: dataframe

and getting something like in return returned data

The following code only counts all non NA observations. It doesn't actually add up the numbers. Do I need to do some sort of ifelse here? I also wonder if the !is.na causes it to count all non NA observations, however, if I remove it, I get all NA's as my totals.

df %>%  
  group_by(ProgramName) %>%
  summarise(ED = sum(!is.na(HighSchool)), EMP = sum(!is.na(Employment)))

alternatively, is there is a way to group by program name and count the observations ONLY if they had a 1 in either column, not to tally up the total? That is closer to what I want anyway. Any support would be appreciated.

Original Q&A

There are 2 best solutions below

Andy Baxter On 22 February 2024 at 23:14

You can sum all non-NA values without needing to check for NA's or checking for values equalling 1:

library(tidyverse)

df <- tibble::tribble(
        ~ProgramName, ~School, ~Employed,
         "Program A",       1,         0,
         "Program A",       0,         0,
         "Program A",       0,         1,
         "Program A",       1,         0,
         "Program B",       1,         1,
         "Program B",       0,         1,
         "Program B",       1,         1,
         "Program B",      NA,        NA,
         "Program C",       1,         0,
         "Program C",       1,         1,
         "Program C",      NA,        NA,
         "Program C",       1,         1
        )
df |> 
  summarise(
    ED = sum(School, na.rm = TRUE),
    EMP = sum(Employed, na.rm = TRUE),
    .by = ProgramName
  )
#> # A tibble: 3 × 3
#>   ProgramName    ED   EMP
#>   <chr>       <dbl> <dbl>
#> 1 Program A       2     1
#> 2 Program B       2     3
#> 3 Program C       3     2

**L Tyrone** · Accepted Answer · 2024-02-22T23:03:55.013000

To answer both questions:

library(dplyr)

df <- structure(list(ProgramName = c("Program A", "Program A", "Program A", 
"Program A", "Program B", "Program B", "Program B", "Program B", 
"Program C", "Program C", "Program C", "Program C"), HighSchool = c(1L, 
0L, 0L, 1L, 1L, 0L, 1L, NA, 1L, 1L, NA, 1L), Employment = c(0L, 
0L, 1L, 0L, 1L, 1L, 1L, NA, 0L, 1L, NA, 1L)), class = "data.frame", row.names = c(NA, 
-12L))


df %>%  
  group_by(ProgramName) %>%
  summarise(across(HighSchool:Employment, ~ if(all(is.na(.))) NA else sum(., na.rm = TRUE)))

# A tibble: 3 × 3
  ProgramName HighSchool Employment
  <chr>            <int>      <int>
1 Program A            2          1
2 Program B            2          3
3 Program C            3          2

# This is the one you state that you actually want
df %>%  
  group_by(ProgramName) %>%
  summarise(ED = sum(HighSchool == 1, na.rm = TRUE),
            EMP = sum(Employment == 1, na.rm = TRUE))

A tibble: 3 × 3
  ProgramName    ED   EMP
  <chr>       <int> <int>
1 Program A       2     1
2 Program B       2     3
3 Program C       3     2

Group by and sum/count of columns issues

There are 2 best solutions below

Related Questions in R

Related Questions in DPLYR

Related Questions in DATA-WRANGLING

Trending Questions

Popular # Hahtags

Popular Questions