R Dplyr: How do I add columns from an ungrouped dataframe to a grouped dataframe and retain the grouping?


I have a main data frame (data) that contains information about purchases: names, year, city, and a few other variables:

Name Year City
N1   2018 NY
N2   2019 SF
N2   2018 SF
N1   2010 NY
N3   2020 AA

I used new_data <- data %>% group_by(Name) %>% tally(name = "Count") to get something like this:

Name Count
N1   2
N2   2
N3   1

My questions, preferably using dplyr:

1) How do I now add the city that corresponds to Name to new_data, i.e:

Name Count City
N1   2     NY
N2   2     SF
N3   1     AA

2) How do I add the earliest year of each Name to new_data, i.e.:

Name Count City Year
N1   2     NY   2010
N2   2     SF   2018
N3   1     AA   2020

There are 2 best solutions below

1
arg0naut91

It seems that summarise may suit you better, for example:

data %>%
  group_by(Name, City) %>%
  summarise(Count = n(),
            Year = min(Year))

Output:

# A tibble: 3 x 4
# Groups:   Name [3]
  Name  City  Count  Year
  <fct> <fct> <int> <int>
1 N1    NY        2  2010
2 N2    SF        2  2018
3 N3    AA        1  2020

Grouping by City as well as Name keeps City in the output. This gives one row per Name as long as each Name has only one City, as in the sample data.
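
If you would rather keep the existing new_data and attach the extra columns to it, a join is one way to do it. The following is a minimal sketch, not the only approach: the sample data is retyped from the question, the helper name extras is made up for illustration, and first(City) assumes that each Name has a single City.

```r
library(dplyr)

# Sample data retyped from the question
data <- tibble(
  Name = c("N1", "N2", "N2", "N1", "N3"),
  Year = c(2018, 2019, 2018, 2010, 2020),
  City = c("NY", "SF", "SF", "NY", "AA")
)

# The count table from the question
new_data <- data %>%
  group_by(Name) %>%
  tally(name = "Count")

# Hypothetical lookup table: one City and the earliest Year per Name
# (first(City) assumes a Name never appears with two different cities)
extras <- data %>%
  group_by(Name) %>%
  summarise(City = first(City), Year = min(Year))

# Attach the extra columns to the counts by Name
result <- new_data %>%
  left_join(extras, by = "Name")

print(result)
```

With dplyr 1.0.0 or later, summarise(..., .groups = "drop") can be used in the two-column group_by version if you want an ungrouped result.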

0
akrun

An option with data.table

library(data.table)
setDT(data)[, .(Count = .N, Year = min(Year)), .(Name, City)]
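
For reference, here is a self-contained sketch of the same idea that can be run as is. The sample data is retyped from the question, and the names dt and res are only for illustration. Note that setDT() converts a data frame to a data.table by reference, so this sketch builds a data.table directly instead.

```r
library(data.table)

# Sample data retyped from the question
dt <- data.table(
  Name = c("N1", "N2", "N2", "N1", "N3"),
  Year = c(2018, 2019, 2018, 2010, 2020),
  City = c("NY", "SF", "SF", "NY", "AA")
)

# Count rows (.N) and take the earliest Year per Name/City pair
res <- dt[, .(Count = .N, Year = min(Year)), by = .(Name, City)]

print(res)
```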