How to calculate cumulative sum per group when groups are repeated in R?

120 Views Asked by At

I have a dataframe representing paneldata with ids and time. For each id and year I like to calculate a cummulative sum. But id and time groups are repeated.

df <- data.frame(id = rep(12345,15),
                 year = c(rep(2021,5), rep(2022,5), rep(2023,5)), 
                 value = c(rep(5,5), rep(3,5), rep(4,5)))

df

      id year value
1  12345 2021     5
2  12345 2021     5
3  12345 2021     5
4  12345 2021     5
5  12345 2021     5
6  12345 2022     3
7  12345 2022     3
8  12345 2022     3
9  12345 2022     3
10 12345 2022     3
11 12345 2023     4
12 12345 2023     4
13 12345 2023     4
14 12345 2023     4
15 12345 2023     4

So instead of this

df %>% 
  group_by(id, year) %>% 
  mutate(value_cumsum = cumsum(value))

# A tibble: 15 × 4
# Groups:   id, year [3]
      id  year value value_cumsum
   <dbl> <dbl> <dbl>        <dbl>
 1 12345  2021     5            5
 2 12345  2021     5           10
 3 12345  2021     5           15
 4 12345  2021     5           20
 5 12345  2021     5           25
 6 12345  2022     3            3
 7 12345  2022     3            6
 8 12345  2022     3            9
 9 12345  2022     3           12
10 12345  2022     3           15
11 12345  2023     4            4
12 12345  2023     4            8
13 12345  2023     4           12
14 12345  2023     4           16
15 12345  2023     4           20

I need value_cumsum to be 5, 5, 5, 5, 5, 8, 8, 8, 8, 8, 12, 12, 12, 12, 12.

2

There are 2 best solutions below

2
ThomasIsCoding On BEST ANSWER

you can try

df %>%
    mutate(value_cumsum = value * !duplicated(value), .by = c("id", "year")) %>%
    mutate(value_cumsum = cumsum(value_cumsum))

or

df %>%
    distinct() %>%
    mutate(value_cumsum = cumsum(value)) %>%
    right_join(df)

which gives

      id year value value_cumsum
1  12345 2021     5            5
2  12345 2021     5            5
3  12345 2021     5            5
4  12345 2021     5            5
5  12345 2021     5            5
6  12345 2022     3            8
7  12345 2022     3            8
8  12345 2022     3            8
9  12345 2022     3            8
10 12345 2022     3            8
11 12345 2023     4           12
12 12345 2023     4           12
13 12345 2023     4           12
14 12345 2023     4           12
15 12345 2023     4           12
0
ismirsehregal On

A approach:

library(data.table)

DF <- data.frame(id = rep(12345,15),
                 year = c(rep(2021,5), rep(2022,5), rep(2023,5)), 
                 value = c(rep(5,5), rep(3,5), rep(4,5)))
setDT(DF)

DF[unique(DF, by = c("id", "year"))[, value_cumsum := cumsum(value)], on = c("id", "year", "value")]