Rolling Sum by 3 Conditions in R based on another data frame

56 Views Asked by At

I've got a df and i need to perform a rolling sum in another dataframe based on the values of the first df

  1. All the transactions of each company, by date & account
  2. This is the dataframe I want to build with the balance at the end of the day with each company's account
Date Company Account Value
2021-01-01 A a 10
2021-01-01 A b 10
2021-01-01 A b 5
2021-01-02 A a 12
2021-01-02 A a 4
2021-01-02 A b 4
Date Company Account Value
2021-01-01 A a 10
2021-01-01 A b 15
2021-01-02 A a 26
2021-01-02 A b 14
2021-01-01 B x i
2021-01-01 B y i
2021-01-02 B x ii
2021-01-02 B y ii

In Excel this would be something like a SUMIF where you state that the criteria must me Company & Account % Date<=Date

Thnks in advance

2

There are 2 best solutions below

3
Merijn van Tilborg On BEST ANSWER

using data.table

library(data.table)

setDT(df)

df[, .(Value = sum(Value)), by = .(Date, Company, Account)][, .(Date, Value = cumsum(Value)), by = .(Company, Account)]

results

#    Company Account       Date Value
# 1:       A       a 2021-01-01    10
# 2:       A       a 2021-01-02    26
# 3:       A       b 2021-01-01    15
# 4:       A       b 2021-01-02    19

data

df <- structure(list(Date = structure(c(18628L, 18628L, 18628L, 18629L, 
18629L, 18629L), class = c("IDate", "Date")), Company = c("A", 
"A", "A", "A", "A", "A"), Account = c("a", "b", "b", "a", "a", 
"b"), Value = c(10L, 10L, 5L, 12L, 4L, 4L)), row.names = c(NA, 
-6L), class = c("data.frame"))

EDIT update on OP's extra requirement in the comments

# same code as above
df <- df[, .(Value = sum(Value)), by = .(Date, Company, Account)][, .(Date, Value = cumsum(Value)), by = .(Company, Account)]

# now create a small table with the full range of dates per group
fullrange <- df[, .(Date = seq(min(Date), max(Date), "days")), .(Company, Account)]

# merge the two tables, missing dates will here get NA as value
output <- merge(df, fullrange, by = c("Company", "Account", "Date"), all = T)

# now replace the NA values with the previous value
setnafill(output, type = "locf", cols = "Value")

output

#    Company Account       Date Value
# 1:       A       a 2021-01-01    10
# 2:       A       a 2021-01-02    10
# 3:       A       a 2021-01-03    26
# 4:       A       b 2021-01-01    15
# 5:       A       b 2021-01-02    15
# 6:       A       b 2021-01-03    19

# changed the data a bit and replaced the 2021-01-02 for 2021-01-03
df <- structure(list(Date = structure(c(18628L, 18628L, 18628L, 18630L, 
18630L, 18630L), class = "Date"), Company = c("A", "A", "A", 
"A", "A", "A"), Account = c("a", "b", "b", "a", "a", "b"), Value = c(10L, 
10L, 5L, 12L, 4L, 4L)), row.names = c(NA, -6L), class = c("data.frame"))
0
CelloRibeiro On

It would be better if you could write your code so anyone can reproduce it easier. For example

df <- data.frame(
  Date = c("2021-01-02", "2021-01-02"),
  Company = c("A", "A"),
  Account = c("a", "b"),
  Value = c(10, 12),
)

Anyway, have you tried something like:

aggregate(Value~Company+Account, data=df, FUN=sum)