How can I get the average number of days between changes in a variable over a period of time in R?

50 Views Asked by At

I have a dataset that has date and another variable (bank rate). Here is a snippet of the data:

![enter image description here

I want to calculate the average number of days between each consecutive change in Bank rate. For example to get an output as such:

enter image description here

Essentially, I am trying to calculate the average number of days a rate remains for before it changes.

I am able to do the usual difftime() function, However I need it to specifically only calculate the difference when there are changes in the rate, and then average it out. I am new to R and unable to figure out how to go about this

1

There are 1 best solutions below

6
MarBlo On BEST ANSWER

I have a made a random sequence of dates in the timeframe as above and have used bank_rate from above and put them in a DF.

This DF is ordered for date. Data which do not show any change in bank_rate are then removed by filter. (see consecutive bank_rates of 2). A new variable days_from_before is created which calculates the number of days of consecutive dates.

The average is calculated as the mean from days_from_before.

library(dplyr)

set.seed(123)
date <- sample(seq(as.Date("2018/02/07"), as.Date("2023/01/15"), by = "day"), 14)
bank_rate <- c(1.5, 1.5, rep(2, 6), 0.5, 1.25, 4.5, 4.5, 4.75, 4.75)

df <- data.frame(date, bank_rate)

df
#>          date bank_rate
#> 1  2019-03-28      1.50
#> 2  2019-05-15      1.50
#> 3  2018-08-04      2.00
#> 4  2019-07-17      2.00
#> 5  2018-08-20      2.00
#> 6  2020-09-01      2.00
#> 7  2021-03-24      2.00
#> 8  2021-09-21      2.00
#> 9  2021-07-13      0.50
#> 10 2021-07-28      1.25
#> 11 2020-12-10      4.50
#> 12 2021-12-05      4.50
#> 13 2019-12-03      4.75
#> 14 2019-10-01      4.75

ddf <- df |>
  arrange(date) |>
  filter(bank_rate != dplyr::lag(bank_rate, default = 0)) |> 
  mutate(
    days_from_before = as.numeric(difftime(date, dplyr::lag(date))),
    days_from_before = ifelse(is.na(days_from_before), 0, days_from_before)
  )

ddf
#>          date bank_rate days_from_before
#> 1  2018-08-04      2.00                0
#> 2  2019-03-28      1.50              236
#> 3  2019-07-17      2.00              111
#> 4  2019-10-01      4.75               76
#> 5  2020-09-01      2.00              336
#> 6  2020-12-10      4.50              100
#> 7  2021-03-24      2.00              104
#> 8  2021-07-13      0.50              111
#> 9  2021-07-28      1.25               15
#> 10 2021-09-21      2.00               55
#> 11 2021-12-05      4.50               75

mean(ddf$days_from_before)
#> [1] 110.8182