I am trying to find the most efficient way to identify the longest rolling sum of a vector where the value is below a specific threshold. For instance, if we had 1:10 and a threshold of 6, then 3 is our longest rolling sum.
I have got the below code and function I've made for this, but is obviously extremely slow. Wondering if there's a more efficient or already implemented algorithm for this that identifies the longest runs with sums under the threshold.
library(dplyr)
library(zoo)
set.seed(1)
df <- data.frame(
g = rep(letters[1:10], each = 100),
x = runif(1000)
)
longest_rollsum <- function(x, threshold) {
for (i in 1:length(x)) {
rs <- rollsum(x, i, na.pad = TRUE, align = "right")
if (!any(rs <= threshold, na.rm = TRUE)) {
return(i - 1)
}
}
return(i)
}
df %>%
group_by(g) %>%
summarize(longest = longest_rollsum(x, 2))
#> # A tibble: 10 × 2
#> g longest
#> <chr> <dbl>
#> 1 a 7
#> 2 b 6
#> 3 c 9
#> 4 d 6
#> 5 e 6
#> 6 f 8
#> 7 g 6
#> 8 h 7
#> 9 i 11
#> 10 j 9

You can Write an R function that is faster:
Now comparing this to the above,
The change doesn't seem big. But it is quite huge when we change the threshold:
You could also write your own code in Rcpp which accounts for short circuiting, that does the work faster than the two approaches so far given.
Save the following in your working directory as
longest_rollsum.cpp:In R: Source the above file