Comparing a selected range of indexes to a threshold


I want to compare all scores in a certain range (from index i up to range_index[i], the end of the range that belongs to i) to the last baseline score, and update the baseline score recursively. The range ends at the first index that meets the minimal time difference required to confirm a new baseline. If all scores in this range are lower than the last baseline score, then I want the new baseline to become the highest of the scores in the range (i.e., the one closest to the old baseline).

df <- tibble(
i = c("1", "2", "3", "4", "5", "6", "7", "8", "9"),
range_index = c("2", "4", "4", "5", "7", "7", "9", "9", "NA"),
score = c("5", "4", "4", "3", "2", "2", "3", "1", "1")) 

I am looking to do something like this in sapply or a for loop:

df <- df %>%
mutate(
baseline = first(score),
baseline = sapply(1:n(), function(i) {
  if (all(score[i]:score[range_index[i]]) < baseline[i-1]) {return(max(score[i]:score[range_index[i]]))}
  else {return(baseline[i-1])}}))

But I think score[i]:score[range_index[i]] doesn't compare all scores to the last baseline. How can I create a condition that is true only if every one of these scores is lower than the last baseline?

The desired outcome is:

baseline = c("5", "4", "4", "3", "3", "3", "3", "1", "1")

Explanation: the first baseline is 5. At i=2 the new baseline is set to 4, as all scores between i=2 and i=4 (the corresponding range) are lower than 5. The new baseline is 4, and not 3, because 4 is the greatest score from i=2 to i=4. At i=4 we obtain the new baseline 3, because all scores in the range (score[4]=3, score[5]=2) are lower than the last baseline, which was 4. At i=5, we don't obtain a new baseline despite the decrease, because the range includes i=7, and score[7] (==3) is not lower than the last baseline (==3). The new baseline at i=8 is obtained because all scores in score[8:9] are lower than the last baseline of 3.


5
jblood94 On BEST ANSWER

I think it's best to iterate with a for loop.

library(tidyverse)

df <- tibble(id = rep(1:2, each = 9),
             range_index = rep(c(2,4,4,5,7,7,9,9,NA), 2),
             score = c(5,4,4,3,2,2,3,1,1,5,4,4,3,2,2,3,1,0))
df %>%
  group_by(id) %>%
  mutate(
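    baseline = {
      # Start from the raw scores; baseline[1] stays at the first score.
      baseline <- score
      for (i in 2:(n() - 1)) {
        # score[i:range_index[i]] slices by index; the highest score in the
        # range replaces the baseline only if it is below the last baseline.
        baseline[i] <- min(baseline[i - 1], max(score[i:range_index[i]]))
      }
      # The last row has no range (range_index is NA), so carry the baseline forward.
      baseline[n()] <- baseline[n() - 1]
      baseline
    }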
  )
#> # A tibble: 18 × 4
#> # Groups:   id [2]
#>       id range_index score baseline
#>    <int>       <dbl> <dbl>    <dbl>
#>  1     1           2     5        5
#>  2     1           4     4        4
#>  3     1           4     4        4
#>  4     1           5     3        3
#>  5     1           7     2        3
#>  6     1           7     2        3
#>  7     1           9     3        3
#>  8     1           9     1        1
#>  9     1          NA     1        1
#> 10     2           2     5        5
#> 11     2           4     4        4
#> 12     2           4     4        4
#> 13     2           5     3        3
#> 14     2           7     2        3
#> 15     2           7     2        3
#> 16     2           9     3        3
#> 17     2           9     1        1
#> 18     2          NA     0        1
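
For checking the logic outside of dplyr, the same loop can be written as a plain function. This is only a minimal sketch: `rolling_baseline` is a hypothetical name, and it assumes numeric vectors and that a missing `range_index` means no new baseline can be confirmed, so the previous one is carried forward.

```r
# Hypothetical helper (not part of any package): the loop as a standalone function.
rolling_baseline <- function(score, range_index) {
  baseline <- score
  for (i in seq_len(length(score))[-1]) {
    if (is.na(range_index[i])) {
      # Assumption: no range available, so keep the previous baseline.
      baseline[i] <- baseline[i - 1]
    } else {
      # Highest score in the index range i..range_index[i], accepted only
      # if it is below the previous baseline.
      baseline[i] <- min(baseline[i - 1], max(score[i:range_index[i]]))
    }
  }
  baseline
}

score <- c(5, 4, 4, 3, 2, 2, 3, 1, 1)
range_index <- c(2, 4, 4, 5, 7, 7, 9, 9, NA)
rolling_baseline(score, range_index)
#> [1] 5 4 4 3 3 3 3 1 1
```

The result matches the desired baseline given in the question.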
2
zephryl On

Using purrr::accumulate2:

library(dplyr)
library(tidyr)
library(purrr)

df %>%
  mutate(baseline = accumulate2(
    i,
    replace_na(range_index, max(i)),
    \(b, i, r) min(b, max(score[i:r])),
    .init = score[[1]]
  )[-1])

Result:

  i range_index score baseline
1 1           2     5        5
2 2           4     4        4
3 3           4     4        4
4 4           5     3        3
5 5           7     2        3
6 6           7     2        3
7 7           9     3        3
8 8           9     1        1
9 9        <NA>     1        1
0
Evy On

Regarding the question of how to compare a selected range of indexes [i:j] to a threshold: compare the minimum or the maximum of the range to the threshold. Use min(score[i:j]) > threshold[i] if all scores need to be greater than the threshold, or max(score[i:j]) < threshold[i] if all scores need to be lower than the threshold.
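
As a small illustration, using the scores from the question (the variable names `i`, `j` and `threshold` are chosen here only for the example), the index slice `score[i:j]` can be tested with either `all()` or `max()`. It is not the same thing as `score[i]:score[j]`, which builds a sequence between two score values:

```r
score <- c(5, 4, 4, 3, 2, 2, 3, 1, 1)

# Range i = 5 to j = 7, compared to a threshold of 3
i <- 5
j <- 7
threshold <- 3

in_range <- score[i:j]                 # elements 5, 6 and 7: 2 2 3
all_lower <- all(in_range < threshold) # FALSE, because score[7] is not below 3
max_lower <- max(in_range) < threshold # FALSE, the same test written with max()

# Pitfall: the colon applied to the values, not to the indexes
by_index <- score[2:4]                 # 4 4 3 (three elements)
by_value <- score[2]:score[4]          # 4:3, i.e. 4 3 (a sequence of values)
```

Both forms of the test give the same answer; `max()` is handy when the maximum is needed anyway for the new baseline.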