repeat previous value with ifelse function in R


I have a data frame with clicks per condition. I want to calculate the cumulative sum of these clicks per condition as the data comes in. I am currently using the ifelse() function to do this. However, for the "no" part of the test, I want to repeat the value that was created in the previous "yes" part until the next "yes" occurs. Currently I am using NAs as placeholders instead.

How can I repeat the value that was created for the last "yes" when the test of the ifelse function is "no" until the next "yes"?

I've made a minimal example:

library(dplyr)

clicked <- round(runif(n = 20), 0)
condition <- sample(c("Intervention", "Control"), size = 20, replace = TRUE)
df <- data.frame(clicked, condition)

df %>%
  select(clicked, condition) %>%
  group_by(condition) %>%
  mutate(successes.intervention = ifelse(condition == "Intervention", cumsum(clicked), NA),
         N.intervention = ifelse(condition == "Intervention", 1:n(), NA),
         successes.control = ifelse(condition == "Control", cumsum(clicked), NA),
         N.control = ifelse(condition == "Control", 1:n(), NA))

I want the output to look like this:

  clicked condition    successes.intervention N.intervention successes.control N.control
     <dbl> <chr>                         <dbl>          <int>             <dbl>     <int>
 1       0 Control                           0              0                 0         1
 2       1 Control                           0              0                 1         2
 3       0 Control                           0              0                 1         3
 4       1 Intervention                      1              1                 1         3
 5       0 Control                           1              1                 1         4
 6       0 Intervention                      1              2                 1         4
 7       0 Intervention                      1              3                 1         4
 8       0 Control                           1              3                 1         5
 9       0 Intervention                      1              4                 1         5
10       1 Intervention                      2              5                 1         5 

There is 1 best solution below

r2evans On

How about this?

library(dplyr)
df %>%
  group_by(condition) %>% 
  mutate(
    data.frame(
      lapply(setNames(unique(df$condition), paste0("successes.", unique(df$condition))),
             function(z) if_else(condition == z, cumsum(clicked > 0), NA_integer_))
    ),
    across(starts_with("successes"), ~ if_else(row_number() == 1, coalesce(., 0L), .)),
    across(starts_with("successes"), ~ row_number() - 1L, .names = "N{sub('successes','',.col)}")
  ) %>%
  ungroup() %>%
  tidyr::fill(starts_with("successes"))
# # A tibble: 20 × 6
#    clicked condition    successes.Intervention successes.Control N.Intervention N.Control
#      <dbl> <chr>                         <int>             <int>          <int>     <int>
#  1       1 Intervention                      1                 0              0         0
#  2       1 Intervention                      2                 0              1         1
#  3       0 Intervention                      2                 0              2         2
#  4       1 Intervention                      3                 0              3         3
#  5       1 Intervention                      4                 0              4         4
#  6       1 Control                           0                 1              0         0
#  7       1 Intervention                      5                 1              5         5
#  8       0 Intervention                      5                 1              6         6
#  9       1 Intervention                      6                 1              7         7
# 10       1 Intervention                      7                 1              8         8
# 11       0 Control                           7                 1              1         1
# 12       1 Control                           7                 2              2         2
# 13       1 Control                           7                 3              3         3
# 14       0 Control                           7                 3              4         4
# 15       0 Intervention                      7                 3              9         9
# 16       1 Control                           7                 4              5         5
# 17       1 Intervention                      8                 4             10        10
# 18       0 Control                           8                 4              6         6
# 19       0 Control                           8                 4              7         7
# 20       1 Control                           8                 5              8         8

Walk-through:

  • lapply(..) iterates over the condition levels (determined dynamically via unique(df$condition)) and produces a named list; once that list is converted to a data.frame, mutate adds its columns
  • inside the function passed to lapply(..), if_else(..) checks whether condition is the level we want to summarize; if so, it returns the cumulative count of clicks from cumsum(..), otherwise NA
  • the first across makes sure the leading value in each group is 0 rather than NA
  • the second across iterates over all selected columns and returns the row number (within the group) minus 1; it names the new columns per the .names "glue" string. For this, I chose the already-created successes.* columns, since there is one of them per condition level
  • after ungrouping, we use tidyr::fill to fill down the NA values introduced by the condition logic
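
To see the fill-down step in isolation, here is a minimal sketch on a toy data frame. The names toy, x and filled are made up for illustration and are not part of the code above; the NAs stand in for the rows where the condition test was "no".

```r
library(tidyr)

# Toy data (hypothetical): NA marks rows where no new value was produced
toy <- data.frame(x = c(0L, NA, 2L, NA, NA, 5L))

# fill() defaults to .direction = "down":
# each NA takes the last non-NA value above it
filled <- fill(toy, x)
filled$x
# [1] 0 0 2 2 2 5
```

A leading NA would stay NA, because there is nothing above it to carry forward; that is why the leading values are set to 0 before filling.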

Data, starting with set.seed(42) for reproducibility:

set.seed(42)
df <- data.frame(clicked = round(runif(n = 20),0),
                 condition = sample(c("Intervention", "Control"), size = 20, replace = TRUE))
head(df)
#   clicked    condition
# 1       1 Intervention
# 2       1 Intervention
# 3       0 Intervention
# 4       1 Intervention
# 5       1 Intervention
# 6       1      Control
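
As a possible alternative that avoids the NA placeholders altogether, the running totals can be sketched in base R by multiplying clicked by a logical indicator before calling cumsum(), without grouping. This is only a sketch and not the approach used above; df2, is_int and is_ctl are hypothetical names, and the six rows are hand-written to mirror the first rows of the desired output in the question rather than the seeded data.

```r
# Hand-written example data (hypothetical, not the seeded df)
df2 <- data.frame(
  clicked   = c(0, 1, 0, 1, 0, 0),
  condition = c("Control", "Control", "Control",
                "Intervention", "Control", "Intervention")
)

# Logical indicators: TRUE on rows that belong to each condition
is_int <- df2$condition == "Intervention"
is_ctl <- df2$condition == "Control"

# Rows of the other condition contribute 0, so the running total
# simply repeats the previous value on those rows
df2$successes.intervention <- cumsum(df2$clicked * is_int)
df2$N.intervention         <- cumsum(is_int)
df2$successes.control      <- cumsum(df2$clicked * is_ctl)
df2$N.control              <- cumsum(is_ctl)

df2
#   clicked    condition successes.intervention N.intervention successes.control N.control
# 1       0      Control                      0              0                 0         1
# 2       1      Control                      0              0                 1         2
# 3       0      Control                      0              0                 1         3
# 4       1 Intervention                      1              1                 1         3
# 5       0      Control                      1              1                 1         4
# 6       0 Intervention                      1              2                 1         4
```

Because nothing is grouped, the totals run over the rows in their arrival order, so no separate fill step is needed.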