I have a data frame that looks like this:
| cnpj | time2 | n_act_contracts |
|---|---|---|
| 12 | -1 | 10 |
| 12 | 0 | 8 |
| 12 | 1 | 6 |
| 13 | -1 | 3 |
| 13 | 0 | 5 |
| 13 | 1 | 7 |
| 14 | 1 | 3 |
| 14 | 2 | 5 |
| 14 | 3 | 7 |
| 15 | NA | 3 |
| 15 | NA | 5 |
| 15 | NA | 7 |
I want to define another variable that takes, for all observations that have the same cnpj, the value of the n_act_contracts when the variable time2 is equal to zero.
| cnpj | time2 | n_act_contracts | zero_n_act_contracts |
|---|---|---|---|
| 12 | -1 | 10 | 8 |
| 12 | 0 | 8 | 8 |
| 12 | 1 | 6 | 8 |
| 13 | -1 | 3 | 5 |
| 13 | 0 | 5 | 5 |
| 13 | 1 | 7 | 5 |
| 14 | 1 | 3 | NA |
| 14 | 2 | 5 | NA |
| 14 | 3 | 7 | NA |
| 15 | NA | 3 | NA |
| 15 | NA | 5 | NA |
| 15 | NA | 7 | NA |
I have beeing doing it with the following lines of code, but I need to make it more efficient.
data <- data %>%
group_by(cnpj) %>%
mutate(
zero_n_act_contracts = ifelse(time2 == 0,n_act_contracts,-1000),
zero_n_act_contracts = max(zero_n_act_contracts, na.rm = TRUE),
zero_n_act_contracts = ifelse(zero_n_act_contracts == -1000,NA,zero_n_act_contracts))
obs: I have already tryied replacing base "ifelse" by dplyr: "if_else", but my code took longer to run.
We can use
-output
data