I'm trying to run a two-proportion z test (with prop.test()) inside a mutate() call and output the p-value. However, some cells don't have enough data and are NA, which makes the calculation fail. To avoid this, I wrote a case_when() statement so that if either critical variable (pass or total) is NA, it outputs NA_real_ as the p-value. However, R seems to skip over these conditions and runs the prop.test() formula regardless. Why is this happening, and how can I fix it?
```r
library(tidyverse)
library(stats)

sample_data <- tibble(
  subject = c("psych", "math", "psych", "math"),
  level = c("1", "1", "2", "2"),
  pass = c(NA_real_, 8, 13, 1),
  total = c(NA_real_, 102, 195, 36)
)

sample_data %>%
  group_by(level) %>%
  mutate(sig_dif = case_when(
    is.na(pass) ~ NA_real_,
    is.na(total) ~ NA_real_,
    TRUE ~ round(prop.test(c(pass[subject == "psych"], pass[subject == "math"]),
                           c(total[subject == "psych"], total[subject == "math"]),
                           alternative = "two.sided")$p.value, 3)
  ))
```
The code above works if I change the NA_real_ values to values >= 1, so I know the problem has something to do with R skipping the first two conditions in my case_when().
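A likely explanation: dplyr's documentation notes that case_when() evaluates every right-hand side expression on the full vectors first, and only then picks, row by row, which result to keep. The conditions therefore do not stop prop.test() from running for a group; they only decide which values end up in the output. The minimal sketch below (double_it() and n_calls are hypothetical names used only for illustration) counts how often a right-hand side is evaluated even though no row ever selects it:

```r
library(dplyr)

n_calls <- 0
# Hypothetical helper that records each time it is evaluated
double_it <- function(v) {
  n_calls <<- n_calls + 1
  v * 2
}

x <- c(NA, NA)
out <- case_when(
  is.na(x) ~ NA_real_,  # every row matches this condition...
  TRUE ~ double_it(x)   # ...but this right-hand side is still evaluated
)

n_calls  # 1: double_it() ran although no row uses its result
out      # NA NA
```

So within each level, prop.test() still receives the NA counts, and any error it raises stops the whole mutate() call.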
Thank you for your help!
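One possible fix, sketched under the assumption that each level has at most one psych row and one math row: move the check into a scalar if () inside a helper function, so prop.test() is only called when all four counts are present. The helper psych_vs_math_p() is a hypothetical name, not part of any package. It returns NA_real_ for every row of an incomplete level, since there is nothing to compare against there.

```r
library(dplyr)

# Hypothetical helper: p-value comparing psych vs math pass rates within
# one group, or NA_real_ when the counts needed for the test are missing
psych_vs_math_p <- function(pass, total, subject) {
  x <- c(pass[subject == "psych"], pass[subject == "math"])
  n <- c(total[subject == "psych"], total[subject == "math"])
  # A scalar if () short-circuits, so prop.test() never sees NA counts
  if (length(x) != 2 || length(n) != 2 || anyNA(x) || anyNA(n)) {
    return(NA_real_)
  }
  round(prop.test(x, n, alternative = "two.sided")$p.value, 3)
}

sample_data <- tibble(
  subject = c("psych", "math", "psych", "math"),
  level = c("1", "1", "2", "2"),
  pass = c(NA_real_, 8, 13, 1),
  total = c(NA_real_, 102, 195, 36)
)

result <- sample_data %>%
  group_by(level) %>%
  mutate(sig_dif = psych_vs_math_p(pass, total, subject)) %>%
  ungroup()

result
```

For level 2, prop.test() may warn that the chi-squared approximation may be incorrect because the math counts are small; the warning does not stop the calculation. If you prefer a tryCatch() route, the error handler can return NA_real_ directly (error = function(e) NA_real_), which keeps the column numeric without a separate conversion step.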
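Although I couldn't figure out why it was skipping my NA_real_ conditions, I found a workaround: wrapping the calculation in tryCatch() so that it returns NA when there isn't enough data. tryCatch() returns whatever the error handler returns, and my handler gave NA (logical) rather than NA_real_, so I then added another line to convert it to numeric: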