Why is R ignoring my is.na() condition in my case_when()?


I'm trying to run a z-test inside a mutate() call and output the p-value. Some cells don't have enough data and are NA, which causes the calculation to fail. To avoid this, I wrote a case_when() statement that should output NA_real_ as the p-value whenever the critical variables (pass, total) are NA. However, R appears to skip these conditions and evaluates the formula regardless. Why is this happening, and how can I fix it?

library(tidyverse)
library(stats)

sample_data <- tibble(
  subject = c("psych", "math", "psych", "math"),
  level = c("1", "1", "2", "2"),
  pass = c(NA_real_, 8, 13, 1),
  total = c(NA_real_, 102, 195, 36)
  )

sample_data %>%
  group_by(level) %>%
  mutate(sig_dif = case_when(
    is.na(pass) ~ NA_real_,
    is.na(total) ~ NA_real_,
    TRUE ~ round(prop.test(c(pass[subject == "psych"], pass[subject == "math"]),
                           c(total[subject == "psych"], total[subject == "math"]),
                           alternative = "two.sided")$p.value, 3)))

The above code works if the NA_real_ values in the data are changed to values >= 1, so I suspect the problem lies in how R handles the first two conditions in my case_when().

Thank you for your help!


There is 1 best solution below

J.Sabree On

Though I couldn't figure out why the NA_real_ branches appeared to be skipped, I found a workaround: wrapping the test in tryCatch() so that it returns NA when there isn't enough data. The error handler can return NA_real_ directly, which keeps the column numeric without a separate conversion step:

sample_data %>%
  group_by(level) %>%
  mutate(sig_dif = tryCatch(
    # If prop.test() throws an error for a group, return NA_real_ instead
    round(prop.test(c(pass[subject == "psych"], pass[subject == "math"]),
                    c(total[subject == "psych"], total[subject == "math"]),
                    alternative = "two.sided")$p.value, 3),
    error = function(e) NA_real_))
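
A likely explanation for the behaviour in the question: case_when() evaluates every right-hand side in full before it selects values by condition, so the prop.test() call still runs on the whole group even when is.na(pass) is TRUE for some rows. The sketch below first illustrates that with a deliberately failing right-hand side, then shows one way to guard the call per group with a plain if. Note that safe_prop_p is a hypothetical helper name, not something from the original post, and it assumes that a group-level NA is the desired result whenever either subject is missing data.

```r
library(dplyr)

# 1) case_when() evaluates all right-hand sides, even for conditions that are
#    never TRUE, so the stop() below is reached and caught by tryCatch().
eager <- tryCatch(
  case_when(TRUE ~ "kept", FALSE ~ stop("RHS was evaluated")),
  error = function(e) "error"
)

sample_data <- tibble(
  subject = c("psych", "math", "psych", "math"),
  level = c("1", "1", "2", "2"),
  pass = c(NA_real_, 8, 13, 1),
  total = c(NA_real_, 102, 195, 36)
)

# 2) Hypothetical helper (an assumption, not from the original post):
#    returns NA_real_ when any input is missing and only calls prop.test()
#    when the group is complete. A plain `if` does not evaluate the skipped
#    branch, unlike case_when().
safe_prop_p <- function(pass, total) {
  if (anyNA(pass) || anyNA(total)) {
    return(NA_real_)
  }
  round(prop.test(pass, total, alternative = "two.sided")$p.value, 3)
}

result <- sample_data %>%
  group_by(level) %>%
  mutate(sig_dif = safe_prop_p(
    c(pass[subject == "psych"], pass[subject == "math"]),
    c(total[subject == "psych"], total[subject == "math"])
  )) %>%
  ungroup()
```

With this approach every row in a group that has missing data gets NA, while complete groups get the rounded p-value. prop.test() may still warn about the chi-squared approximation for small counts, which is a warning rather than an error.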