I am trying to simulate a dataset to test a model in R. Assume that we have a large number of medical cases that are assessed by a smaller number of doctors. To keep things simple, the cases can be divided into two conditions: disease A and disease B. Depending on whether the disease is A or B, doctors have a different diagnostic accuracy.
I created the cases, the doctors, and the states with their corresponding probabilities.
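# 10,000 case IDs in random order
case_id <- sample(x = 1:10000, size = 10000)
# 100 assessor IDs in random order; data.frame() recycles them across the 10,000 cases
assessor_id <- sample(x = 1:100, size = 100)
# True state of each case: A with probability .93, B with probability .07
case_state <- sample(c('A', 'B'), size = 10000, replace = TRUE, prob = c(.93, .07))
df <- data.frame(case_id, assessor_id, case_state)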
Now I would like to have, for each case, the doctor's prediction, drawn with a fixed probability of being correct. So for state A the probability of correct identification could be .7 and for state B it could be .9. Based on these probabilities I need a new column containing the predicted state for each case. How do I do that?
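To make the goal concrete, here is a minimal sketch of what I have in mind. I am not sure it is the best way to do it. It assumes one Bernoulli draw per case via rbinom(), with the success probability looked up from case_state. The names p_correct, correct, and prediction are my own placeholders, and the seed is only there for reproducibility.

```r
set.seed(42)  # assumed seed, only so the example is reproducible

n_cases <- 10000
df <- data.frame(
  case_id     = sample(x = 1:n_cases, size = n_cases),
  assessor_id = sample(x = 1:100, size = 100),  # recycled across all cases
  case_state  = sample(c("A", "B"), size = n_cases, replace = TRUE,
                       prob = c(.93, .07)),
  stringsAsFactors = FALSE  # keep case_state as character on older R versions
)

# Hypothetical lookup table: probability of a correct diagnosis per true state
p_correct <- c(A = 0.7, B = 0.9)

# One Bernoulli draw per case: 1 = the doctor identifies the state correctly
df$correct <- rbinom(n_cases, size = 1,
                     prob = unname(p_correct[df$case_state]))

# Correct draw: prediction equals the true state; otherwise it is the other state
other_state <- ifelse(df$case_state == "A", "B", "A")
df$prediction <- ifelse(df$correct == 1, df$case_state, other_state)
```

With this, table(df$case_state, df$prediction) should show roughly 70% agreement in the A row and roughly 90% in the B row, up to sampling noise.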
Thanks in advance!