Okay, so Bayes theorem is p(A|B) = p(A) * P(B|A) * 1/p(B).
I want to simulate p(A|B) using all the uncertainty surrounding p(A), p(B|A) and p(B).
rbeta seems like a good choice.
data: imagine I just spent 20 days in a new location and have a sense of the daily weather.
library(tidyverse)
set.seed(4434)
sims = 1000 #let's simulate each of these proportions 1000 times
p(rain) = rbeta(sims,5,15) #it rained 5 times in the 20 days
p(overcast | rain) = rbeta(sims, 4,1) #of the 5 times it rained, it wasn't completely overcast 1x
p(overcast) = rbeta(sims, 10,10) #half of the days I was there were overcast
That should give me anything that I need to calculate p(A|B) or p(rain | overcast). Let's calculate this and take a quick look:
data <- tibble(p_rain, p_overcast, p_overcast_rain) %>%
mutate(numerator = p_rain * p_overcast_rain,
p_rain_overcast = numerator / p_overcast)
summary(data$p_rain_overcast)
table(data$p_rain_overcast > 1)
the problem: I am getting results outside of the 0 to 1 bounds

So, in 22 of the simulations the numerator > denominator, which would suggest the impossible: Both B and A happening is more likely than B happening. This is clearly a possible result with the simulation approach.
The question: Now, assuming this approach isn't completely silly (is it?), I am curious about what do I do with values above 1 (or more generally outside of 0 to 1)?
#if you're curious what I did# I can see three options, but I am not sure about their validity or pitfalls:
**1. just make p_rain_overcast values > 1 = 1 **
data %>%
mutate(p_rain_overcast = ifelse(p_rain_overcast > 1,
1,
p_rain_overcast)) %>%
reframe(
quantiles = c(0,.1,.25,.5,.75,.9,1),
values = quantile(p_rain_overcast, c(0,.1,.25,.5,.75,.9,1)))
**2. I can resimulate until the numerator is less or equal to denominator **
SimNumerator <- function(limit, p_a, p_ba){
numerator <- p_a * p_ba
#something to put the values
fillThis <- vector("numeric", length = length(numerator))
for(i in 1:length(numerator)) {
#one item at a time
result <- numerator[i]
#keep going until the numerator is less than or equal to denominator
while(result > limit[i]){
prior = rbeta(1,5,15) #it rained 5 times in the 20 days
likelihood = rbeta(1, 4,1) #of the 5 times it rained, it wasn't completely overcast once
result <- prior * likelihood
}
fillThis[i] <- result
}
return(fillThis)
}
new_numerator <- SimNumerator(p_a = p_rain,
p_ba = p_overcast_rain,
limit = p_overcast)
data %>% tibble(new_numerator) %>%
mutate(p_rain_overcast = new_numerator / p_overcast) %>%
reframe(
quantiles = c(0,.1,.25,.5,.75,.9,1),
values = quantile(p_rain_overcast, c(0,.1,.25,.5,.75,.9,1)))
3. Is pretty much #2, but I resimulate until the denominator is greater than numerator This is just changing what gets resampled for rbeta.
#comparison#
The interpretations are similar (after all only 22 / 1000 values changed). The limiting approach creates a spike at 1. The resampling approach has a downside in that it is more likely for low p(overcast) or denominator values d OR high numerator values getting resampled, which might misrepresent some of the sampling distributions.


