I am a beginner in R and stats, so I apologize if this question has an easy answer that I am just not seeing.
I am looking to solve some problems that ask for a cumulative probability below or above some value q. Here is an example:
An actress has a probability of getting offered a job after a try-out of 0.08. She plans to keep trying out for new jobs until she gets offered one. Assume outcomes of try-outs are independent. Find the probability she will need to attend more than 4 try-outs.
I thought this would be a straightforward use of `pgeom()`: `pgeom(q = 4, prob = 0.08, lower.tail = FALSE)`, which returns 0.6590815.
This is not the correct answer to the problem. I tried `1 - pgeom(q = 4, prob = 0.08)`, which obviously still gave the same answer. I tried setting `q = 5` to see if maybe R was including 4 when it should be excluded. Still wrong. (The correct answer to the problem is 0.7164.)
I am running into the same issue with `ppois()`. Can anyone explain what I might be doing wrong?
TL;DR:
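`lower.tail = FALSE` behaves unintuitively: unlike the default, `lower.tail = TRUE`, it does not include the given value in the interval. See below for an explanation.

R's help page for `?pgeom` describes the quantile as the number of failures in a sequence of Bernoulli trials before success occurs. For example, success on the 5th try-out means 4 failures, so "more than 4 try-outs" means "4 or more failures".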
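Written out as a short derivation, with p = 0.08. Here N (the total number of try-outs, including the successful one) is a name introduced only for this sketch; X = N - 1 is the number of failures, which is what `pgeom()` counts:

$$
P(N > 4) = P(X \ge 4) = P(X > 3) = 1 - \sum_{x=0}^{3} p\,(1-p)^{x} = (1-p)^{4} = 0.92^{4} \approx 0.7164
$$

$$
P(X > 4) = (1-p)^{5} = 0.92^{5} \approx 0.6591
$$

The second line is exactly what `pgeom(q = 4, prob = 0.08, lower.tail = FALSE)` computes.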
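Remember that `dgeom()` gives the probability of a single, exact number of failures before the first success. We want the sum of `dgeom()` across all possible numbers of failures greater than or equal to 4, which is the same as one minus the sum over 0 to 3 failures: `1 - sum(dgeom(0:3, prob = 0.08))` returns about 0.7164, the correct result.

But how do we get it with `pgeom()`? Use `q = 3`: `1 - pgeom(q = 3, prob = 0.08)`, or equivalently `pgeom(q = 3, prob = 0.08, lower.tail = FALSE)`. This makes sense from the following perspective: 4 belongs to the complement we are looking for, so it must not be included in the other part, which therefore ends at 3.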
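A small sketch that computes the answer three ways, to show they agree. Only base R functions are used; the variable names are illustrative:

```r
p <- 0.08  # probability of a job offer at each try-out

# X = number of failed try-outs before the first offer (R's convention).
# "More than 4 try-outs" means X >= 4, i.e. X > 3.
by_density <- 1 - sum(dgeom(0:3, prob = p))              # 1 - P(X <= 3)
by_upper   <- pgeom(q = 3, prob = p, lower.tail = FALSE) # P(X > 3)
by_formula <- (1 - p)^4                                  # first 4 try-outs all fail

c(by_density, by_upper, by_formula)  # all three are about 0.7164
```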
This is also reflected in how `lower.tail = FALSE` and `lower.tail = TRUE` behave. Intuition says both should behave the same way and include the given value in the interval. Like you, I would expect `pgeom(q = 4, prob = 0.08, lower.tail = FALSE)` to give the cumulative probability of 4 or more failures. Instead, it gives the probability of 5 or more failures before the first success, which is 0.6590815.
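A sketch confirming that `q = 4` with `lower.tail = FALSE` starts counting at 5 failures, using only base R:

```r
p <- 0.08

# lower.tail = FALSE returns P(X > q): the value q itself is excluded
upper_q4 <- pgeom(q = 4, prob = p, lower.tail = FALSE)

# The same quantity built from densities: 5 or more failures
from_5_on <- 1 - sum(dgeom(0:4, prob = p))

c(upper_q4, from_5_on, (1 - p)^5)  # all about 0.6591, not the 0.7164 we want
```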
To understand this behaviour of the `lower.tail` argument, look at what the help page says about it: if `TRUE` (the default), probabilities are P[X ≤ x]; otherwise, P[X > x]. The "or equal to" part appears only for `TRUE`, the default.
When `FALSE`, the direction flips as expected, but the equality is gone. Sad for intuition, great for maths: `lower.tail = TRUE` and `lower.tail = FALSE` are complementary and must add up to 1. The given value can therefore be included in only one of them; otherwise `pgeom(4, 0.08) + pgeom(4, 0.08, lower.tail = FALSE)` would count `dgeom(4, 0.08)` twice and exceed 1.

Luckily, the developers chose to include the given value in the default, so the function can be used intuitively, and `1 - pgeom()` makes it explicit that you want the complement, where the value is naturally not included.
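A quick check of the "tails add up to 1" point, plus the same shift applied to `ppois()` from the question. `p_at_least()` is a hypothetical helper introduced for this sketch (not part of base R), and `lambda = 2.5` is an arbitrary illustrative value:

```r
p <- 0.08
q <- 4

# The two tails partition the support: P(X <= q) + P(X > q) = 1
pgeom(q, p) + pgeom(q, p, lower.tail = FALSE)

# Hypothetical helper: P(X >= k) for an integer-valued distribution,
# obtained by shifting the quantile down by one before taking the upper tail.
p_at_least <- function(k, pfun, ...) pfun(k - 1, ..., lower.tail = FALSE)

p_at_least(4, pgeom, prob = p)      # P(X >= 4) for the geometric, about 0.7164
p_at_least(4, ppois, lambda = 2.5)  # same trick for ppois (lambda is illustrative)
1 - sum(dpois(0:3, lambda = 2.5))   # should match the line above
```

The shift by one works for any distribution on the integers, since P(X ≥ k) = P(X > k - 1) there.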