pgeom and ppois returning incorrect values when trying to find values greater or less than q

I am a beginner in R and stats, so I apologize if this question has an easy answer that I am just not seeing.

I am looking to solve some problems that want a cumulative answer < or > than q. Here is an example:

An actress has a probability of getting offered a job after a try-out of 0.08. She plans to keep trying out for new jobs until she gets offered one. Assume outcomes of try-outs are independent. Find the probability she will need to attend more than 4 try-outs.

I thought this would be straightforward use of pgeom:

pgeom(q = 4, prob = 0.08, lower.tail = FALSE) which returns 0.6590815

This is not the correct answer to the problem. I tried 1 - pgeom(q = 4, prob = 0.08), which obviously still gave the same answer. I tried setting q = 5 to see if maybe R was including 4 when it should be excluded. Still wrong. (The correct answer to the problem is 0.7164.)

Running into the same issue with ppois. Can anyone explain what I might be doing wrong?

uke On

TL;DR:
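lower.tail = FALSE behaves unintuitively: it returns P[X > q], which excludes the given value q, whereas the default lower.tail = TRUE returns P[X ≤ q], which includes it. On top of that, q counts failures before the first success, not try-outs. See below for the explanation.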


R help for ?pgeom says:

x, q
vector of quantiles representing the number of failures [...] before success occurs.

For example, success on 5th tryout = 4 failures.

Remember that dgeom() gives the probability of one exact number of failures before the first success. Needing more than 4 try-outs means the first 4 try-outs all fail, so we want the sum of dgeom() across all numbers of failures greater than or equal to 4.

dgeom(4:1000000, .08) |> sum()
#> 0.716393
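
As a cross-check, the same number should follow from the standard closed form for the geometric distribution: needing more than 4 try-outs means the first 4 all fail, which has probability (1 - p)^4. A minimal sketch (the variable names p, n_fail, closed_form and summed are only illustrative):

```r
p <- 0.08        # probability of an offer at any single try-out
n_fail <- 4      # "more than 4 try-outs" means the first 4 all fail

closed_form <- (1 - p)^n_fail              # P[at least 4 failures]
summed <- sum(dgeom(n_fail:1000000, p))    # same tail, summed term by term

closed_form
#> 0.716393
all.equal(closed_form, summed)
#> TRUE
```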

So, this is the correct result. But how to achieve it with pgeom()?

1 - pgeom(3, .08)
#> 0.716393

It makes sense from this perspective: 4 belongs to the complement we are looking for, so it must not be included in the part we subtract. pgeom(3, .08) covers 0 to 3 failures, and everything else is 4 or more.

This is also reflected in how lower.tail = FALSE and lower.tail = TRUE behave. Intuition says both should treat the given value the same way and include it in the interval...

Like you, I would expect

pgeom(4, .08, lower.tail = F)
#> 0.6590815

to give cumulative probabilities from 4 onwards. Instead, it is giving the probability for 5 or more failures before the first success:

dgeom(5:1000000, .08) |> sum()
#> 0.6590815
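
If you would rather keep lower.tail = FALSE, the same off-by-one adjustment applies: because it returns P[X > q], asking for 4 or more failures means passing q = 3. A small sketch (the names p and upper are only illustrative):

```r
p <- 0.08

# P[X > 3], i.e. 4 or more failures before the first offer
upper <- pgeom(3, p, lower.tail = FALSE)
upper
#> 0.716393

# Same quantity as the explicit complement
all.equal(upper, 1 - pgeom(3, p))
#> TRUE
```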

To understand this behaviour of the lower.tail argument, see what help says about it:

lower.tail
logical; if TRUE (default), probabilities are P[X ≤ x], otherwise, P[X > x].

The "or equal to" part is only included for TRUE, the default: P[X ≤ q].
When FALSE, the direction flips as expected, but the inequality becomes strict: P[X > q]. Sad for intuition, great for maths: lower.tail = T and lower.tail = F are complements that have to add up to 1. So the given value can only be included in one of them, otherwise the following would not add up to 1:

pgeom(4, .08, lower.tail = T) + pgeom(4, .08, lower.tail = F)
#> 1

Luckily, the developers chose to make the default include the given value in the interval. That means we can use the function intuitively with its defaults, and write 1 - pgeom() when we explicitly want the complement, which by construction excludes the given value.
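
The same strict-inequality convention applies to ppois(), which the question also mentions. Here there is no failures-versus-try-outs offset to worry about, only the "greater than" versus "greater than or equal to" one. A sketch with a hypothetical rate lambda = 2 and threshold k = 3 (neither value comes from the question):

```r
lambda <- 2   # hypothetical Poisson rate, chosen only for illustration
k <- 3        # hypothetical threshold

# P[X > k]: strictly more than k events
p_more_than <- ppois(k, lambda, lower.tail = FALSE)

# P[X >= k]: k or more events, so shift q down by one
p_at_least <- ppois(k - 1, lambda, lower.tail = FALSE)

# The two differ by exactly the probability of X == k
all.equal(p_at_least - p_more_than, dpois(k, lambda))
#> TRUE
```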