pgeom and ppois returning incorrect values when trying to find values greater or less than q

Question

46 Views Asked by Rylee Baca At 27 January 2024 at 23:39

I am a beginner in R and stats, so I apologize if this question has an easy answer that I am just not seeing.

I am looking to solve some problems that ask for a cumulative probability of values less than or greater than q. Here is an example:

An actress has a probability of getting offered a job after a try-out of 0.08. She plans to keep trying out for new jobs until she gets offered one. Assume outcomes of try-outs are independent. Find the probability she will need to attend more than 4 try-outs.

I thought this would be a straightforward use of pgeom:

`pgeom(q = 4, prob = 0.08, lower.tail = FALSE)`, which returns 0.6590815.

This is not the correct answer to the problem. I tried `1 - pgeom(q = 4, prob = 0.08)`, which, as expected, gave the same answer. I also tried setting `q = 5` in case R was including 4 when it should be excluded. Still wrong. (The correct answer to the problem is 0.7164.)

I am running into the same issue with ppois. Can anyone explain what I might be doing wrong?

**uke** · Answer 1 · 2024-01-28T00:42:42.147000

TL;DR:
`lower.tail = F` behaves unintuitively: unlike the default, `lower.tail = T`, it does not include the given value in the interval. See below for an explanation.

R help for `?pgeom` says:

> `x, q`: vector of quantiles representing the number of failures [...] before success occurs.

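For example, success on the 5th try-out means 4 failures. So needing more than 4 try-outs means the first 4 try-outs all fail, i.e. 4 or more failures.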
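
A quick simulation can confirm the mapping between try-outs and failures. This is only a sketch: the seed and the number of simulated actresses (1e6) are arbitrary choices, and the estimate will differ from 0.7164 by a little simulation noise.

```r
# Simulate the actress's try-outs to check the failures-vs-try-outs mapping.
# rgeom() returns the number of FAILURES before the first success,
# so the number of try-outs she attends is failures + 1.
set.seed(42)                       # arbitrary seed, for reproducibility only
failures <- rgeom(1e6, prob = 0.08)
tryouts  <- failures + 1

# "More than 4 try-outs" is the same event as "4 or more failures".
stopifnot(identical(tryouts > 4, failures >= 4))

p_sim <- mean(tryouts > 4)
p_sim                              # close to 0.7164, up to simulation noise
```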

Remember that `dgeom()` gives the probability of a single, exact number of failures before the first success. We want the sum of `dgeom()` across all possible numbers of failures greater than or equal to 4. (Summing up to 1000000 stands in for infinity; the terms beyond that are negligible.)

```r
dgeom(4:1000000, .08) |> sum()
#> 0.716393
```

So, this is the correct result. But how to achieve it with pgeom()?

```r
1 - pgeom(3, .08)
#> 0.716393
```

It makes sense from this perspective: 4 belongs to the part we are looking for (4 or more failures), so it must not be included in the part we subtract. `pgeom(3, .08)` covers 0 to 3 failures, and its complement is exactly 4 or more failures.
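
For the geometric distribution the tails also have a closed form, which gives a way to check these numbers by hand. Here X counts failures before the first success (the same parameterisation R uses) and p = 0.08:

$$P(X = x) = p\,(1-p)^{x}, \qquad x = 0, 1, 2, \dots$$

$$P(X \le q) = 1 - (1-p)^{q+1}, \qquad P(X > q) = (1-p)^{q+1}$$

$$P(X \ge 4) = P(X > 3) = (1 - 0.08)^{4} = 0.92^{4} = 0.71639296 \approx 0.7164$$

whereas $P(X > 4) = 0.92^{5} \approx 0.6591$, which is the value `pgeom(4, .08, lower.tail = FALSE)` returns.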

This is also reflected in the behaviour of `lower.tail = FALSE` and `lower.tail = TRUE`. Our intuition tells us both should behave the same and include the given value in the interval...

Like you, I would expect

```r
pgeom(4, .08, lower.tail = F)
#> 0.6590815
```

to give cumulative probabilities from 4 onwards. Instead, it gives the probability of 5 or more failures before the first success:

```r
dgeom(5:1000000, .08) |> sum()
#> 0.6590815
```

To understand this behaviour of the `lower.tail` argument, see what the help says about it:

> `lower.tail`: logical; if TRUE (default), probabilities are P[X ≤ x], otherwise, P[X > x].

The "or equal to" (≤) is only there for TRUE, the default. When FALSE, the direction flips as expected, but the "or equal to" is gone. Sad for intuition, great for maths: `lower.tail = T` and `lower.tail = F` are complementary probabilities that have to add up to 1. So the given value can be included in only one of them; otherwise the following would not add up to 1:

```r
pgeom(4, .08, lower.tail = T) + pgeom(4, .08, lower.tail = F)
#> 1
```
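
The same convention can be checked over a whole range of q rather than a single value. This is a minimal sketch using only base R; the range `0:20` is an arbitrary choice, and the reference values use the closed form P(X > q) = (1 - p)^(q + 1) for the number of failures X.

```r
# Check the lower.tail convention of pgeom() over a range of q:
#   lower.tail = TRUE  gives P(X <= q)
#   lower.tail = FALSE gives P(X >  q)   (q itself is NOT included)
p <- 0.08
q <- 0:20

upper <- pgeom(q, p, lower.tail = FALSE)
lower <- pgeom(q, p, lower.tail = TRUE)

# The two tails split the outcomes in two, so they sum to 1 for every q.
stopifnot(isTRUE(all.equal(lower + upper, rep(1, length(q)))))

# P(X > q) means the first q + 1 trials all fail: (1 - p)^(q + 1).
stopifnot(isTRUE(all.equal(upper, (1 - p)^(q + 1))))

# P(X >= q) therefore needs q - 1 in the upper tail (shown for q >= 1).
stopifnot(isTRUE(all.equal(pgeom(q[-1] - 1, p, lower.tail = FALSE),
                           (1 - p)^q[-1])))
```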

Luckily, the developers chose to include the given value in the interval by default. That means we can use this function intuitively, and resort to `1 - pgeom()` to be explicit about wanting the complement, where the value is obviously not included.
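
The question also mentions `ppois`, and the same shift by one applies to it, since it is also a distribution on the integers. The helper below is a hypothetical convenience function, not part of base R; it assumes only that the cumulative function passed in accepts a `lower.tail` argument, as `pgeom` and `ppois` do. The value `lambda = 3` is an arbitrary example.

```r
# Hypothetical helper (not part of base R): translate an inequality on an
# integer-valued random variable X into the matching p*() call.
# `pfun` is any cumulative distribution function that accepts a
# `lower.tail` argument (e.g. pgeom, ppois); `...` carries its parameters.
tail_prob <- function(k, pfun, ..., op = c(">", ">=", "<", "<=")) {
  op <- match.arg(op)
  switch(op,
    "<=" = pfun(k,     ..., lower.tail = TRUE),   # P(X <= k)
    "<"  = pfun(k - 1, ..., lower.tail = TRUE),   # P(X <  k) = P(X <= k - 1)
    ">"  = pfun(k,     ..., lower.tail = FALSE),  # P(X >  k)
    ">=" = pfun(k - 1, ..., lower.tail = FALSE)   # P(X >= k) = P(X >  k - 1)
  )
}

# The try-out problem: "more than 4 try-outs" = 4 or more failures
tail_prob(4, pgeom, prob = 0.08, op = ">=")   # about 0.7164

# The same shift for ppois; lambda = 3 is an arbitrary example value
tail_prob(2, ppois, lambda = 3, op = ">=")    # P(Y >= 2) = 1 - P(Y <= 1)
```

Shifting k by one is only valid because these distributions take integer values; for a continuous distribution, P(X > k) and P(X >= k) are equal, so no shift is needed.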
