I'm trying to compare success rates in two samples. I'm treating each sample as a binomial distribution and comparing the success rates with a two-tailed hypothesis test. (x1 is the number of successes in sample 1 and n1 is the total number of trials; similarly for sample 2.) I believe the code below is very similar to the process described in this post
for performing a hypothesis test for the success rate between two groups. However, the pVal I'm getting is 1.82372486268966, which doesn't make sense. Is there something I'm misunderstanding about the theory behind this hypothesis test, or is there an error in my code below?
Code:

import math
import scipy.stats as stats

x1 = 195
x2 = 5481
n1 = 135779
n2 = 81530

hatP = (n1*(x1/n1) + n2*(x2/n2))/(n1 + n2)
hatQ = 1 - hatP
hatP1 = x1/n1
hatP2 = x1/n2
Z = (hatP1 - hatP2)/(math.sqrt(hatP*hatQ*(1/n1 + 1/n2)))
pVal = 2*(1 - stats.norm.cdf(Z))

Output:

Z: -1.3523132192521408
pVal: 1.82372486268966
After correcting hatP2 = x1/n2 to hatP2 = x2/n2, your statistic is approximately normally distributed under the null hypothesis that the two samples were drawn from the same binomial distribution. The calculation of the p-value depends on the alternative hypothesis. It looks like you are interested in the two-sided alternative that the samples were not drawn from the same binomial distribution, in which case the p-value would be calculated as pVal = 2*stats.norm.sf(abs(Z)). (Your version, 2*(1 - stats.norm.cdf(Z)), exceeds 1 whenever Z is negative, which is why you got 1.82.)
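Putting that fix together, a corrected version of your script might look like the sketch below. It uses norm.sf (the survival function, 1 - cdf) on |Z|, which is more accurate than 1 - cdf in the far tails:

```python
import math
import scipy.stats as stats

x1, n1 = 195, 135779
x2, n2 = 5481, 81530

# Pooled success probability under the null hypothesis
hatP = (x1 + x2) / (n1 + n2)
hatQ = 1 - hatP

hatP1 = x1 / n1
hatP2 = x2 / n2  # corrected: x2, not x1

Z = (hatP1 - hatP2) / math.sqrt(hatP * hatQ * (1/n1 + 1/n2))

# Two-sided p-value: twice the upper tail beyond |Z|
pVal = 2 * stats.norm.sf(abs(Z))
```

With your data, Z is roughly -93, so the p-value underflows to 0.0 in double precision.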
In your case, the p-value underflows; it is too small to be represented in double precision.
Rather than writing a custom test, consider using scipy.stats.fisher_exact, scipy.stats.barnard_exact, or scipy.stats.boschloo_exact.
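For example, a sketch using fisher_exact with your counts, arranged as a 2x2 table of successes and failures per sample:

```python
import scipy.stats as stats

x1, n1 = 195, 135779
x2, n2 = 5481, 81530

# 2x2 contingency table: [successes, failures] for each sample
table = [[x1, n1 - x1],
         [x2, n2 - x2]]

odds_ratio, p_value = stats.fisher_exact(table, alternative='two-sided')
```

barnard_exact and boschloo_exact accept the same 2x2 table, but they are considerably slower for sample sizes this large, so Fisher's exact test is the practical choice here.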