A distribution is beta-binomial if p, the probability of success, in a binomial distribution has a beta distribution with shape parameters α > 0 and β > 0. The shape parameters define the probability of success.
I want to find the values for α and β that best describe my data from the perspective of a beta-binomial distribution. My dataset players consist of data about the number of hits (H), the number of at-bats (AB) and the conversion (H / AB) of a lot of baseball players. I estimate the PDF with the help of the answer of JulienD in Beta Binomial Function in Python
from scipy.special import beta
from scipy.misc import comb
pdf = comb(n, k) * beta(k + a, n - k + b) / beta(a, b)
Next, I write a loglikelihood function that we will minimize.
def loglike_betabinom(params, *args):
"""
Negative log likelihood function for betabinomial distribution
:param params: list for parameters to be fitted.
:param args: 2-element array containing the sample data.
:return: negative log-likelihood to be minimized.
"""
a, b = params[0], params[1]
k = args[0] # the conversion rate
n = args[1] # the number of at-bats (AE)
pdf = comb(n, k) * beta(k + a, n - k + b) / beta(a, b)
return -1 * np.log(pdf).sum()
Now, I want to write a function that minimizes loglike_betabinom
from scipy.optimize import minimize
init_params = [1, 10]
res = minimize(loglike_betabinom, x0=init_params,
args=(players['H'] / players['AB'], players['AB']),
bounds=bounds,
method='L-BFGS-B',
options={'disp': True, 'maxiter': 250})
print(res.x)
The result is [-6.04544138 2.03984464], which implies that α is negative which is not possible. I based my script on the following R-snippet. They get [101.359, 287.318]..
ll <- function(alpha, beta) {
x <- career_filtered$H
total <- career_filtered$AB
-sum(VGAM::dbetabinom.ab(x, total, alpha, beta, log=True))
}
m <- mle(ll, start = list(alpha = 1, beta = 10),
method = "L-BFGS-B", lower = c(0.0001, 0.1))
ab <- coef(m)
Can someone tell me what I am doing wrong? Help is much appreciated!!
One thing to pay attention to is that
comb(n, k)in your log-likelihood might not be well-behaved numerically for the values ofnandkin your dataset. You can verify this by applyingcombto your data and see ifinfs appear.One way to amend things could be to rewrite the negative log-likelihood as suggested in https://stackoverflow.com/a/32355701/4240413, i.e. as a function of logarithms of Gamma functions as in
You can then minimize the log-likelihood with
and that should give reasonable results.
You could check How to properly fit a beta distribution in python? for inspiration if you want to rework further your code.