Calculating an integral of a Bernoulli likelihood function in R

72 Views Asked by At

I am trying to integrate a very simple likelihood function (three Bernoulli trials) in R, but it seems that I have not understood how integrate function works. My code is as follows:

integrate(
    function(theta) prod(
        dbinom(
            x = c(1,0,1), #x is the data: "success, fail, success"
            size = 1, #Just one Bernoulli trial times the length of x.
            prob = theta, #Since, this is a likelihood function, the parameter value is the variable
            log = F #We use a likelihood instead of log-likelihood
        )
    ),
    lower = 0, #The parameter theta cannot be smaller than zero...
    upper = 1 #...nor greater than one.
)

However, I get this error message:

Error in integrate(function(theta) prod(dbinom(x = c(1, 0, 1), size = 1,  : 
  evaluation of function gave a result of wrong length

Now, why is the result of wrong length? The result is always of length 1, since the anonymous function uses prod function, which in turn creates a product of all the vector elements the function dbinom returns (this vector has the length of three, since its first argument x has length of three).

What the result should be then to be of right length?

2

There are 2 best solutions below

3
Avraham On BEST ANSWER

The issue here is not with dbinom but with prod. dbinom is a vectorized function but prod, according to its definition, returns a "a numeric (of type "double") or complex vector of length one."

As noted in the comments, the simplest approach is to simply wrap your function in Vectorize inside the integrate call. For example:

fn <- function(theta) prod(dbinom(c(1, 0, 1), size = 1, prob = theta, log = FALSE))
integrate(Vectorize(fn), 0, 1)
0.08333333 with absolute error < 9.3e-16
2
Rui Barradas On

The question code is wrong in two points:

  • You must optimise, not integrate;
  • a product of many factors in (0, 1) will become close to zero, that's why you need to sum the log-likelihood.
fn <- function(theta, x) sum(dbinom(x, size = 1, prob = theta, log = TRUE))

MLE <- function(x) {
  f <- function(theta, x) fn(theta, x)
  optimise(f, c(0, 1), x = x, maximum = TRUE)$maximum
}

# the question's sample
MLE(c(1, 0, 1))
#> [1] 0.666675

# other samples, increasingly larger
set.seed(2023)
x <- rbinom(10, 1, p = 2/3)
MLE(x)
#> [1] 0.9999339

x2 <- rbinom(100, 1, p = 2/3)
MLE(x2)
#> [1] 0.5600005

x3 <- rbinom(1000, 1, p = 2/3)
MLE(x3)
#> [1] 0.691006

x4 <- rbinom(10000, 1, p = 2/3)
MLE(x4)
#> [1] 0.6746081

Created on 2023-07-31 with reprex v2.0.2


Edit

I had misunderstood the question, I thought it was asking for the MLE given the sample c(1, 0, 1).
Here is a corrected sum/log version, which is nothing more than the solution in Avraham's answer. The result and absolute error are the same.

fn <- function(theta) sum(dbinom(c(1, 0, 1), size = 1, prob = theta, log = TRUE)) |> exp()
integrate(Vectorize(fn), 0, 1)
#> 0.08333333 with absolute error < 9.3e-16

Created on 2023-08-01 with reprex v2.0.2