Unable to perform Bayesian analysis on data through jags in R

141 Views Asked by At

Here's my code

library(R2jags) #library(rjags)
library(bayesplot)
library(coda)


# set working directory
setwd("/Users/isa/Desktop/logreg")

# BUGS model code

cat("model {
  for( i in 1 : 8 ) {
    y[i] ~ dbin(theta[i],n[i])
    logit(theta[i]) <- beta0 + beta1 * x[i]
  }

  beta0 ~ dunif(-100, 100)
  beta1 ~ dunif(-100, 100)
}",
    file = "model_log.txt")



data <- read.delim("data.txt",
                   sep = "",
                   header = TRUE,
                   check.names = "FALSE",
                   stringsAsFactors = FALSE)




initsone <- list(beta0 = -100, beta1 = 100)
initstwo <- list(beta0 = 100, beta1 = -100)

initslog <- list(initsone, initstwo)
paramslog <- c("beta0", "beta1", "theta[6]")



outputlog <-
  jags(data = data,
       inits = initslog,
       parameters.to.save = paramslog,
       model.file = "model_log.txt",
       n.chains = 2,
       n.iter = 1000,
       n.burnin = 1000,
       n.thin = 1,
       DIC = TRUE#,
       # bugs.directory = getwd(),
       # working.directory = getwd()
  )

Everything works fine until I try and compile the output. I get an error:

Error in jags.model(model.file, data = data, inits = init.values, n.chains = n.chains,  : 
  Error in node y[1]
Node inconsistent with parents

I believe this has something to do with my data, which was in an OpenBugs format:

list(y = c(1, 3, 6, 8, 11, 15, 17, 19), 
     n = c(20, 20, 20, 20, 20, 20, 20, 20), 
     x = c(30, 32, 34, 36, 38, 40, 42, 44), 
    N = 8 )

but I converted it into an R format:

y n x
1 20 30
3 20 32
6 20 34
8 20 36
11 20 38
15 20 40
17 20 42
19 20 44

Did I convert the data incorrectly? Where is it going wrong in the data? Everything works fine until I try and compile the output. I get an error stating: Error in jags.model(model.file, data = data, inits = init.values, n.chains = n.chains, : Error in node y[1] Node inconsistent with parents

1

There are 1 best solutions below

0
mfidino On

You have started your initial values for the parameters at the boundaries of your priors. This is not a problem, per se, but those logit scaled values are likely way to extreme and therefore will create initial estimates that are Pr == 0 or Pr == 1.

Just working through your data, let's assume we initial the model with beta0 = -100 and beta1 = 100.

For your first data point x = 30 so your logit-linear predictor starts out as:

theta = -100 + 100 * 30

theta = 2900
plogis(theta) = 1

So we are starting out with a success probability of 1, but y=1 for n=20 trials so the success probability cannot be 1. You can try a couple things to encourage the model to start sampling.

  1. Change your initial values. Make them much closer to 0 (e.g., between -4 and 4).
  2. Do step 1, but also rescale your x covariate to have a mean = 0 and sd = 1 (i.e.,, use the scale function in R. JAGS is not a fan of the direct output from scale so you would end up doing x = as.numeric(scale(c(30, 32, 34, 36, 38, 40, 42, 44))). This is helpful as it means you can use standard priors for your regression, but means you need to interpret your coefficients differently. There are a lot of resources online to look at related to mean-centering variables.

Another kind of way to get initial values somewhere in the correct region (assuming you are using vague priors) is to just fit a frequentist logistic regression model and use those estimates as the mean of some random normal distribution for each parameter. After all, with vague priors the frequentist estimate should be really close to the bayesian estimate as the likelihood is going to greatly outweigh the prior.

dat <- list(y = c(1, 3, 6, 8, 11, 15, 17, 19), 
     n = c(20, 20, 20, 20, 20, 20, 20, 20), 
     x = c(30, 32, 34, 36, 38, 40, 42, 44), 
     N = 8 )

# set up as matrix of successes and failures
y <- matrix(NA, ncol = 2, nrow = 8)
y[,1] <- dat$y
y[,2] <- dat$n - dat$y

m1 <- glm(y ~ dat$x, family = binomial)
summary(m1)

Call:
glm(formula = y ~ dat$x, family = binomial)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-0.39289  -0.20654  -0.04323   0.21294   0.50657  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -13.55295    2.05832  -6.584 4.57e-11 ***
dat$x         0.36630    0.05536   6.616 3.68e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 70.7352  on 7  degrees of freedom
Residual deviance:  0.7544  on 6  degrees of freedom
AIC: 27.71

Number of Fisher Scoring iterations: 4

You can see here that values around abs(100) are pretty far away from the parameter estimates of this specific model.

So, if you wanted, you could set some initial values like so:

initsone <- list(
beta0 = rnorm(1, m1$coefficients[1], 2),
beta1 = rnorm(1, m1$coefficients[2], 2)
)
initstwo <- list(
beta0 = rnorm(1, m1$coefficients[1], 2),
beta1 = rnorm(1, m1$coefficients[2], 2)
)

initslog <- list(initsone, initstwo)

This will, of course, only really work if you have no prior information and for very simple models such as this one.