I am fitting a negative binomial model with glm.nb from the MASS package in R, with response Y depending on the variables y1, x, and z.
The problem is that I get different estimated coefficients from the same model depending on which of two equivalent formulas I use: Y ~ y1 + x * z or Y ~ x * z + y1 (the same terms, in reversed order).
Below are my data and the results with the two formulas:
z <-c("T3", "T3", "T3", "T2", "T2", "T3", "T3", "T2", "T2", "T2", "T3", "T3", "T3", "T3", "T2", "T3", "T2", "T2", "T1", "T3", "T3", "T2", "T2", "T2", "T2", "T2", "T2", "T3", "T2", "T3", "T2", "T2", "T2", "T2", "T3", "T3", "T2", "T2", "T3", "T3", "T3", "T3", "T3", "T3", "T3", "T2", "T3", "T2", "T3", "T3", "T3", "T2", "T1", "T3", "T3", "T3", "T3", "T2", "T3", "T3", "T2", "T2", "T2", "T2", "T2", "T3", "T2", "T3", "T2", "T3", "T3", "T3", "T3", "T2", "T2", "T2", "T2", "T2", "T2", "T3", "T3", "T3", "T3", "T3", "T3", "T2", "T2", "T2", "T2", "T2", "T3", "T3", "T3", "T3", "T3", "T2", "T3", "T2", "T3", "T2", "T2", "T2", "T2", "T3", "T3", "T3", "T2", "T2", "T3", "T2", "T3", "T3", "T2", "T2")
x <- c("B-", "B-", "B+", "B+", "B-", "B-", "B+", "B-", "B-", "B-", "B-", "B-", "B-", "B-", "B+", "B-", "B-", "B-", "B+", "B-", "B+", "B+", "B-", "B-", "B+", "B-", "B+", "B+", "B-", "B-", "B+", "B+", "B+", "B-", "B-", "B-", "B-", "B-", "B+", "B+", "B-", "B-", "B+", "B+", "B+", "B-", "B-", "B-", "B-", "B-", "B+", "B-", "B-", "B+", "B+", "B+", "B-", "B-", "B-", "B-", "B-", "B+", "B-", "B+", "B+", "B+", "B-", "B-", "B+", "B-", "B-", "B-", "B-", "B-", "B-", "B+", "B-", "B-", "B-", "B-", "B-", "B+", "B+", "B+", "B-", "B+", "B+", "B+", "B-", "B-", "B+", "B-", "B-", "B-", "B-", "B-", "B+", "B-", "B-", "B-", "B-", "B-", "B+", "B+", "B-", "B-", "B+", "B+", "B-", "B+", "B+", "B-", "B-", NA )
Y <- c(0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 13, 19, 1, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 1, 3, 1, 0, 2, 0, 0, 6, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 10, 0, 0, 0, 0, 0, 0, 3)
y1 <- c(1, 3, 0, 1, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 2, 3, 0, 0, 1, 0, 5, 0, 2, 0, 0, 1, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 2, 1, 0, 0, 0, 0, 0, 0, 0, 1, 12, 1, 0, 0, 0, 2, 0, 4, 0, 0, 0, 0, 4, 0, 0, 0, 2, 2, 0, 0, 1, 0, 0, 16, 0, 14, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 2, 1, 0, 0, 6, 0, 0, 1, 0, 0, 1, 13)
data1 <- data.frame(Y, y1, z, x)
glm.nb(Y ~ x * z + y1, data = data1)
glm.nb(Y ~ y1 + x * z, data = data1)
> glm.nb(Y ~ x * z + y1, data = data1)
Call: glm.nb(formula = Y ~ x * z + y1, data = data1, init.theta = 0.2911311528,
link = log)
Coefficients:
(Intercept)        x1        z1        z2        y1     x1:z1     x1:z2
  -13.27592  -0.06054 -24.03075  11.75385   0.38529   0.54648  -0.11488
Degrees of Freedom: 112 Total (i.e. Null); 106 Residual
(1 observation deleted due to missingness)
Null Deviance: 94.45
Residual Deviance: 61.74 AIC: 199.8
> glm.nb(Y ~ y1 + x * z, data = data1)
Call: glm.nb(formula = Y ~ y1 + x * z, data = data1, init.theta = 0.2911311528,
link = log)
Coefficients:
(Intercept)       y1       x1       z1       z2    x1:z1    x1:z2
   -13.4950   0.3853  -0.7217 -24.4689  11.9729  -0.7759   0.5463
Degrees of Freedom: 112 Total (i.e. Null); 106 Residual
(1 observation deleted due to missingness)
Null Deviance: 94.45
Residual Deviance: 61.74 AIC: 199.8
I get different coefficient values with the two formulas. Knowing that the variable z has only two observations at the T1 level, can this influence the estimation of my coefficients?
Is it possible to get identical results from the two formulas Y ~ y1 + x * z and Y ~ x * z + y1 even if there is a convergence problem?
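The T1 count can be checked directly from the data above:

```r
## how many observations fall in each level of z?
table(z)   # T1 appears only twice; T2 and T3 account for the rest
```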
Thank you for your answers and your help.
This is admittedly a weird result; I would have expected some sort of warning. However, I think in the end you shouldn't worry about this difference, for reasons I will explain (tl;dr: this is an optimistic fit to a small, noisy data set, and none of the interesting results differ between the two models).
- all.equal(logLik(m1), logLik(m2)) says that the two models fit equally well (up to numerical tolerance).
- all.equal(predict(m1), predict(m2)) says that the predictions are not identical; they are off by about 2% on average. However, these predictions are on the log scale, so the predicted values are tiny, and the differences between the predictions are negligible on the count scale (e.g. exp(-35) is 6.3e-16 and exp(-36) is 2.3e-16).
- The confidence intervals for all coefficients except the effect of y1 are ridiculously large (and all of the p-values are 1). (If we set ci = NA to turn off the confidence intervals, we can see the differences in the coefficients.)
- This is an example of complete separation: not only are there only 2 observations in the T1 condition, but both of their responses are zero ... (all of these checks are collected in the sketch after this list).
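Here is a minimal sketch of those checks, assuming m1 and m2 hold the two fits from the question (the question calls glm.nb without assigning the results, so those names are introduced here):

```r
library(MASS)

## the two fits from the question, stored under the names used above
m1 <- glm.nb(Y ~ x * z + y1, data = data1)
m2 <- glm.nb(Y ~ y1 + x * z, data = data1)

## identical log-likelihoods up to numerical tolerance
all.equal(logLik(m1), logLik(m2))

## predictions differ slightly on the log (link) scale ...
all.equal(predict(m1), predict(m2))

## ... but such differences are negligible on the count scale
exp(-35)  # ~6.3e-16
exp(-36)  # ~2.3e-16

## enormous standard errors for everything except y1 are the
## footprint of the separation problem
summary(m1)$coefficients

## the T1 cell is tiny, and all of its responses are zero
with(data1, table(z, Y > 0))
```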
It would be wiser to omit the T1 condition entirely; it's just messing things up. In that case (m1A <- update(m1, data = subset(data1, z != "T1")) and similarly for m2) you still have a significant effect of y1, and the two orders agree.
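A sketch of that refit, using the data1 frame from the question (m1A and m2A are just illustrative names):

```r
## drop the two all-zero T1 rows and refit both term orders
data1B <- subset(data1, z != "T1")
m1A <- update(m1, data = data1B)
m2A <- update(m2, data = data1B)

## the coefficient estimates now agree regardless of term order
all.equal(coef(m1A), coef(m2A)[names(coef(m1A))])

## and y1 is still clearly significant
summary(m1A)$coefficients["y1", ]
```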