Predicted Probabilities based on logit model - correct specification?

220 Views Asked by At

I have a logit model with 4 independent variables:

logit <- glm(y ~ x1 + x2 + x3 + x4, family = binomial(), data = df)

All variables in the model are dichotomous (0,1).

I want to get the predicted probabilities for when x1=1 vs x1=0, while holding all other variables in the model constant, using the following code:

mean(predict(logit,transform(df,x1=1),type='response')) 
mean(predict(logit,transform(df,x1=0),type='response')) 

Is this the correct way to do this? I'm asking because I tried a few different logit models and compared the difference between the two values produced by the code above with the number the margins command produces:

summary(margins(logit, variables="treatment"))

For most models I tried, the difference between the two values equals the number produced by the margins command above, but for one model specification the numbers were slightly different.

Finally, when I use weights in my logit model, do I have to do anything differently when I calculate the predicted probabilities than when I don't include any weights?

1

There are 1 best solutions below

6
Vincent On

Your approach seems reasonable. The only thing I’ll note is that there are many packages out there which will do this automatically for you, and can estimate standard errors. One example is the marginaleffects package. (Disclaimer: I am the author.)

library(marginaleffects)

mod <- glm(am ~ vs + hp, data = mtcars, family = binomial)

# average predicted outcome for observed units with vs=1 or vs=0
predictions(
    mod,
    by = "vs")
#>       type vs predicted std.error statistic      p.value  conf.low conf.high
#> 1 response  0 0.3333333 0.1082292  3.079883 0.0020708185 0.1212080 0.5454587
#> 2 response  1 0.5000000 0.1328955  3.762356 0.0001683204 0.2395297 0.7604703

# are the average predicted outcomes equal?
predictions(
    mod,
    by = "vs",
    hypothesis = "pairwise")
#>       type  term  predicted std.error  statistic   p.value   conf.low conf.high
#> 1 response 0 - 1 -0.1666667 0.1713903 -0.9724392 0.3308321 -0.5025855 0.1692522

FYI, there is a wts argument to handle weights in the predictions() function.

Edit:

predictions(
    mod,
    newdata = datagridcf(vs = 0:1),
    by = "vs")

predictions(
    mod,
    variables = list(vs = 0:1),
    by = "vs")