What are the differences between ggpredict() and predictions()?

Question

What are the differences between ggpredict() and predictions()?

53 Views Asked by hy9fesh At 02 February 2024 at 22:27

I wanted to generate some predicted probabilities and I was recommended two commands: ggeffects::ggpredict() and margin effects::predictions(). However, some of the results seem similar (the conf.high) and other results seem different (the conf.low). What is going on here? Any insights would be helpful.

Here's an example:

logit <- glm(vs ~ mpg + cyl + disp + hp + drat + wt + qsec + am + gear + carb, mtcars, family = "poisson")

library(ggeffects)
ggpredict(logit, terms=c("mpg")) %>% as_tibble()

library(marginaleffects)
predictions(logit, by=c("mpg")) %>% as_tibble()

Original Q&A

There are 1 best solutions below

**Vincent** · Accepted Answer · 2024-02-03T04:14:02.353000

The two functions do different things by default.

ggpredict() with the terms argument computes predictions for each row of a data frame where mpg takes on values between 10 and 32, and all other variables are held at their means, including am, which is set to 0.41 even if it is presumably binary/categorical.
predictions() with the by argument computes predictions for each row in the actually observed data frame, and then takes the average of these predictions for each unique value of mpg.

In other words, ggpredict() reports predictions for “synthetic” data — hypothetical units which are exactly average along all dimensions except one. predictions() reports averages of predictions made on actually observed data.

It is easy to reproduce the same results as ggpredict() with predictions(), by using the newdata argument and the datagrid() function:

library(marginaleffects)
library(ggeffects)
logit <- glm(vs ~ mpg + cyl + disp + hp + drat + wt + qsec + am + gear + carb, mtcars, family = "poisson")

ggpredict(logit,
  terms = c("mpg")) |>
  data.frame() |> head()
#    x predicted std.error   conf.low conf.high group
# 1 10 0.2243430 1.2888961 0.01793916 2.8055810     1
# 2 12 0.2018065 1.0826924 0.02417384 1.6847077     1
# 3 14 0.1815340 0.9117794 0.03039834 1.0840915     1
# 4 16 0.1632979 0.7991309 0.03410037 0.7819915     1
# 5 18 0.1468937 0.7707321 0.03243060 0.6653523     1
# 6 20 0.1321375 0.8352217 0.02570893 0.6791536     1

predictions(logit,
  newdata = datagrid(mpg = seq(10, 34, by = 2), am = mean)) |>
  head()
# 
#  mpg    am Estimate Pr(>|z|)   S  2.5 % 97.5 %  cyl disp  hp drat   wt qsec
#   10 0.406    0.224   0.2462 2.0 0.0179  2.806 6.19  231 147  3.6 3.22 17.8
#   12 0.406    0.202   0.1394 2.8 0.0242  1.685 6.19  231 147  3.6 3.22 17.8
#   14 0.406    0.182   0.0613 4.0 0.0304  1.084 6.19  231 147  3.6 3.22 17.8
#   16 0.406    0.163   0.0233 5.4 0.0341  0.782 6.19  231 147  3.6 3.22 17.8
#   18 0.406    0.147   0.0128 6.3 0.0324  0.665 6.19  231 147  3.6 3.22 17.8
#   20 0.406    0.132   0.0154 6.0 0.0257  0.679 6.19  231 147  3.6 3.22 17.8
# 
# Columns: rowid, estimate, p.value, s.value, conf.low, conf.high, vs, cyl, disp, hp, drat, wt, qsec, gear, carb, mpg, am 
# Type:  invlink(link)

PS: this is not a logit model.

What are the differences between ggpredict() and predictions()?

There are 1 best solutions below

Related Questions in R

Related Questions in PREDICT

Trending Questions

Popular # Hahtags

Popular Questions