How to calculate marginal effects of logit model with fixed effects by using a sample of more than 50 million observations

1.8k Views Asked by user8660846 At 06 December 2021 at 12:35

I have a sample of more than 50 million observations. I estimate the following model in R:

model1 <- feglm(rejection ~ variable1 + variable1^2 + variable2 + variable3 + variable4 | city_fixed_effects + year_fixed_effects,
                family = binomial(link = "logit"),
                data = database)

Based on the estimates from model1, I calculate the marginal effects:

mfx2 <- marginaleffects(model1)
summary(mfx2)

This call also computes marginal effects for each fixed effect, which slows R down. I only need the average marginal effects of variables 1, 2, and 3. If I compute them separately with mfx2 <- marginaleffects(model1, variables = "variable1"), the output does not show the standard error or the p-value of the average marginal effect.

Is there a solution to this issue?

Vincent On 08 December 2021 at 21:01

Both the fixest and marginaleffects packages have recently been changed to improve interoperability. The next official CRAN releases will support this, but as of 2021-12-08 you need the development versions. To install them:

library(remotes)
install_github("lrberge/fixest")
install_github("vincentarelbundock/marginaleffects")

I recommend converting your fixed effects variables to factors before fitting your models:

library(fixest)
library(marginaleffects)

dat <- mtcars
dat$gear <- as.factor(dat$gear)

mod <- feglm(am ~ mpg + mpg^2 + hp + hp^3 | gear,
             family = binomial(link = "logit"),
             data = dat)

Then, you can use marginaleffects and summary to compute average marginal effects:

mfx <- marginaleffects(mod, variables = "mpg")
summary(mfx)

## Average marginal effects 
##       type Term Effect Std. Error  z value Pr(>|z|)  2.5 % 97.5 %
## 1 response  mpg 0.3352         40 0.008381  0.99331 -78.06  78.73
## 
## Model type:  fixest 
## Prediction type:  response

Note that computing average marginal effects requires calculating a distinct marginal effect for every single row of your dataset. This can be computationally expensive when your data includes millions of observations.
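To make that cost concrete, here is a minimal base-R sketch of what an average marginal effect is in a logit model: one derivative per row, then a mean. It is an illustration only. It uses plain glm without fixed effects, the model formula, the half-sample size, and the seed are arbitrary assumptions, and it is not how marginaleffects works internally. It also produces no standard errors.

```r
# Plain logit on mtcars (no fixed effects, chosen only for illustration)
fit <- glm(am ~ mpg + hp, family = binomial(link = "logit"), data = mtcars)
beta_mpg <- unname(coef(fit)["mpg"])

# For a logit with a linear term in x, dP/dx = beta_x * p * (1 - p),
# so every row of the data gets its own marginal effect
p_all  <- predict(fit, type = "response")
me_all <- beta_mpg * p_all * (1 - p_all)

# Average marginal effect over the full data: cost grows with the row count
ame_full <- mean(me_all)

# Cheaper approximation: average over a random subsample of rows
# (assumption: 16 rows and seed 1 are arbitrary illustrative choices)
set.seed(1)
idx     <- sample(nrow(mtcars), 16)
p_sub   <- predict(fit, newdata = mtcars[idx, ], type = "response")
ame_sub <- mean(beta_mpg * p_sub * (1 - p_sub))

print(c(full = ame_full, subsample = ame_sub))
```

With tens of millions of rows, the subsample average trades some precision for a large reduction in work. How large a subsample is acceptable depends on your data.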

Instead, you can compute marginal effects at specific values of the regressors using the newdata argument and the typical function (see the marginaleffects documentation for details):

marginaleffects(mod, 
                variables = "mpg", 
                newdata = typical(mpg = 22, gear = 4))

##   rowid     type term     dydx std.error       hp mpg gear predicted
## 1     1 response  mpg 1.068844   50.7849 146.6875  22    4 0.4167502
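As a rough illustration of what a marginal effect at specific regressor values means, the base-R sketch below evaluates the derivative at a single hand-built point (mpg = 22, hp at its sample mean) and checks it against a finite difference. It assumes a plain glm logit without fixed effects or polynomial terms, so its numbers will not match the fixest output above, and the step size eps is an arbitrary choice.

```r
# Plain logit on mtcars (no fixed effects, chosen only for illustration)
fit <- glm(am ~ mpg + hp, family = binomial(link = "logit"), data = mtcars)

# One representative row: mpg fixed at 22, hp held at its sample mean
pt <- data.frame(mpg = 22, hp = mean(mtcars$hp))
p0 <- predict(fit, newdata = pt, type = "response")

# Analytic marginal effect of mpg at that point: beta * p * (1 - p)
me_analytic <- unname(coef(fit)["mpg"] * p0 * (1 - p0))

# Central finite difference as a sanity check on the analytic value
eps   <- 1e-5
pt_hi <- transform(pt, mpg = mpg + eps)
pt_lo <- transform(pt, mpg = mpg - eps)
me_numeric <- unname(
  (predict(fit, newdata = pt_hi, type = "response") -
   predict(fit, newdata = pt_lo, type = "response")) / (2 * eps)
)

print(c(analytic = me_analytic, numeric = me_numeric))
```

Because only one row is evaluated, the cost does not depend on the size of the original dataset.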
