I have a sample of more than 50 million observations. I estimate the following model in R:
model1 <- feglm(rejection ~ variable1 + variable1^2 + variable2 + variable3 + variable4 | city_fixed_effects + year_fixed_effects, family = binomial(link = "logit"), data = database)
Based on the estimates from model1, I calculate the marginal effects:
mfx2 <- marginaleffects(model1)
summary(mfx2)
This call also computes the marginal effects of every fixed effect, which slows R down considerably. I only need the average marginal effects of variables 1, 2, and 3. If I compute the marginal effects separately, using mfx2 <- marginaleffects(model1, variables = "variable1"), the output does not show the standard error or the p-value of the average marginal effect.
Any solution for this issue?
Both the fixest and the marginaleffects packages have made recent changes to improve interoperability. The next official CRAN releases will be able to do this, but as of 2021-12-08 you need to install the development versions of both packages.
I recommend converting your fixed effects variables to factors before fitting your models.
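A minimal sketch of these two steps follows. The GitHub repository names in the commented install lines and the simulated data frame (a small stand-in for `database`, reusing the column names from the question) are assumptions for illustration. The model is only fitted if fixest is installed, so the script runs either way.

```r
# Development versions (uncomment to install; assumes the remotes package is available):
# remotes::install_github("lrberge/fixest")
# remotes::install_github("vincentarelbundock/marginaleffects")

set.seed(123)
n <- 2000

# Simulated stand-in for `database` (hypothetical values, for illustration only)
database <- data.frame(
  variable1 = rnorm(n),
  variable2 = rnorm(n),
  city_fixed_effects = sample(1:10, n, replace = TRUE),
  year_fixed_effects = sample(2015:2019, n, replace = TRUE)
)
database$rejection <- rbinom(
  n, 1, plogis(-0.3 + 0.7 * database$variable1 - 0.5 * database$variable2)
)

# Store the fixed-effect identifiers as factors before fitting
database$city_fixed_effects <- as.factor(database$city_fixed_effects)
database$year_fixed_effects <- as.factor(database$year_fixed_effects)

# Fit only if fixest is installed, so the script still runs without it
if (requireNamespace("fixest", quietly = TRUE)) {
  model1 <- fixest::feglm(
    rejection ~ variable1 + variable2 | city_fixed_effects + year_fixed_effects,
    family = binomial(link = "logit"),
    data = database
  )
  print(coef(model1))
}
```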
Then, you can use marginaleffects and summary to compute average marginal effects.
Note that computing average marginal effects requires calculating a distinct marginal effect for every single row of your dataset. This can be computationally expensive when your data includes millions of observations.
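To see why the cost grows with the number of rows, here is a rough base R sketch of what an average marginal effect is for a logit model. It uses a plain `glm` on simulated data (not fixest, and not the marginaleffects internals), and the names `dat`, `x1`, and `x2` are hypothetical. For a logit link, the marginal effect of `x1` in row i is the coefficient times p_i(1 - p_i), so one value is computed per row and then averaged.

```r
set.seed(1)
n <- 1000

# Simulated data (hypothetical, for illustration only)
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-0.5 + 0.8 * dat$x1 - 0.4 * dat$x2))

fit <- glm(y ~ x1 + x2, family = binomial(link = "logit"), data = dat)

# One marginal effect per row: dP/dx1 = beta_x1 * p * (1 - p) for a logit link
p <- fitted(fit)
me_x1 <- unname(coef(fit)["x1"]) * p * (1 - p)

# The average marginal effect is the mean over all rows
ame_x1 <- mean(me_x1)

# Cross-check with a forward finite difference on the predicted probabilities
h <- 1e-6
p_shift <- predict(fit, newdata = transform(dat, x1 = x1 + h), type = "response")
ame_x1_numeric <- mean((p_shift - p) / h)

c(analytic = ame_x1, numeric = ame_x1_numeric)
```

With 50 million rows, the per-row vector `me_x1` has 50 million entries for each variable, which is where the time goes.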
Instead, you can compute marginal effects for specific values of the regressors using the newdata argument and the typical function. Please refer to the marginaleffects documentation for details on those.
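The idea can be sketched in base R: instead of one marginal effect per row, evaluate a single marginal effect at one chosen combination of regressor values. This uses a plain `glm` on simulated data, and the names `dat`, `typical_row`, and the chosen values are hypothetical; it only illustrates the concept and is not the package's implementation.

```r
set.seed(1)
n <- 1000

# Simulated data (hypothetical, for illustration only)
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-0.5 + 0.8 * dat$x1 - 0.4 * dat$x2))

fit <- glm(y ~ x1 + x2, family = binomial(link = "logit"), data = dat)

# One "typical" observation instead of all n rows:
# x1 fixed at a chosen value, x2 held at its sample mean
typical_row <- data.frame(x1 = 0.5, x2 = mean(dat$x2))

# A single prediction, then the logit marginal effect at that point
p_typ <- predict(fit, newdata = typical_row, type = "response")
me_x1_typ <- unname(coef(fit)["x1"] * p_typ * (1 - p_typ))
me_x1_typ
```

This requires one prediction rather than n, so the cost no longer depends on the sample size. As far as I know, later marginaleffects releases renamed these functions (`slopes`/`avg_slopes` instead of `marginaleffects` plus `summary`, and `datagrid` instead of `typical`), so check the documentation for the version you have installed.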