This question has been asked before here: How to keep a variable in fit$model for lm() in R that I'm *not* using within the lm call itself?
But I'm looking for a more general answer: in my case the input data frame has many variables that I want to retain without using them as predictors in the model, and their values are not unique (so they cannot be used as row names).
So my example data might look like this:
df <- data.frame(a = c(1, 2, 3, 4, 5, 6, 1, 2, 3),
                 b = c("A", "B", "C", "A", "B", "C", "A", "B", "C"),
                 country = rep(c("Malawi", "Malawi", "UK"), 3),
                 Solvent_ref = rep(c("DMSO", "DMSO", "H2O"), 3)
)
(Note that all rows with a given value of b have the same values in the variables I want to "carry over" but not use in the model; in the example above, these are country and Solvent_ref.)
If I then run
library(glmmTMB)
library(emmeans)
model = glmmTMB(a~b, data = df)
emmean_df = as.data.frame(emmeans(model,
type = "response",
specs = ~ b))
the resulting emmean_df has lost the variables country and Solvent_ref:
> emmean_df
b emmean SE df lower.CL upper.CL
1 A 1.999998 0.8164967 5 -0.09887403 4.098869
2 B 2.999995 0.8164967 5 0.90112308 5.098866
3 C 4.000004 0.8164967 5 1.90113222 6.098876
The output I'd like to see would be:
b emmean SE df lower.CL upper.CL country Solvent_ref
1 A 1.999998 0.8164967 5 -0.09887403 4.098869 Malawi DMSO
2 B 2.999995 0.8164967 5 0.90112308 5.098866 Malawi DMSO
3 C 4.000004 0.8164967 5 1.90113222 6.098876 UK H2O
One solution I can see would be to use a left_join to re-annotate the emmean data that comes out of the model with the 'lost variables', but is there a way to "carry them over" from the original data frame instead?
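library(dplyr)
# one row per level of b, holding the variables to carry over
df_summary = df %>%
  group_by(b) %>%
  summarise(
    country = unique(country),
    Solvent_ref = unique(Solvent_ref)
  )
# re-attach the carried-over variables to the emmeans output
emmean_df = emmean_df %>%
  left_join(df_summary, by = "b")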
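Since the real data has many such variables, the join workaround above could perhaps be generalised so that the columns do not have to be listed one by one. The following is only a rough sketch under a few assumptions: every column other than the response a is constant within each level of b, and the small emmean_df here is a hand-made stand-in for the emmeans output (so the example runs without fitting the model). The names lookup and annotated are just illustrative.

```r
library(dplyr)

# Example data (carry-over columns are constant within each level of b)
df <- data.frame(a = c(1, 2, 3, 4, 5, 6, 1, 2, 3),
                 b = rep(c("A", "B", "C"), 3),
                 country = rep(c("Malawi", "Malawi", "UK"), 3),
                 Solvent_ref = rep(c("DMSO", "DMSO", "H2O"), 3))

# Stand-in for the emmeans output: one row per level of b (assumption)
emmean_df <- data.frame(b = c("A", "B", "C"), emmean = c(2, 3, 4))

# Drop the response and keep one row per distinct combination
lookup <- df %>%
  select(-a) %>%
  distinct()

# Guard: stops if any carried-over column varies within a level of b
stopifnot(anyDuplicated(lookup$b) == 0)

# Re-attach all carried-over columns in one step
annotated <- left_join(emmean_df, lookup, by = "b")
```

The stopifnot() check is there because a column that is not constant within b would silently duplicate rows in the join.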