I am trying to impute missing entries in a three-level dataset using the mi package.
I am facing two issues with imputing the multilevel data correctly:
- Firstly, the variables at level two and three are imputed with different values within the same cluster/ID.
- Secondly, I am also unable to set the boundaries for a continuous variable that cannot have values outside a specific interval.
Problem 1. How to create a multilevel_missing_data.frame
The documentation states that:
Objects from the Class
Objects can be created by calls of the form new("multilevel_missing_data.frame", ...). However, its users almost always will pass a data.frame to the missing_data.frame function and specify the subclass and groups arguments.
Slots
The multilevel_missing_data.frame class inherits from the missing_data.frame-class and has two additional slots
- groups: Object of class character indicating which variables define the multilevel structure
- mdf_list: Object of class mdf_list whose elements contain a missing_data.frame for each group. This slot is filled automatically by the initialize method.
If I understand the documentation correctly, it should be possible to create a multilevel_missing_data.frame by specifying subclass and groups:
library(dplyr)
library(magrittr)
library(mi)
# loading and preparing data
url <- "https://simongrund1.github.io/posts/multiple-imputation-for-three-level-and-cross-classified-data_files/example_3l.Rdata"
download.file(url, basename(url))
load("example_3l.Rdata")
dat %<>%
  mutate(
    z = as.ordered(round(z) + 2),
    x = abs(x)
  )
# Create multilevel_missing_data.frame
mdf <- missing_data.frame(
  dat,
  subclass = "multilevel",
  groups = c("class", "school")
)
# Output:
> Error in getClass(Class, where = topenv(parent.frame())): “NA” is not a defined class
Problem 2. Defining the boundaries in the bounded-continuous class
According to the documentation:
Objects can be created that are of bounded-continuous class via the missing_variable generic function by specifying type = "bounded-continuous" as well as lower and / or upper
This means that I should be able to define a variable as bounded-continuous the following way:
missing_variable(
dat$x,
type = "bounded-continuous",
lower = 0, upper = 5
)
However, I cannot figure out how to add the variable defined this way to the missing_data.frame object together with all the other variables.
"working" example
(i.e. the code runs, but it doesn't impute the values correctly as the grouping variables are not defined, and boundaries for x are not set.):
library(dplyr)
library(magrittr)
library(mi)
# loading and preparing data
url <- "https://simongrund1.github.io/posts/multiple-imputation-for-three-level-and-cross-classified-data_files/example_3l.Rdata"
download.file(url, basename(url))
load("example_3l.Rdata")
dat %<>%
  mutate(
    z = as.ordered(round(z) + 2),
    x = abs(x)
  )
mdf <- missing_data.frame(dat) # Here I should define grouping variables somehow
mdf <- change(
  mdf,
  y = c("class", "school", "x"),
  what = "type",
  to = c("group", "group", "bounded-continuous")
) # Boundaries for "x" should be added here
mi_mdf <- mi(mdf, n.iter = 30, n.chains = 4, max.minutes = 20)
mi_mdf <- complete(mi_mdf, m = 1)
mi_mdf %>% select(class:w)
Output (incorrect):
- Variable "z" has varying values imputed for the same level 2 ("class") ID
- Variable "w" has varying values imputed for the same level 3 ("school") ID
class  school  x           y             z      w
<dbl>  <dbl>   <dbl>       <dbl>         <ord>  <dbl>
2      1       0.32023493  -1.024494036  3      1.329629
2      1       0.06949615  0.547773458   3      1.329629
2      1       1.98737694  0.287954055   1      1.329629
⋮      ⋮       ⋮           ⋮             ⋮      ⋮
250    50      2.54218522  1.63412995    1      -1.52528136
250    50      2.11927441  0.60549683    1      -1.17877900
250    50      2.01830248  1.27016541    1      -1.74640219
How can I define class as level 2 and school as level 3 variables, and how can I set the boundaries of x to 0 (lower) and 10 (upper)?
Desired output:
- Variable "z" has the same imputed value for each level 2 ("class") ID
- Variable "w" has the same imputed value for each level 3 ("school") ID
- "x" seemingly does not have values outside the desired range. (My assumption is that when a variable is defined as bounded-continuous, the boundaries are set automatically from the range of the observed values.) Nonetheless, the boundaries should be defined explicitly as the interval within which the values can exist.
class  school  x           y             z      w
<dbl>  <dbl>   <dbl>       <dbl>         <ord>  <dbl>
2      1       0.32023493  -1.024494036  3      1.329629
2      1       0.06949615  0.547773458   3      1.329629
2      1       1.98737694  0.287954055   3      1.329629
⋮      ⋮       ⋮           ⋮             ⋮      ⋮
250    50      2.54218522  1.63412995    1      -1.52528136
250    50      2.11927441  0.60549683    1      -1.52528136
250    50      2.01830248  1.27016541    1      -1.52528136
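To make the three criteria in the desired output testable, here is a minimal sketch in base R that checks a completed data set. It does not use or fix anything in mi. The helper names constant_within and within_bounds are hypothetical, the toy data frame is made up for illustration, and the bounds 0 and 10 are the ones asked for in the question.

```r
# Toy completed data set (hypothetical values, NOT real imputation output)
completed <- data.frame(
  class  = c(1, 1, 2, 2, 3, 3),
  school = c(1, 1, 1, 1, 2, 2),
  x      = c(0.3, 1.9, 2.5, 0.1, 4.2, 3.3),
  z      = c(3, 3, 1, 2, 1, 1),            # class 2 holds two different values
  w      = c(1.3, 1.3, 1.3, 1.3, -1.5, -1.5)
)

# Hypothetical helper: TRUE if `value` takes exactly one distinct value
# inside every cluster defined by `id`
constant_within <- function(value, id) {
  all(tapply(value, id, function(v) length(unique(v)) == 1))
}

# Hypothetical helper: TRUE if every value lies inside [lower, upper]
within_bounds <- function(value, lower, upper) {
  all(value >= lower & value <= upper)
}

constant_within(completed$z, completed$class)      # level-2 variable vs. class ID
constant_within(completed$w, completed$school)     # level-3 variable vs. school ID
within_bounds(completed$x, lower = 0, upper = 10)  # bounds assumed from the question
```

The data frame returned by complete() in the example above could be passed in place of the toy completed object; a FALSE from constant_within would flag exactly the problem shown in the incorrect output.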