Merge wide mids with long df, convert back to mids

I am trying to merge a wide-format mids object (as a result of multiple imputation with the mice package) with a long-format dataframe, which contains a time variable. Both dataframes contain the same IDs (id). However, I encounter an issue with rownames when trying to merge.

set.seed(123)
library(tidyverse)
library(mice)

wide <- data.frame(
  id = c(1, 2, 3, 4, 5),
  x = c(1.5, 3, NA, 4.2, 5.8),
  y = c(9.3, NA, 31.7, 41.1, 52.6),
  z = c(101, 198, 305, NA, 499)
)

long <- data.frame(
  id = rep(1:5, each = 2),
  time = rep(1:2, times = 5),
  a = c(10, 15, 20, 25, 50, 30, 35, 40, 30, 45),
  b = c(100, 150, 200, 250, 200, 300, 350, 400, 100, 450)
)

wide_mids <- mice(data = wide, 
                  m = 5, 
                  printFlag = FALSE)
#> Warning: Number of logged events: 2

completed_wide <- mice::complete(data = wide_mids,
                                 action = "long",
                                 include = TRUE)

merged <- merge(completed_wide, long, by = "id")
merged_mids <- as.mids(merged)
#> Warning: non-unique values when setting 'row.names': '1', '2', '3', '4', '5'
#> Error in `.rowNamesDF<-`(x, value = value): duplicate 'row.names' are not allowed

Trying different kinds of merging like full_join or left_join from dplyr still results in the same error message. Any help is appreciated.

Mark On

I believe the issue is with the .id argument of as.mids():

.id
An optional column number or column name in long, indicating the subject identification. If not specified, then the function searches for a variable named ".id". If this variable is found, the values in the column will define the row names in the data element of the resulting mids object.

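So as.mids() is using the .id column for the row names, but after the merge that column has 12 rows of 1s, 12 rows of 2s, etc. Every .id value is repeated once per time point within each imputation, so the row names are no longer unique.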
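
The duplication can be checked without running mice at all. Below is a minimal base R sketch: `completed` and `long_df` are small hypothetical stand-ins (3 subjects, 2 imputations plus the original data) that only mimic the layout of the question's `completed_wide` and `long`, and the row-name assignment imitates what I understand as.mids() to do with .id.

```r
# Hypothetical stand-in for complete(..., action = "long", include = TRUE):
# .imp = 0 is the original data, .imp = 1 and 2 are imputations.
completed <- data.frame(
  .imp = rep(0:2, each = 3),
  .id  = rep(1:3, times = 3),
  id   = rep(1:3, times = 3)
)

# Hypothetical stand-in for the long data: two time points per subject.
long_df <- data.frame(
  id   = rep(1:3, each = 2),
  time = rep(1:2, times = 3)
)

merged <- merge(completed, long_df, by = "id")

# Rows belonging to the original data (.imp == 0)
orig <- merged[merged$.imp == 0, ]

# Each .id now occurs once per time point, so it is no longer unique.
dup_ids <- anyDuplicated(orig$.id) > 0

# Using these values as row names fails, as in the question.
res <- tryCatch(
  {
    rownames(orig) <- orig$.id
    "ok"
  },
  error = function(e) conditionMessage(e)
)
res
```

Here `dup_ids` is TRUE and `res` holds the error message rather than "ok", which matches the error reported in the question.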

One way of getting around this problem is to make a new id column and then use that:

# Reusing L Tyrone's code
long |>
  left_join(completed_wide,
            by = "id",
            relationship = "many-to-many") |>
  mutate(newid = row_number()) |>
  as.mids(.id = "newid")

Output:

Class: mids
Number of multiple imputations:  5 
Imputation methods:
   id  time     a     b   .id     x     y     z 
   ""    ""    ""    ""    "" "pmm" "pmm" "pmm" 
PredictorMatrix:
     id time a b .id x y z
id    0    1 1 1   1 1 1 1
time  1    0 1 1   1 1 1 1
a     1    1 0 1   1 1 1 1
b     1    1 1 0   1 1 1 1
.id   1    1 1 1   0 1 1 1
x     1    1 1 1   1 0 1 1
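
A possible variation, sketched in base R under stated assumptions: instead of numbering all rows of the joined data consecutively, sort by .imp first and number the rows within each imputation, so that every imputation block carries the same ids in the same order. This assumes as.mids() lines rows up across imputations by position, which is worth checking against the mice source. The `merged` frame is a hypothetical stand-in (3 subjects, 2 time points, .imp 0 to 2), not the question's data.

```r
# Hypothetical stand-in for the merged long data.
merged <- merge(
  data.frame(
    .imp = rep(0:2, each = 3),
    .id  = rep(1:3, times = 3),
    id   = rep(1:3, times = 3)
  ),
  data.frame(
    id   = rep(1:3, each = 2),
    time = rep(1:2, times = 3)
  ),
  by = "id"
)

# Sort so every imputation block lists the id/time rows in the same order.
merged <- merged[order(merged$.imp, merged$id, merged$time), ]

# Number the rows within each imputation block.
merged$newid <- ave(seq_len(nrow(merged)), merged$.imp, FUN = seq_along)

# Sanity checks: ids unique within a block, identical across blocks.
blocks <- split(merged$newid, merged$.imp)
unique_within <- !any(vapply(blocks, function(v) anyDuplicated(v) > 0, logical(1)))
same_across <- all(vapply(blocks, identical, logical(1), blocks[[1]]))

# Next step (requires mice, not run here):
# merged_mids <- as.mids(merged, .id = "newid")
```

Note that in the output above the old .id column is kept as an ordinary variable and shows up in the predictor matrix; dropping it before calling as.mids() may be preferable if it should not act as a predictor.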