mice imputation fails when using 'where'


I'm using the R package mice for multiple imputation. I only need to impute a subset of my variables, but I'd like the full data set to be used as predictors when imputing them. Hence I'm using the where argument to specify which values (and variables) to impute. However, when I set where, many (sometimes all) of the values flagged TRUE in the where matrix remain NA, whereas if I leave where unset, all missing values are imputed. I don't understand why.

I'm using R 4.2.3 and mice 3.16.0. First, generate random data with missing values:

library(mice)
set.seed(2024)
df <- matrix(data = sample(100,100,replace=TRUE), ncol = 5)
df[df>80] = NA
> df
      [,1] [,2] [,3] [,4] [,5]
 [1,]   66    1   80   43   78
 [2,]   37   75   NA   NA   16
 [3,]   45   35   67   58   36
 [4,]   60   NA   25   49   46
 [5,]   17   28   34    5   NA
 [6,]   32   48   NA   80   72
 [7,]   NA   NA   57    6   73
 [8,]   29   NA   49    3   20
 [9,]   11   NA   54   25   18
[10,]   16   NA   NA   33   34
[11,]   29   43   49   75   19
[12,]   62   58    3   NA   12
[13,]   14   20   25   30   NA
[14,]   34   55   65   NA   66
[15,]   26   NA    6   73   16
[16,]   44   60   20   61   45
[17,]   50    4    9    3   62
[18,]   26    7   79   29   70
[19,]   32   52   77   20   68
[20,]   NA   57   80   61   31

The where matrix specifies that I'm only interested in imputing the second variable:

where.mat = matrix(FALSE,nrow = dim(df)[1],ncol = dim(df)[2])
where.mat[,2] = is.na(df[,2])
> where.mat
       [,1]  [,2]  [,3]  [,4]  [,5]
 [1,] FALSE FALSE FALSE FALSE FALSE
 [2,] FALSE FALSE FALSE FALSE FALSE
 [3,] FALSE FALSE FALSE FALSE FALSE
 [4,] FALSE  TRUE FALSE FALSE FALSE
 [5,] FALSE FALSE FALSE FALSE FALSE
 [6,] FALSE FALSE FALSE FALSE FALSE
 [7,] FALSE  TRUE FALSE FALSE FALSE
 [8,] FALSE  TRUE FALSE FALSE FALSE
 [9,] FALSE  TRUE FALSE FALSE FALSE
[10,] FALSE  TRUE FALSE FALSE FALSE
[11,] FALSE FALSE FALSE FALSE FALSE
[12,] FALSE FALSE FALSE FALSE FALSE
[13,] FALSE FALSE FALSE FALSE FALSE
[14,] FALSE FALSE FALSE FALSE FALSE
[15,] FALSE  TRUE FALSE FALSE FALSE
[16,] FALSE FALSE FALSE FALSE FALSE
[17,] FALSE FALSE FALSE FALSE FALSE
[18,] FALSE FALSE FALSE FALSE FALSE
[19,] FALSE FALSE FALSE FALSE FALSE
[20,] FALSE FALSE FALSE FALSE FALSE
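For reference, the same matrix can probably be built with mice's own helper. This is a minimal sketch, assuming `make.where()` with `keyword = "none"` returns an all-FALSE matrix shaped like the data; the name `where.alt` is made up here for illustration.

```r
library(mice)

# Recreate the example data from above
set.seed(2024)
df <- matrix(data = sample(100, 100, replace = TRUE), ncol = 5)
df[df > 80] <- NA

# Start from an all-FALSE where matrix, then flag only the
# missing cells of the second variable for imputation
where.alt <- make.where(df, keyword = "none")
where.alt[, 2] <- is.na(df[, 2])

# Number of cells flagged for imputation
sum(where.alt)
```

This should flag the same cells as where.mat above, so it is only a tidier way to construct the input, not a fix for the problem.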

Imputation using the where matrix:

test1 = mice(data = df, 
             m = 1, 
             ignore = NULL,
             maxit = 1, 
             where = where.mat,
             method = "pmm")
imp_dat <- complete(test1, action = 1, include = FALSE)

Not all values are imputed:

> sum(is.na(imp_dat[,2]))
[1] 2
> imp_dat[,2]
 [1]  1 75 35 43 28 48 NA 43 60 NA 43 58 20 55 43 60  4  7 52 57
> 

Run again, now without the where matrix:

test2 = mice(data = df, 
             m = 1, 
             ignore = NULL,
             maxit = 1, 
             #where = where.mat,
             method = "pmm")
imp_dat2 <- complete(test2, action = 1, include = FALSE)

Now everything is imputed:

> sum(is.na(imp_dat2[,2]))
[1] 0
> imp_dat2[,2]
 [1]  1 75 35  7 28 48  7 43 43 43 43 58 20 55 20 60  4  7 52 57
> 

Why doesn't it work with the where matrix? I've confirmed that the predictorMatrix is correct, and it isn't a collinearity issue, since it works in the second example.

> test1$predictorMatrix
   V1 V2 V3 V4 V5
V1  0  1  1  1  1
V2  1  0  1  1  1
V3  1  1  0  1  1
V4  1  1  1  0  1
V5  1  1  1  1  0
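One pattern in the output above may be a lead, though it is an observation rather than a confirmed diagnosis: the two values left as NA are in rows 7 and 10, and those are the only rows flagged in where.mat that also have a missing value in one of the predictor columns. The check below is a base-R sketch that retypes the printed data by hand; `target`, `rows_to_impute` and `pred_missing` are names made up for illustration.

```r
# Data retyped from the printed matrix above, one vector per column
df <- cbind(
  c(66, 37, 45, 60, 17, 32, NA, 29, 11, 16, 29, 62, 14, 34, 26, 44, 50, 26, 32, NA),
  c( 1, 75, 35, NA, 28, 48, NA, NA, NA, NA, 43, 58, 20, 55, NA, 60,  4,  7, 52, 57),
  c(80, NA, 67, 25, 34, NA, 57, 49, 54, NA, 49,  3, 25, 65,  6, 20,  9, 79, 77, 80),
  c(43, NA, 58, 49,  5, 80,  6,  3, 25, 33, 75, NA, 30, NA, 73, 61,  3, 29, 20, 61),
  c(78, 16, 36, 46, NA, 72, 73, 20, 18, 34, 19, 12, NA, 66, 16, 45, 62, 70, 68, 31)
)

target <- 2  # the column flagged in where.mat

# Rows where the target variable is missing (the TRUE cells of where.mat)
rows_to_impute <- which(is.na(df[, target]))

# Of those rows, which also have at least one missing predictor?
pred_missing <- rowSums(is.na(df[rows_to_impute, -target, drop = FALSE])) > 0
rows_to_impute[pred_missing]
# [1]  7 10
```

If this is the cause, it would suggest that restricting where to column 2 leaves the other columns unimputed, so pmm has incomplete predictors for those rows. That is an assumption about mice's behaviour, not something verified here; inspecting `test1$loggedEvents` might help confirm or rule it out.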