DIMP Package

R
Multiple Imputation
A utility to delete imputed outcome values after multiple imputation.
Published

November 30, 2024

1. Introduction

Deletion after IMPutation (DIMP) is an R package that serves as a wrapper around the mice package. It performs multiple imputation on a dataset, then removes the imputed values for a specified outcome variable. This approach allows imputed covariates to be used in downstream analyses while avoiding analysis on imputed outcomes, which can introduce bias when the outcome is not missing at random.

2. Demonstration

Suppose we have the following original dataset which has missing values in columns X1 and X3.

Y X1 X2 X3
1 2 5 NA
0 NA 4 2
NA 4 2 4

After calling ‘dimp()’,

Y X1 X2 X3
1 2 5 2
0 3 4 2
NA 4 2 4
Y X1 X2 X3
1 2 5 3
0 2 4 2
NA 4 2 4
Y X1 X2 X3
1 2 5 2
0 2 4 2
NA 4 2 4

3. Installation and Example

To desmonstrate usage, we use the airquality dataset. We treat Ozone as the response variable.

library(mice)
library(dimp)

exampledata <- airquality

original_missing <- colSums(is.na(exampledata))

mice_obj <- mice(exampledata, m = 5, maxit = 5, seed = 123)

 iter imp variable
  1   1  Ozone  Solar.R
  1   2  Ozone  Solar.R
  1   3  Ozone  Solar.R
  1   4  Ozone  Solar.R
  1   5  Ozone  Solar.R
  2   1  Ozone  Solar.R
  2   2  Ozone  Solar.R
  2   3  Ozone  Solar.R
  2   4  Ozone  Solar.R
  2   5  Ozone  Solar.R
  3   1  Ozone  Solar.R
  3   2  Ozone  Solar.R
  3   3  Ozone  Solar.R
  3   4  Ozone  Solar.R
  3   5  Ozone  Solar.R
  4   1  Ozone  Solar.R
  4   2  Ozone  Solar.R
  4   3  Ozone  Solar.R
  4   4  Ozone  Solar.R
  4   5  Ozone  Solar.R
  5   1  Ozone  Solar.R
  5   2  Ozone  Solar.R
  5   3  Ozone  Solar.R
  5   4  Ozone  Solar.R
  5   5  Ozone  Solar.R
# Check missing values after imputation (using completed dataset #1)
after_imputation <- colSums(is.na(complete(mice_obj, 1)))

mice_obj_deleted <- dimp(mice_obj, "Ozone")

# Check missing values after deletion of outcome
after_deletion <- colSums(is.na(complete(mice_obj_deleted, 1)))

missing_summary <- data.frame(
  Variable = names(original_missing),
  Original_NAs = original_missing,
  After_Imputation = after_imputation,
  After_dimp_Deletion = after_deletion
)
print(missing_summary)
         Variable Original_NAs After_Imputation After_dimp_Deletion
rownames rownames            0                0                   0
Ozone       Ozone           37                0                  37
Solar.R   Solar.R            7                0                   0
Wind         Wind            0                0                   0
Temp         Temp            0                0                   0
Month       Month            0                0                   0
Day           Day            0                0                   0
Note

This package was created by Mark Asuncion, Ting Lin, John Fei, Luke Bai and Jason Dang as part of the group project for CHL 8010 F2: Statistical Programming and Computation for Health Data (Instructor: Dr. Aya Mitani) offered by the Dalla Lana School of Public Health at the University of Toronto (Fall 2024).