A utility to delete imputed outcome values after multiple imputation.
Published: November 30, 2024
1. Introduction
Deletion after IMPutation (DIMP) is an R package that serves as a wrapper around the mice package. It performs multiple imputation on a dataset, then removes the imputed values for a specified outcome variable. This approach allows imputed covariates to be used in downstream analyses while avoiding analysis on imputed outcomes, which can introduce bias when the outcome is not missing at random.
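The impute-then-delete idea described above can be sketched directly with mice. This is a minimal illustration, not the package's own implementation: the helper name `impute_then_delete`, its arguments, and the simulated data frame `toy` are all hypothetical and chosen only for this example.

```r
library(mice)

# Hypothetical helper (an assumption, not DIMP's documented API):
# run mice, then restore NA in the outcome wherever it was originally missing.
impute_then_delete <- function(data, outcome, m = 5, seed = 1) {
  stopifnot(outcome %in% names(data))
  was_missing <- is.na(data[[outcome]])            # remember original outcome gaps
  imp <- mice(data, m = m, seed = seed, printFlag = FALSE)
  completed <- complete(imp, action = "all")       # list of m completed data frames
  lapply(completed, function(d) {
    d[[outcome]][was_missing] <- NA                # delete the imputed outcome values
    d
  })
}

# Simulated data (illustrative only): outcome Y, covariates X1 and X2
set.seed(42)
n <- 50
toy <- data.frame(X1 = rnorm(n), X2 = rnorm(n))
toy$Y <- 1 + 2 * toy$X1 - toy$X2 + rnorm(n)
toy$Y[sample(n, 10)] <- NA                         # missing outcomes
toy$X1[sample(n, 8)] <- NA                         # missing covariate values

result <- impute_then_delete(toy, outcome = "Y", m = 3)
```

Each element of `result` keeps the imputed values for X1 while Y is NA in exactly the rows where it was missing in `toy`. Imputing the outcome first still lets it inform the covariate imputations before its imputed values are discarded.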
2. Demonstration
Suppose we have the following original dataset, which has missing values in the outcome Y and in the covariates X1 and X3.
| Y  | X1 | X2 | X3 |
|----|----|----|----|
| 1  | 2  | 5  | NA |
| 0  | NA | 4  | 2  |
| NA | 4  | 2  | 4  |
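After calling ‘dimp()’, we obtain the imputed datasets below (three in this illustration). The missing values in X1 and X3 are filled in, while the missing value in Y remains NA.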
Y
X1
X2
X3
1
2
5
2
0
3
4
2
NA
4
2
4
Y
X1
X2
X3
1
2
5
3
0
2
4
2
NA
4
2
4
Y
X1
X2
X3
1
2
5
2
0
2
4
2
NA
4
2
4
3. Installation and Example
To demonstrate usage, we use the airquality dataset, treating Ozone as the response variable.
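As a rough illustration of this workflow on airquality, the sketch below calls mice directly rather than `dimp()`, since the exact signature of `dimp()` is not shown here. The regression formula, the number of imputations, and the seed are assumptions made for the example only.

```r
library(mice)

data(airquality)
ozone_missing <- is.na(airquality$Ozone)   # rows where the response is unobserved

# Impute all incomplete columns (Ozone and Solar.R) with mice defaults
imp <- mice(airquality, m = 5, seed = 2024, printFlag = FALSE)
completed <- complete(imp, action = "all")

# Delete the imputed Ozone values, keeping the imputed Solar.R values
deleted <- lapply(completed, function(d) {
  d$Ozone[ozone_missing] <- NA
  d
})

# Fit the same model on each dataset; lm() drops rows with missing Ozone
fits <- lapply(deleted, function(d) {
  lm(Ozone ~ Solar.R + Wind + Temp, data = d)
})

# Combine the estimates across imputations with Rubin's rules
pooled <- pool(as.mira(fits))
summary(pooled)
```

Every fit uses only the rows with an observed Ozone value, while the covariate Solar.R contributes its imputed values to those rows.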
This package was created by Mark Asuncion, Ting Lin, John Fei, Luke Bai and Jason Dang as part of the group project for CHL 8010 F2: Statistical Programming and Computation for Health Data (Instructor: Dr. Aya Mitani) offered by the Dalla Lana School of Public Health at the University of Toronto (Fall 2024).