- If data are MCAR (missing completely at random), imputation may not be needed.
- If missingness depends on unmeasured variables related to the dependent variable, the data are MNAR (missing not at random) and should not be imputed.
- Imputation assumes data are MAR (missing at random) and should not be used with sparse data. Sparse data occur when missingness is non-random. An example is a shopping cart survey of items purchased (coded 1) or not purchased (coded 0): the null response (0) is non-random because it stems from unmeasured factors that may not even be known to the shopper.
- Imputation should not be used to impute all the data for a subject.
- Imputation should not be used for a missing value if the same observation is also missing values on predictively critical variables in the imputation model. This is difficult to check for each value to be imputed, but a table of missing value patterns will show how many cases missing on a given variable are also missing on other variables. In some cases this may lead a researcher to reject imputation.
- Imputation should not be used if more than 50% of the data are missing (some authors use lower cutoffs, such as 20%).
- Imputation is used with cross-sectional or historical data and is not appropriate for imputing future data in a time series.
- Use of imputation is suspect if it generates values outside valid ranges.
- Imputation based on a single pass is not acceptable because imputation is probabilistic. While as few as 3–5 imputations may suffice for reliability, 20–100 or more imputations are usual today.
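
Several of the checks above can be run before (and after) imputing. The following is a minimal sketch in plain Python, not a definitive implementation: the toy data, the column names, and the helper functions `missing_fraction`, `pattern_table`, `fully_missing` and `out_of_range` are all hypothetical and introduced only for illustration. It computes the share of missing values per column, a table of missing value patterns, the subjects with no observed data at all, and any imputed values that fall outside a valid range.

```python
from collections import Counter

# Hypothetical toy data: each row is one subject, None marks a missing value.
COLUMNS = ["age", "income", "score"]
ROWS = [
    [34, 52000, 0.7],
    [None, 48000, 0.4],
    [51, None, None],
    [None, None, None],  # nothing observed for this subject
    [29, 61000, None],
]


def missing_fraction(rows, columns):
    """Share of missing values per column."""
    n = len(rows)
    return {c: sum(r[i] is None for r in rows) / n for i, c in enumerate(columns)}


def pattern_table(rows, columns):
    """Count how many rows share each combination of missing columns."""
    return Counter(
        tuple(c for i, c in enumerate(columns) if r[i] is None) for r in rows
    )


def fully_missing(rows):
    """Indices of rows with no observed value at all."""
    return [k for k, r in enumerate(rows) if all(v is None for v in r)]


def out_of_range(values, low, high):
    """Values (e.g. imputed ones) that fall outside the valid range [low, high]."""
    return [v for v in values if not (low <= v <= high)]


fractions = missing_fraction(ROWS, COLUMNS)
patterns = pattern_table(ROWS, COLUMNS)
empty_rows = fully_missing(ROWS)

# Columns above the 50% cutoff from the list; the cutoff is itself a rule of thumb.
too_sparse = [c for c, f in fractions.items() if f > 0.5]

print("missing fraction per column:", fractions)
print("missing value patterns:", dict(patterns))
print("subjects with no observed data:", empty_rows)
print("columns over the 50% cutoff:", too_sparse)
```

In the pattern table, each key is the set of columns missing together, so a large count on a key that includes the key predictors of the imputation model is a warning sign. The range check would be applied to the imputed values themselves, for example `out_of_range(imputed_scores, 0.0, 1.0)` for a score bounded between 0 and 1.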
Last active
July 23, 2024 16:17
Rules of thumb for when imputation of missing values should not be used.