On the Relation between Prediction and Imputation Accuracy under Missing Covariates
This work addresses the handling of missing data in regression, a problem relevant to both researchers and practitioners. It is incremental, however: it builds on existing methods without introducing new paradigms.
The study investigates how imputation accuracy affects prediction accuracy in regression with missing covariates when machine learning methods are used for both tasks. It also evaluates imputation performance through statistical inference measures such as the coverage of prediction intervals.
Missing covariates in regression or classification problems can prohibit the direct use of advanced tools for further analysis. Recent research shows a growing trend towards using modern Machine Learning algorithms for imputation, motivated by their favourable prediction accuracy in different learning problems. In this work, we analyze through simulation the interaction between imputation accuracy and prediction accuracy in regression learning problems with missing covariates when Machine Learning-based methods are used for both imputation and prediction. In addition, we explore imputation performance when using statistical inference procedures in prediction settings, such as coverage rates of (valid) prediction intervals. Our analysis is based on empirical datasets from the UCI Machine Learning Repository and an extensive simulation study.
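
To make the three quantities discussed above concrete (imputation accuracy, prediction accuracy, and prediction interval coverage), the following is a minimal sketch of an impute-then-predict experiment. It is not the procedure used in this work. It assumes a single synthetic covariate that is missing completely at random, and it uses mean imputation, a least-squares line, and a split-conformal interval as simple stand-ins for the Machine Learning methods and interval procedures under study. All names (`simulate`, `fit_line`, `evaluate`) and all parameter values are hypothetical.

```python
import math
import random
import statistics


def simulate(n, missing_rate, seed):
    """Draw one covariate x, a response y = 2x + noise, and an MCAR mask on x."""
    rng = random.Random(seed)
    x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [2.0 * x + rng.gauss(0.0, 0.5) for x in x_true]
    x_obs = [None if rng.random() < missing_rate else x for x in x_true]
    return x_true, x_obs, y


def fit_line(x, y):
    """Ordinary least squares with one covariate; returns (slope, intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def evaluate(n=600, missing_rate=0.3, alpha=0.1, seed=0):
    x_true, x_obs, y = simulate(n, missing_rate, seed)
    third = n // 3
    train = range(0, third)
    calib = range(third, 2 * third)
    test = range(2 * third, n)

    # Imputation step (stand-in): mean of the observed training values.
    fill = statistics.fmean([x_obs[i] for i in train if x_obs[i] is not None])
    x_imp = [fill if x is None else x for x in x_obs]

    # Imputation accuracy: RMSE over the entries that were actually missing.
    sq_err = [(x_imp[i] - x_true[i]) ** 2 for i in range(n) if x_obs[i] is None]
    imp_rmse = math.sqrt(statistics.fmean(sq_err)) if sq_err else 0.0

    # Prediction step (stand-in): least-squares line fitted on imputed data.
    slope, intercept = fit_line([x_imp[i] for i in train], [y[i] for i in train])

    def predict(i):
        return intercept + slope * x_imp[i]

    # Split-conformal half-width from absolute residuals on the calibration set.
    scores = sorted(abs(y[i] - predict(i)) for i in calib)
    k = min(len(scores), math.ceil((len(scores) + 1) * (1 - alpha))) - 1
    half_width = scores[k]

    # Prediction accuracy and empirical interval coverage on the test set.
    pred_mse = statistics.fmean([(y[i] - predict(i)) ** 2 for i in test])
    covered = [1.0 if abs(y[i] - predict(i)) <= half_width else 0.0 for i in test]
    return {
        "imputation_rmse": imp_rmse,
        "prediction_mse": pred_mse,
        "coverage": statistics.fmean(covered),
        "interval_half_width": half_width,
    }


if __name__ == "__main__":
    for rate in (0.0, 0.3, 0.5):
        print(rate, evaluate(missing_rate=rate))
```

Running the script for several missingness rates reports imputation error, prediction error, coverage, and interval width together. In this toy setting the interval is calibrated on imputed data, so poorer imputation is expected to show up mainly as higher prediction error and wider intervals rather than as lost coverage. Whether that carries over to Machine Learning-based imputers and real datasets is the empirical question the study itself addresses.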