Gradient Importance Learning for Incomplete Observations
This work addresses the propagation of imputation error into prediction models, a problem that is most pronounced in real-world applications with high missingness rates or small sample sizes, and offers a novel approach to handling incomplete data.
The paper tackles poor downstream task performance caused by imputation errors in datasets with missing values. It introduces gradient importance learning (GIL), which trains models such as MLPs and LSTMs to perform inference directly from incomplete inputs, without imputation. GIL outperforms traditional imputation-based methods on real-world datasets such as MIMIC-III and MNIST.
Though recent works have developed methods that can generate estimates (or imputations) of the missing entries in a dataset to facilitate downstream analysis, most depend on assumptions that may not align with real-world applications and could suffer from poor performance in subsequent tasks such as classification. This is particularly true if the data have large missingness rates or a small sample size. More importantly, the imputation error could be propagated into the prediction step that follows, which may constrain the capabilities of the prediction model. In this work, we introduce the gradient importance learning (GIL) method to train multilayer perceptrons (MLPs) and long short-term memories (LSTMs) to directly perform inference from inputs containing missing values without imputation. Specifically, we employ reinforcement learning (RL) to adjust the gradients used to train these models via back-propagation. This allows the model to exploit the underlying information behind missingness patterns. We test the approach on real-world time-series (i.e., MIMIC-III), tabular data obtained from an eye clinic, and a standard dataset (i.e., MNIST), where our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
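To make the idea of adjusting gradients on incomplete inputs concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single logistic unit in place of an MLP or LSTM, missing entries encoded as None and zero-filled only so the forward pass can run, and a hand-set per-feature `importance` vector standing in for the weights that GIL would learn with RL. The names `masked_input`, `weighted_gradient_step`, and `importance` are hypothetical.

```python
import math


def masked_input(x):
    """Zero-fill missing entries (None) and return (values, mask)."""
    mask = [0.0 if v is None else 1.0 for v in x]
    values = [0.0 if v is None else float(v) for v in x]
    return values, mask


def weighted_gradient_step(w, b, x, y, importance, lr=0.1):
    """One SGD step for a logistic unit with importance-scaled gradients.

    `importance` is a placeholder for learned gradient weights; here it is
    supplied by hand (assumption), not produced by an RL policy.
    """
    values, mask = masked_input(x)
    z = sum(wi * vi for wi, vi in zip(w, values)) + b
    p = 1.0 / (1.0 + math.exp(-z))      # predicted probability
    err = p - y                         # dLoss/dz for binary cross-entropy
    grad_w = [err * vi for vi in values]
    # Rescale each feature's gradient elementwise before the update.
    new_w = [wi - lr * ai * gi for wi, ai, gi in zip(w, importance, grad_w)]
    new_b = b - lr * err
    return new_w, new_b, mask


# Usage: the second feature is missing, so only the first weight moves.
w, b, mask = weighted_gradient_step(
    w=[0.0, 0.0], b=0.0, x=[1.0, None], y=1, importance=[0.5, 1.0]
)
```

In this sketch the importance vector only damps or amplifies updates for observed features. How those weights are chosen from missingness patterns is the part the paper delegates to RL, and it is not reproduced here.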