Mixture-based Multiple Imputation Model for Clinical Data with a Temporal Dimension
This work addresses missing data in clinical time series, offering an incremental improvement for healthcare data mining applications.
The authors tackled missing values in clinical multivariable time series by proposing a multiple imputation model that integrates Gaussian processes with mixture models and individualized mixing weights, achieving more accurate imputation than benchmarks on all tested datasets.
Missing values in multivariable time series are a key challenge in applications such as clinical data mining. Although many imputation methods have proven effective in other settings, few are designed to accommodate clinical multivariable time series. In this work, we propose a multiple imputation model that captures both cross-sectional information and temporal correlations. We integrate Gaussian processes with mixture models and introduce individualized mixing weights to handle the variation in the predictive confidence of Gaussian process models. The proposed model is compared with several state-of-the-art imputation algorithms on both real-world and synthetic datasets. Experiments show that our best model provides more accurate imputation than the benchmarks on all of our datasets.
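As an illustration only, the following Python sketch shows one way a temporal Gaussian process (GP) prediction could be blended with a cross-sectional estimate, using a per-point weight that shrinks as the GP's predictive variance grows. It is not the authors' model: the RBF kernel, the precision-based weight in `blend`, the two-component sampling in `multiple_impute`, and all function names and hyperparameter values are assumptions made for this example.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between two 1-D arrays of time points."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(t_obs, y_obs, t_new, noise_var=0.1, length_scale=1.0):
    """Zero-mean GP regression: posterior mean and variance at t_new."""
    K = rbf_kernel(t_obs, t_obs, length_scale) + noise_var * np.eye(len(t_obs))
    K_s = rbf_kernel(t_obs, t_new, length_scale)
    K_ss = rbf_kernel(t_new, t_new, length_scale)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v ** 2, axis=0)
    return mean, np.maximum(var, 1e-12)


def blend(gp_mean, gp_var, cs_mean, cs_var):
    """Assumed weighting: trust the GP more where its variance is small."""
    w = cs_var / (cs_var + gp_var)  # weight on the GP, strictly in (0, 1)
    return w * gp_mean + (1.0 - w) * cs_mean, w


def multiple_impute(gp_mean, gp_var, cs_mean, cs_var, n_draws=5, seed=0):
    """Draw several imputations from a two-component Gaussian mixture."""
    rng = np.random.default_rng(seed)
    _, w = blend(gp_mean, gp_var, cs_mean, cs_var)
    shape = (n_draws, len(w))
    pick_gp = rng.random(shape) < w  # choose a component per draw and point
    gp_draws = rng.normal(gp_mean, np.sqrt(gp_var), size=shape)
    cs_draws = rng.normal(cs_mean, np.sqrt(cs_var), size=shape)
    return np.where(pick_gp, gp_draws, cs_draws)


# Hypothetical patient: four observed times, two missing times.
t_obs = np.array([0.0, 1.0, 2.0, 3.0])
y_obs = np.sin(t_obs)
t_missing = np.array([1.5, 10.0])  # one near the data, one far from it

gp_mean, gp_var = gp_posterior(t_obs, y_obs, t_missing)
cs_mean = np.array([0.5, 0.5])    # stand-in cross-sectional estimate
cs_var = np.array([0.25, 0.25])   # stand-in cross-sectional variance

point_estimate, weights = blend(gp_mean, gp_var, cs_mean, cs_var)
imputations = multiple_impute(gp_mean, gp_var, cs_mean, cs_var)
```

In this toy setup the missing point at `t = 1.5` sits between observations, so the GP is confident there and receives a larger weight than at `t = 10.0`, where the GP reverts to its prior and the cross-sectional estimate dominates.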