ML LG · Sep 30, 2022

Leveraging variational autoencoders for multiple data imputation

Breeshey Roskams-Hieter, Jude Wells, Sara Wade

arXiv:2209.15321v16.714 citations · h-index: 10 · Has Code

Originality: Incremental advance

AI Analysis

This paper addresses unreliable imputation in analyses of data with missing values. The advance is incremental because it builds on existing VAE methods.

The paper tackles poor uncertainty calibration in variational autoencoders (VAEs) used for multiple imputation of missing data. It shows that VAEs give poor empirical coverage of missing values, with underestimation and overconfident imputations, and proposes β-VAEs with β chosen by cross-validation to improve calibration and avoid false discoveries in downstream tasks.

Missing data persists as a major barrier to data analysis across numerous applications. Recently, deep generative models have been used for imputation of missing data, motivated by their ability to capture highly non-linear and complex relationships in the data. In this work, we investigate the ability of deep models, namely variational autoencoders (VAEs), to account for uncertainty in missing data through multiple imputation strategies. We find that VAEs provide poor empirical coverage of missing data, with underestimation and overconfident imputations, particularly for more extreme missing data values. To overcome this, we employ $β$-VAEs, which, viewed from a generalized Bayes framework, provide robustness to model misspecification. Assigning a good value of $β$ is critical for uncertainty calibration, and we demonstrate how this can be achieved using cross-validation. In downstream tasks, we show how multiple imputation with $β$-VAEs can avoid false discoveries that arise as artefacts of imputation.
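
The abstract's notion of empirical coverage can be illustrated with a minimal sketch. It assumes the true values of the masked entries are held out for evaluation and that an imputation model has produced several draws per missing entry. The function name `empirical_coverage`, the 95% level, and the synthetic Gaussian draws are all assumptions made for illustration; they are not taken from the paper or its code.

```python
import numpy as np


def empirical_coverage(true_values, imputations, level=0.95):
    """Fraction of held-out true values that fall inside the central
    `level` interval of their imputed draws.

    true_values: array of shape (n_missing,)
    imputations: array of shape (n_imputations, n_missing)
    """
    true_values = np.asarray(true_values, dtype=float)
    imputations = np.asarray(imputations, dtype=float)
    alpha = 1.0 - level
    # Per-entry interval bounds taken across the imputation draws.
    lower = np.quantile(imputations, alpha / 2, axis=0)
    upper = np.quantile(imputations, 1 - alpha / 2, axis=0)
    inside = (true_values >= lower) & (true_values <= upper)
    return float(inside.mean())


# Synthetic stand-in for model output (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n_missing, n_imputations = 2000, 200
truth = rng.normal(0.0, 1.0, size=n_missing)

# Draws whose spread matches the truth versus draws that are too narrow.
calibrated = rng.normal(0.0, 1.0, size=(n_imputations, n_missing))
overconfident = rng.normal(0.0, 0.3, size=(n_imputations, n_missing))

cov_calibrated = empirical_coverage(truth, calibrated)
cov_overconfident = empirical_coverage(truth, overconfident)
print(f"calibrated draws:    {cov_calibrated:.2f}")
print(f"overconfident draws: {cov_overconfident:.2f}")
```

Coverage well below the nominal level, as with the narrow draws here, is the kind of overconfidence the abstract describes. Under the same assumptions, one could compute this quantity on held-out entries for each candidate $β$ and compare it with the nominal level.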
