Bayesian generative models can flag performance loss, bias, and out-of-distribution image content
This work addresses the risk of bias and unreliability in medical imaging AI, particularly for dermatological images, by providing a scalable method to detect such failures. The contribution is incremental, as it builds on existing uncertainty quantification (UQ) and Laplace approximation techniques.
To address the unreliability of generative models under distribution shift, the authors propose SLUG, a new uncertainty quantification method for variational autoencoders (VAEs) that flags performance loss, bias, and out-of-distribution content in medical images. Its uncertainty score correlates strongly with reconstruction error and with racial underrepresentation bias.
Generative models are popular for medical imaging tasks such as anomaly detection, feature extraction, data visualization, or image generation. Because they are parameterized by deep learning models, they are often sensitive to distribution shifts and unreliable when applied to out-of-distribution data, creating a risk of, e.g., underrepresentation bias. This behavior can be flagged using uncertainty quantification (UQ) methods for generative models, but their availability remains limited. We propose SLUG, a new UQ method for variational autoencoders (VAEs) that combines recent advances in Laplace approximations with stochastic trace estimators to scale gracefully with image dimensionality. We show that our UQ score, unlike the VAE's encoder variances, correlates strongly with reconstruction error and racial underrepresentation bias for dermatological images. We also show how pixel-wise uncertainty can detect out-of-distribution image content such as ink, rulers, and patches, which are known to induce learning shortcuts in predictive models.
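The abstract attributes SLUG's scalability to stochastic trace estimators. As a generic illustration, and not the authors' code, the sketch below shows Hutchinson's estimator, a standard stochastic trace estimator. For a random probe vector z with independent ±1 entries, E[zᵀAz] = tr(A), so tr(A) can be estimated from matrix-vector products alone, without materializing a D×D matrix whose size grows with the number of pixels. Whether SLUG uses exactly this estimator, and to which matrix it applies it, is an assumption here; the function name `hutchinson_trace` and the low-rank toy matrix A = JJᵀ (a stand-in for a Jacobian-based covariance) are hypothetical.

```python
import numpy as np


def hutchinson_trace(matvec, dim, num_samples=1000, rng=None):
    """Estimate tr(A) using only matrix-vector products v -> A @ v.

    Hypothetical helper, not SLUG's implementation. Uses Rademacher
    probe vectors z (entries +1 or -1), for which E[z^T A z] = tr(A).
    """
    rng = np.random.default_rng(rng)
    total = 0.0
    for _ in range(num_samples):
        # Draw one Rademacher probe vector of length dim.
        z = rng.choice([-1.0, 1.0], size=dim)
        # One matrix-vector product per probe; A itself is never formed.
        total += z @ matvec(z)
    return total / num_samples


if __name__ == "__main__":
    # Toy stand-in (assumption): A = J J^T with J of shape (D, k), accessed
    # only through products, so the D x D matrix is never built.
    rng = np.random.default_rng(0)
    D, k = 4096, 32
    J = rng.normal(size=(D, k)) / np.sqrt(k)

    def matvec(v):
        return J @ (J.T @ v)

    exact_trace = np.sum(J**2)  # tr(J J^T) equals the squared Frobenius norm of J
    est_trace = hutchinson_trace(matvec, D, num_samples=200, rng=1)
    print(f"exact trace {exact_trace:.1f}, estimate {est_trace:.1f}")
```

Each probe costs one matrix-vector product, so memory stays linear in D; the price is Monte Carlo error, whose standard deviation shrinks as 1/sqrt(num_samples).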