Trust Issues: Uncertainty Estimation Does Not Enable Reliable OOD Detection On Medical Tabular Data
This work addresses reliable out-of-distribution (OOD) detection, a prerequisite for deploying machine learning models in high-stakes healthcare settings, and points to the scarcity of literature on this problem for medical tabular data.
The study evaluated whether contemporary uncertainty estimation techniques can reliably detect out-of-distribution (OOD) patients on real-world medical tabular data, finding that almost all techniques failed to achieve convincing results.
When deploying machine learning models in high-stakes real-world environments such as healthcare, it is crucial to accurately assess the uncertainty of a model's predictions on abnormal inputs. However, there is a scarcity of literature analyzing this problem on medical data, especially on mixed-type tabular data such as Electronic Health Records. We close this gap by presenting a series of tests covering a large variety of contemporary uncertainty estimation techniques, in order to determine whether they are able to identify out-of-distribution (OOD) patients. In contrast to previous work, we design tests on realistic and clinically relevant OOD groups, and run experiments on real-world medical data. We find that almost all techniques fail to achieve convincing results, which partly contradicts earlier findings.
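To make the kind of test described above concrete, the following is a minimal sketch of one common way to score OOD detection with an uncertainty estimate. It is not the authors' protocol. It assumes that predictive entropy serves as the uncertainty score and that detection quality is summarised by the AUROC between in-distribution and OOD patients. The function names and the toy probabilities are hypothetical and chosen for illustration only.

```python
import math


def predictive_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def auroc(id_scores, ood_scores):
    """Probability that a random OOD sample receives a higher score than a
    random in-distribution sample; ties count as one half."""
    wins = 0.0
    for ood in ood_scores:
        for ind in id_scores:
            if ood > ind:
                wins += 1.0
            elif ood == ind:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))


def ood_detection_auroc(id_probs, ood_probs):
    """Score every sample by predictive entropy (assumed uncertainty
    measure), then measure how well that score separates the two groups."""
    id_scores = [predictive_entropy(p) for p in id_probs]
    ood_scores = [predictive_entropy(p) for p in ood_probs]
    return auroc(id_scores, ood_scores)


# Hypothetical model outputs (binary class probabilities), not real data.
toy_id_probs = [[0.90, 0.10], [0.80, 0.20], [0.95, 0.05]]
toy_ood_probs = [[0.60, 0.40], [0.50, 0.50], [0.99, 0.01]]

toy_auroc = ood_detection_auroc(toy_id_probs, toy_ood_probs)
print(f"OOD detection AUROC: {toy_auroc:.3f}")
```

An AUROC near 0.5 means the uncertainty score does not separate OOD patients from in-distribution ones better than chance. The third toy OOD sample illustrates one possible failure mode: a model can be highly confident on an input that lies outside its training distribution.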