ML · AI · LG · Sep 20, 2018

Understanding Behavior of Clinical Models under Domain Shifts

arXiv:1809.07806v2 · 10 citations
Originality: Incremental advance
AI Analysis

This paper addresses the reliability of healthcare AI models deployed in new environments. The contribution is rated incremental because it builds on existing validation mechanisms.

The paper tackles the failure of deep learning models to generalize in clinical settings under domain shifts. It proposes an approach to emulate and evaluate such shifts using the MIMIC-III dataset, revealing data regimes where models can fail.

The hypothesis that computational models can be reliable enough to be adopted in prognosis and patient care is revolutionizing healthcare. Deep learning, in particular, has been a game changer in building predictive models, leading to community-wide data curation efforts. However, due to inherent variabilities in population characteristics and biological systems, these models are often biased toward their training datasets. This can be limiting when models are deployed in new environments where there are systematic domain shifts not known a priori. In this paper, we propose to emulate, with a given dataset, a large class of domain shifts that can occur in clinical settings, and argue that evaluating the behavior of predictive models in light of those shifts is an effective way to quantify their reliability. More specifically, we develop an approach for building realistic scenarios based on an analysis of disease landscapes in multi-label classification. Using the openly available MIMIC-III EHR dataset for phenotyping, our work sheds light, for the first time, on data regimes where deep clinical models can fail to generalize. This work emphasizes the need for novel validation mechanisms driven by real-world domain shifts in AI for healthcare.
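The abstract does not spell out how a shift scenario is built from the disease landscape. As a rough illustration only, the sketch below assumes that a landscape can be summarized by pairwise label co-occurrence counts, and it emulates one shift by partitioning patients on a chosen "pivot" phenotype: patients without the pivot form the source (training) domain, and patients with it form the target (deployment) domain. The function names, the toy phenotype labels, and the partition rule are all hypothetical and are not the paper's method. The per-label prevalence difference is one simple way to see how strongly such a split shifts the label distribution.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical representation: each patient is the set of phenotype labels they carry.
Record = Set[str]


def cooccurrence(records: List[Record]) -> Counter:
    """Count how often each unordered pair of labels appears together in a patient.

    This is an assumed stand-in for a 'disease landscape', not the paper's definition.
    """
    counts: Counter = Counter()
    for labels in records:
        # Sort so that each pair is stored under a single canonical key, e.g. (a, b) with a < b.
        for pair in combinations(sorted(labels), 2):
            counts[pair] += 1
    return counts


def split_by_phenotype(records: List[Record], pivot: str) -> Tuple[List[Record], List[Record]]:
    """Hypothetical shift emulation: source = patients without `pivot`, target = patients with it."""
    source = [r for r in records if pivot not in r]
    target = [r for r in records if pivot in r]
    return source, target


def prevalence(records: List[Record]) -> Dict[str, float]:
    """Fraction of patients in `records` that carry each label (empty dict for no patients)."""
    if not records:
        return {}
    counts = Counter(label for r in records for label in r)
    return {label: c / len(records) for label, c in counts.items()}


def prevalence_shift(source: List[Record], target: List[Record],
                     labels: Iterable[str]) -> Dict[str, float]:
    """Target-minus-source prevalence per label; large magnitudes indicate a strong label shift."""
    ps, pt = prevalence(source), prevalence(target)
    return {label: pt.get(label, 0.0) - ps.get(label, 0.0) for label in labels}


if __name__ == "__main__":
    # Toy cohort with made-up labels (not MIMIC-III data).
    toy = [
        {"hypertension", "diabetes"},
        {"hypertension"},
        {"sepsis"},
        {"diabetes", "ckd"},
        {"hypertension", "ckd", "diabetes"},
    ]
    print(cooccurrence(toy))
    src, tgt = split_by_phenotype(toy, pivot="ckd")
    print(prevalence_shift(src, tgt, ["hypertension", "diabetes", "sepsis"]))
```

On the toy cohort, splitting on `ckd` leaves three source patients and two target patients. Diabetes prevalence rises from 1/3 to 1.0, so a model trained on the source domain would see a very different label mix at deployment. A fuller evaluation would then compare a trained model's per-label metrics on held-out source data against the target domain. That step is omitted here because the abstract does not specify the model or metrics.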
