LG · Mar 7

Learning Clinical Representations Under Systematic Distribution Shift

arXiv:2603.07348v12 citations

Predicted impact: top 36% in LG · last 90 days
Originality: Highly original

AI Analysis

This work is significant for clinical machine learning practitioners and researchers, as it provides a method to create more robust and transferable models in healthcare AI by explicitly accounting for systematic distribution shifts.

This paper addresses systematic distribution shift in clinical machine learning, which arises from heterogeneous measurement policies, documentation practices, and institutional workflows. The authors propose a practice-invariant representation learning framework that combines supervised risk minimization with adversarial environment regularization and invariant risk penalties across hospitals. In cross-institution evaluations, it improves out-of-distribution AUROC by as much as 2 to 3 points over masked pretraining and standard supervised baselines, while maintaining in-distribution performance and improving calibration.

Clinical machine learning models are increasingly trained using large-scale, multimodal foundation paradigms, yet deployment environments often differ systematically from the data-generating settings used during training. Such shifts arise from heterogeneous measurement policies, documentation practices, and institutional workflows, leading to representation entanglement between physiologic signal and practice-specific artifacts. In this work, we propose a practice-invariant representation learning framework for multimodal clinical prediction. We model clinical observations as arising from latent physiologic factors and environment-dependent processes, and introduce an objective that jointly optimizes predictive performance while suppressing environment-predictive information in the learned embedding. Concretely, we combine supervised risk minimization with adversarial environment regularization and invariant risk penalties across hospitals. Across multiple longitudinal EHR prediction tasks and cross-institution evaluations, our method improves out-of-distribution AUROC by up to 2 to 3 points relative to masked pretraining and standard supervised baselines, while maintaining in-distribution performance and improving calibration. These results demonstrate that explicitly accounting for systematic distribution shift during representation learning yields more robust and transferable clinical models, highlighting the importance of structural invariance alongside architectural scale in healthcare AI.
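
The abstract names three ingredients (supervised risk, an adversarial environment term, and invariant risk penalties across hospitals) but does not give the exact objective. The following is a minimal sketch of how such terms are commonly combined, not the authors' implementation. It assumes a binary outcome, an IRMv1-style gradient penalty as the "invariant risk penalty", and an adversary whose cross-entropy for predicting the hospital is supplied as a number. The function names and the weights `lam_irm` and `lam_adv` are hypothetical.

```python
import math

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def bce(logit, y):
    # Binary cross-entropy for one example, computed from its logit.
    p = sigmoid(logit)
    eps = 1e-12
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

def env_risk(logits, labels):
    # Mean supervised loss within one environment (hospital).
    return sum(bce(z, y) for z, y in zip(logits, labels)) / len(logits)

def irm_penalty(logits, labels):
    # IRMv1-style penalty (assumed here): squared gradient of the risk with
    # respect to a dummy scalar classifier w, evaluated at w = 1.
    # For BCE(w * z, y) that gradient is (sigmoid(z) - y) * z.
    grad = sum((sigmoid(z) - y) * z for z, y in zip(logits, labels)) / len(logits)
    return grad ** 2

def encoder_objective(envs, env_adv_loss, lam_irm=1.0, lam_adv=0.1):
    # envs: list of (logits, labels) pairs, one per hospital.
    # env_adv_loss: cross-entropy of an adversary that predicts the hospital
    # from the embedding. The encoder is rewarded when this loss is high,
    # so the term enters with a negative sign.
    risk = sum(env_risk(z, y) for z, y in envs) / len(envs)
    penalty = sum(irm_penalty(z, y) for z, y in envs) / len(envs)
    return risk + lam_irm * penalty - lam_adv * env_adv_loss
```

In a real training loop the adversary would be trained to minimize its own cross-entropy while the encoder minimizes `encoder_objective`, for example through alternating updates or a gradient reversal layer. Which of these the paper uses is not stated in the abstract.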
