Time-to-Event Pretraining for 3D Medical Imaging
This paper addresses the prediction of future disease risk from 3D medical imaging for clinical applications.
The paper argues that current self-supervised methods for 3D medical imaging fail to link pixel biomarkers with long-term health outcomes because they lack temporal context. It introduces time-to-event pretraining, which draws supervision from paired, longitudinal electronic health records. The reported result is improved outcome prediction, with an average AUROC increase of 23.7% and a 29.4% gain in Harrell's C-index across 8 benchmark tasks.
With the rise of medical foundation models and the growing availability of imaging data, scalable pretraining techniques offer a promising way to identify imaging biomarkers predictive of future disease risk. While current self-supervised methods for 3D medical imaging models capture local structural features like organ morphology, they fail to link pixel biomarkers with long-term health outcomes due to a missing context problem. Current approaches lack the temporal context necessary to identify biomarkers correlated with disease progression, as they rely on supervision derived only from images and concurrent text descriptions. To address this, we introduce time-to-event pretraining, a pretraining framework for 3D medical imaging models that leverages large-scale temporal supervision from paired, longitudinal electronic health records (EHRs). Using a dataset of 18,945 CT scans (4.2 million 2D images) and time-to-event distributions across thousands of EHR-derived tasks, our method improves outcome prediction, achieving an average AUROC increase of 23.7% and a 29.4% gain in Harrell's C-index across 8 benchmark tasks. Importantly, these gains are achieved without sacrificing diagnostic classification performance. This study lays the foundation for integrating longitudinal EHR and 3D imaging data to advance clinical risk prediction.
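For readers unfamiliar with Harrell's C-index, the metric reported above, the following is a minimal sketch of its standard definition, not the authors' evaluation code. The function name `harrell_c_index`, its argument names, and the toy data are illustrative assumptions, and pairs with tied event times are skipped for simplicity. The index is the fraction of comparable patient pairs in which the patient who experienced the event earlier was assigned the higher predicted risk. A pair is comparable only when the patient with the shorter follow-up time actually had the event rather than being censored.

```python
from itertools import combinations


def harrell_c_index(times, events, risks):
    """Harrell's C-index for right-censored data (simplified sketch).

    times  : observed time per subject (event time or censoring time)
    events : 1 if the event was observed, 0 if the subject was censored
    risks  : predicted risk score; higher means an earlier expected event
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: tied times are skipped in this sketch
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # earlier subject was censored: pair is not comparable
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0  # higher risk went to the earlier event
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four patients, the third one is censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.7, 0.8, 0.1]
c_index = harrell_c_index(times, events, risks)  # 4 of 5 comparable pairs agree
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. Unlike AUROC at a fixed horizon, this measure uses the ordering of event times and accounts for censoring, which is why it is a natural metric for time-to-event models.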