DT-Transformer: A Foundation Model for Disease Trajectory Prediction on a Real-world Health System
This work demonstrates that training on large, multi-hospital health systems enables disease trajectory prediction models that generalize to real-world clinical settings, addressing the limitation of models trained on single-hospital or curated research cohorts.
DT-Transformer, a foundation model trained on 57.1M EHR entries from 1.7M patients across 11 hospitals, achieves a median AUC of 0.871 for next-event prediction across 896 disease categories, with all categories exceeding AUC 0.5.
Accurate disease trajectory prediction is critical for early intervention, resource allocation, and improving long-term outcomes. Electronic health records (EHRs) provide a rich longitudinal view of patient health in clinical environments, but models trained on curated research cohorts may not reflect routine deployment settings, and those trained on single-hospital datasets capture only fragments of each patient's trajectory. These limitations motivate training and validating on large, multi-hospital health systems that better reflect real-world clinical complexity. In this work, we develop DT-Transformer, a foundation model trained on 57.1M structured EHR entries from 1.7M patients at Mass General Brigham (MGB), a health system spanning 11 hospitals and a broad network of outpatient clinics. DT-Transformer achieves strong discrimination in both held-out and prospective validation settings. For next-event prediction, it achieves a median age- and sex-stratified AUC of 0.871 across 896 disease categories, with every category exceeding the chance-level AUC of 0.5. These results support health system-scale training as a path toward foundation models suited to real-world clinical forecasting.
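To make the headline metric concrete, the following is a minimal sketch of how a median age- and sex-stratified AUC across disease categories could be computed. The abstract does not specify the stratification procedure, so everything here is an assumption for illustration: the 5-year age bins, the unweighted mean over strata, the skipping of strata that lack either cases or non-cases, and all function names (`auc`, `stratified_auc`, `median_auc_across_categories`) are hypothetical and not taken from the paper.

```python
from collections import defaultdict
from statistics import mean, median


def auc(labels, scores):
    """Mann-Whitney estimate of AUC: P(pos score > neg score) + 0.5 * P(tie)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined without both classes
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_auc(records, age_bin_width=5):
    """Average AUC over (age bin, sex) strata for one disease category.

    records: iterable of (age, sex, label, score) tuples.
    Assumption: fixed-width age bins and an unweighted mean over strata.
    """
    strata = defaultdict(lambda: ([], []))
    for age, sex, label, score in records:
        key = (int(age // age_bin_width), sex)
        strata[key][0].append(label)
        strata[key][1].append(score)
    per_stratum = (auc(y, s) for y, s in strata.values())
    valid = [a for a in per_stratum if a is not None]
    return mean(valid) if valid else None


def median_auc_across_categories(records_by_category):
    """Median of the stratified AUCs over all categories that have one."""
    values = (stratified_auc(r) for r in records_by_category.values())
    return median(v for v in values if v is not None)


# Toy data for a single hypothetical category: (age, sex, label, score).
toy = [
    (42, "F", 1, 0.90), (43, "F", 0, 0.20), (44, "F", 0, 0.95),
    (61, "M", 1, 0.80), (62, "M", 0, 0.10),
    (70, "M", 0, 0.30),  # stratum with no cases, skipped
]
print(stratified_auc(toy))
```

Comparing cases and non-cases only within the same age and sex stratum keeps the score from being inflated by a model that merely learns that a disease is more common in older patients or in one sex.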