LG | Jul 1, 2025

Foundation Models for Clinical Records at Health System Scale

arXiv:2507.00574v1 | 3 citations | h-index: 8
Originality: Incremental advance
AI Analysis

This paper addresses the underexplored potential of foundation models for structured EHR data in healthcare, though it appears incremental relative to existing pretraining paradigms.

The authors apply large-scale pretraining to structured electronic health records through a novel generative pretraining strategy for sequential EHR data based on next-visit event prediction. In zero-shot prediction of dementia and knee osteoarthritis incidence within 2 and 5 years, their model rivals a fully fine-tuned masked pretrained Transformer baseline.

Large-scale pretraining has transformed modeling of language and other data types, but its potential remains underexplored in healthcare with structured electronic health records (EHRs). We present a novel generative pretraining strategy for sequential EHR data using next-visit event prediction. Our model learns to autoregressively generate various tokenized clinical events for the next visit based on patient history and inherently handles the joint prediction of heterogeneous data types. Additionally, we introduce regularization on predicting repeated events and highlight a key pitfall in EHR-based foundation model evaluations: repeated event tokens can inflate performance metrics when new onsets are not distinguished from subsequent occurrences. Our model is evaluated via zero-shot prediction for forecasting dementia and knee osteoarthritis incidence within 2 and 5 years, and the model performance rivals a fully fine-tuned masked pretrained Transformer baseline, demonstrating that our approach captures complex clinical dependencies without requiring costly task-specific fine-tuning.
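
The evaluation pitfall described in the abstract can be illustrated with a small sketch. This is not the authors' code: the token names, the set-of-tokens visit representation, and the "copy forward" predictor are all hypothetical, chosen only to show how repeated event tokens can inflate a metric when new onsets are not separated from subsequent occurrences.

```python
from typing import List, Set, Tuple

# Assumption: a visit is represented as a set of event tokens.
# The token names below are made up for illustration.
Visit = Set[str]


def split_targets(history: List[Visit], next_visit: Visit) -> Tuple[Set[str], Set[str]]:
    """Split next-visit tokens into new onsets and repeats of earlier events."""
    seen: Set[str] = set().union(*history)
    return next_visit - seen, next_visit & seen


def recall(predicted: Set[str], actual: Set[str]) -> float:
    """Fraction of actual tokens that were predicted (NaN if none to predict)."""
    return len(predicted & actual) / len(actual) if actual else float("nan")


history = [
    {"DX_HYPERTENSION", "RX_LISINOPRIL"},
    {"DX_HYPERTENSION", "LAB_A1C"},
]
next_visit = {"DX_HYPERTENSION", "RX_LISINOPRIL", "DX_DEMENTIA"}

# A trivial predictor that only repeats everything seen so far.
copy_forward: Set[str] = set().union(*history)

new_onsets, repeats = split_targets(history, next_visit)
overall_recall = recall(copy_forward, next_visit)  # counts repeated tokens as hits
onset_recall = recall(copy_forward, new_onsets)    # scores first occurrences only

print(f"overall recall: {overall_recall:.2f}")
print(f"new-onset recall: {onset_recall:.2f}")
```

In this toy case the copy-forward predictor recovers two of the three next-visit tokens yet misses the only new diagnosis, so a metric computed over all tokens overstates how well incidence is forecast.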

