Serialized EHR make for good text representations
This work addresses a clinically meaningful problem in antibiotic stewardship by improving EHR representation learning, though the contribution is incremental because it builds on existing foundation models such as SciBERT.
The paper tackles the problem of learning generalizable representations from Electronic Health Records (EHRs) by addressing the structural mismatch between tabular EHR data and the sequential priors of natural language models. The resulting model, SerialBEHRT, is reported to achieve superior and more consistent performance in antibiotic susceptibility prediction.
The emergence of foundation models in healthcare has opened new avenues for learning generalizable representations from large-scale clinical data. Yet, existing approaches often struggle to reconcile the tabular and event-based nature of Electronic Health Records (EHRs) with the sequential priors of natural language models. This structural mismatch limits their ability to capture longitudinal dependencies across patient encounters. We introduce SerialBEHRT, a domain-aligned foundation model that extends SciBERT through additional pretraining on structured EHR sequences. SerialBEHRT is designed to encode temporal and contextual relationships among clinical events, thereby producing richer patient representations. We evaluate its effectiveness on the task of antibiotic susceptibility prediction, a clinically meaningful problem in antibiotic stewardship. Through extensive benchmarking against state-of-the-art EHR representation strategies, we demonstrate that SerialBEHRT achieves superior and more consistent performance, highlighting the importance of temporal serialization in foundation model pretraining for healthcare.
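To make the idea of temporal serialization concrete, the following is a minimal sketch of how tabular EHR rows might be turned into a single chronologically ordered text sequence that a BERT-style model could consume. It is an illustration only, not the paper's actual pipeline: the `ClinicalEvent` fields, the `serialize_patient` helper, the "visit <date>: ..." template, the use of `[SEP]` between encounters, and the example records are all assumptions introduced here.

```python
from dataclasses import dataclass
from datetime import date
from itertools import groupby


@dataclass(frozen=True)
class ClinicalEvent:
    """One row of a hypothetical tabular EHR export (fields are assumed)."""
    encounter_date: date
    category: str      # e.g. "diagnosis", "medication", "lab"
    description: str


def serialize_patient(events, visit_sep=" [SEP] "):
    """Render tabular events as one chronologically ordered text sequence.

    Events are sorted by encounter date (a stable sort, so same-day rows
    keep their input order), grouped into encounters, and each encounter
    becomes one text segment. The template is an assumption, not the
    format used by SerialBEHRT.
    """
    ordered = sorted(events, key=lambda e: e.encounter_date)
    segments = []
    for day, group in groupby(ordered, key=lambda e: e.encounter_date):
        items = "; ".join(f"{e.category}: {e.description}" for e in group)
        segments.append(f"visit {day.isoformat()}: {items}")
    return visit_sep.join(segments)


# Made-up example rows, deliberately out of chronological order.
events = [
    ClinicalEvent(date(2021, 3, 2), "lab", "urine culture positive for E. coli"),
    ClinicalEvent(date(2021, 1, 15), "diagnosis", "urinary tract infection"),
    ClinicalEvent(date(2021, 1, 15), "medication", "nitrofurantoin"),
]
text = serialize_patient(events)
print(text)
```

Under these assumptions, the resulting string places earlier encounters before later ones, so a sequence model reads the patient history in temporal order. Such a string could then be tokenized and used for continued pretraining of a text encoder, which is the general strategy the abstract describes.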