AI · CL · LG · Apr 11, 2025

MedRep: Medical Concept Representation for General Electronic Health Record Foundation Models

arXiv:2504.08329v3 · 2 citations · h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses a fundamental limitation of EHR foundation models in healthcare applications, though the advance is incremental because it builds on existing models and data standards.

The authors address out-of-vocabulary medical codes, which limit the generalizability of electronic health record (EHR) foundation models. They propose MedRep, a set of medical concept representations based on the OMOP common data model. MedRep outperformed baseline models on diverse prediction tasks, and its generalizability was demonstrated through external validation.

Electronic health record (EHR) foundation models have been an area ripe for exploration, given their improved performance in various medical tasks. Despite the rapid advances, there exists a fundamental limitation: processing unseen, out-of-vocabulary medical codes. This problem limits the generalizability of EHR foundation models and the integration of models trained with different vocabularies. To alleviate this problem, we propose a set of novel medical concept representations (MedRep) for EHR foundation models based on the Observational Medical Outcomes Partnership (OMOP) common data model (CDM). For concept representation learning, we enrich the information of each concept with a minimal definition through large language model (LLM) prompts and complement the text-based representations through the graph ontology of the OMOP vocabulary. Our approach outperforms the vanilla EHR foundation model and the model with a previously introduced medical code tokenizer in diverse prediction tasks. We also demonstrate the generalizability of MedRep through external validation.
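The two-step idea in the abstract (a text-based representation per concept, complemented by the ontology graph) can be illustrated with a toy sketch. This is not the authors' implementation: the hashed bag-of-words encoder stands in for an LLM-derived text representation, the neighbor averaging stands in for graph learning, and the concept IDs, definitions, edge, and the `alpha` weight are all hypothetical.

```python
import hashlib
import math

DIM = 16  # toy embedding size (assumption, chosen only for illustration)


def text_vector(definition: str, dim: int = DIM) -> list[float]:
    """Toy stand-in for a text encoder: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for token in definition.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def smooth_with_neighbors(text_vecs, edges, alpha=0.7):
    """Blend each concept's text vector with the mean of its graph neighbors.

    alpha weights the concept's own text vector; (1 - alpha) weights the
    neighbor mean. Concepts without neighbors keep their text vector.
    """
    neighbors = {cid: set() for cid in text_vecs}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    out = {}
    for cid, vec in text_vecs.items():
        nbrs = sorted(neighbors[cid])
        if not nbrs:
            out[cid] = list(vec)
            continue
        mean = [
            sum(text_vecs[n][i] for n in nbrs) / len(nbrs)
            for i in range(len(vec))
        ]
        out[cid] = [alpha * v + (1 - alpha) * m for v, m in zip(vec, mean)]
    return out


# Hypothetical concept IDs and short definitions (not real OMOP entries).
definitions = {
    1001: "chronic condition of elevated blood glucose",
    1002: "elevated blood glucose caused by insulin resistance",
    1003: "fracture of the long bone of the thigh",
}
edges = [(1001, 1002)]  # hypothetical ontology relation between two concepts

text_vecs = {cid: text_vector(d) for cid, d in definitions.items()}
concept_vecs = smooth_with_neighbors(text_vecs, edges)
```

Because every concept gets its vector from a definition and its graph neighbors rather than from a trained per-code embedding row, a code that never appeared in the training data can still be mapped to a representation, which is the property the abstract targets.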

Code Implementations: 1 repo