CL · Dec 12, 2025

Building Patient Journeys in Hebrew: A Language Model for Clinical Timeline Extraction

arXiv:2512.11502v1 · h-index: 46
Originality: Incremental advance
AI Analysis

This work addresses the need to construct patient journeys in Hebrew clinical settings. The advance is incremental: it adapts existing methods to a specific language and domain.

The researchers extract structured clinical timelines from Hebrew electronic health records using a new language model based on DictaBERT 2.0 and continually pre-trained on over five million de-identified hospital records. They also introduce two new annotated datasets for evaluation and report strong performance on both.

We present a new Hebrew medical language model designed to extract structured clinical timelines from electronic health records, enabling the construction of patient journeys. Our model is based on DictaBERT 2.0 and continually pre-trained on over five million de-identified hospital records. To evaluate its effectiveness, we introduce two new datasets -- one from internal medicine and emergency departments, and another from oncology -- annotated for event temporal relations. Our results show that our model achieves strong performance on both datasets. We also find that vocabulary adaptation improves token efficiency and that de-identification does not compromise downstream performance, supporting privacy-conscious model development. The model is made available for research use under ethical restrictions.
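The abstract does not say how token efficiency was measured. One common proxy is fertility, the average number of subword tokens per whitespace-separated word, where lower values mean fewer tokens for the same text. The sketch below illustrates that idea with a toy greedy WordPiece-style splitter. The function names, the English example sentence, and both vocabularies are hypothetical and chosen only for illustration; they are not taken from the paper or from DictaBERT 2.0.

```python
def wordpiece_tokenize(word, vocab):
    """Greedy longest-match-first split of one word into subword pieces.

    Continuation pieces carry a "##" prefix, as in WordPiece-style vocabularies.
    Returns ["[UNK]"] if the word cannot be covered by the vocabulary.
    """
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start:
            piece = word[start:end] if start == 0 else "##" + word[start:end]
            if piece in vocab:
                break
            end -= 1
        if end == start:  # no vocabulary entry matches at this position
            return ["[UNK]"]
        pieces.append(piece)
        start = end
    return pieces


def fertility(text, vocab):
    """Average number of subword tokens per whitespace-separated word."""
    words = text.split()
    n_tokens = sum(len(wordpiece_tokenize(w, vocab)) for w in words)
    return n_tokens / len(words)


# Hypothetical toy vocabularies: the "adapted" one adds a single domain term.
base_vocab = {"patient", "received", "chemo", "##ther", "##apy"}
adapted_vocab = base_vocab | {"chemotherapy"}

text = "patient received chemotherapy"
base_fertility = fertility(text, base_vocab)        # 5 tokens over 3 words
adapted_fertility = fertility(text, adapted_vocab)  # 3 tokens over 3 words
```

In this toy setting, adding one domain term to the vocabulary lowers fertility because the clinical word is no longer split into three pieces. A real comparison would run the base and adapted tokenizers over held-out clinical text.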
