CL | Jun 18, 2025

CKD-EHR: Clinical Knowledge Distillation for Electronic Health Records

arXiv:2506.15118v1 | 2 citations | h-index: 5 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the accuracy and deployment efficiency of EHR-based disease prediction in clinical settings. It is an incremental advance because it builds on existing knowledge distillation techniques.

The study tackles two problems in EHR-based disease prediction models: insufficient representation of medical knowledge and low efficiency in clinical deployment. It proposes the CKD-EHR framework, which uses knowledge distillation and, on the MIMIC-III dataset, reports a 9% increase in diagnostic accuracy, a 27% improvement in F1-score, and a 22.2 times inference speedup over the baseline model.

Electronic Health Records (EHR)-based disease prediction models have demonstrated significant clinical value in promoting precision medicine and enabling early intervention. However, existing large language models face two major challenges: insufficient representation of medical knowledge and low efficiency in clinical deployment. To address these challenges, this study proposes the CKD-EHR (Clinical Knowledge Distillation for EHR) framework, which achieves efficient and accurate disease risk prediction through knowledge distillation techniques. Specifically, the large language model Qwen2.5-7B is first fine-tuned on medical knowledge-enhanced data to serve as the teacher model. It then generates interpretable soft labels through a multi-granularity attention distillation mechanism. Finally, the distilled knowledge is transferred to a lightweight BERT student model. Experimental results show that on the MIMIC-III dataset, CKD-EHR significantly outperforms the baseline model: diagnostic accuracy is increased by 9%, F1-score is improved by 27%, and a 22.2 times inference speedup is achieved. This innovative solution not only greatly improves resource utilization efficiency but also significantly enhances the accuracy and timeliness of diagnosis, providing a practical technical approach for resource optimization in clinical settings. The code and data for this research are available at https://github.com/209506702/CKD_EHR.
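The teacher-to-student transfer described in the abstract can be illustrated with a generic soft-label distillation loss. The sketch below is the standard temperature-scaled formulation from the knowledge distillation literature, not the paper's multi-granularity attention distillation mechanism, whose details the abstract does not give. The function names, the temperature of 2.0, and the weighting factor `alpha` are all assumptions chosen for illustration, and the logits are toy values rather than outputs of Qwen2.5-7B or BERT.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature gives softer probabilities."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, hard_label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-label term and a hard-label term.

    temperature and alpha are illustrative defaults, not values from the paper.
    """
    # Soft term: the student matches the teacher's softened distribution.
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    soft_term = (temperature ** 2) * kl_divergence(soft_teacher, soft_student)

    # Hard term: ordinary cross-entropy against the ground-truth class index.
    hard_probs = softmax(student_logits)
    hard_term = -math.log(hard_probs[hard_label] + 1e-12)

    return alpha * soft_term + (1 - alpha) * hard_term


# Toy binary example (for instance, disease vs. no disease).
teacher = [2.0, 0.0]
agreeing_student = [2.0, 0.0]
disagreeing_student = [0.0, 2.0]
loss_agree = distillation_loss(agreeing_student, teacher, hard_label=0)
loss_disagree = distillation_loss(disagreeing_student, teacher, hard_label=0)
print(loss_agree < loss_disagree)
```

A student whose logits agree with the teacher receives a lower loss than one that disagrees, which is the signal that drives the transfer. In practice this loss would be computed on batched tensors in a deep learning framework rather than on Python lists.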

Code Implementations: 1 repo