CL · AI · Feb 8, 2025

ELMTEX: Fine-Tuning Large Language Models for Structured Clinical Information Extraction. A Case Study on Clinical Reports

arXiv:2502.05638v1 · 6 citations · h-index: 9
Originality: Synthesis-oriented
AI Analysis

This work addresses healthcare interoperability needs by providing an efficient method for processing clinical data. The contribution is incremental, since it applies existing fine-tuning techniques to a new domain.

The paper tackles the extraction of structured clinical information from unstructured reports using LLMs. It finds that fine-tuned smaller models match or surpass larger ones in performance, based on evaluations on a new dataset of 84,000 annotated clinical summaries (60,000 in English and 24,000 German translations).

Europe's healthcare systems require enhanced interoperability and digitalization, driving a demand for innovative solutions to process legacy clinical data. This paper presents the results of our project, which aims to leverage Large Language Models (LLMs) to extract structured information from unstructured clinical reports, focusing on patient history, diagnoses, treatments, and other predefined categories. We developed a workflow with a user interface and evaluated LLMs of varying sizes through prompting strategies and fine-tuning. Our results show that fine-tuned smaller models match or surpass larger counterparts in performance, offering efficiency for resource-limited settings. A new dataset of 60,000 annotated English clinical summaries and 24,000 German translations was validated with automated and manual checks. The evaluations used ROUGE, BERTScore, and entity-level metrics. The work highlights the approach's viability and outlines future improvements.
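The abstract names entity-level metrics without defining them here, so the following is only a minimal sketch of one common reading: micro-averaged precision, recall, and F1 over (category, value) pairs, matched exactly after lowercasing and trimming. The function name `entity_prf`, the dictionary layout, and the matching rule are assumptions for illustration, not the authors' implementation.

```python
from collections import Counter


def entity_prf(predicted, reference):
    """Micro-averaged precision, recall, and F1 over (category, value) pairs.

    Both arguments map a category name (e.g. "diagnoses") to a list of
    extracted strings. Matching is exact after lowercasing and trimming,
    which is an assumption; the paper may use a different matching rule.
    """
    pred = Counter(
        (cat, val.strip().lower()) for cat, vals in predicted.items() for val in vals
    )
    ref = Counter(
        (cat, val.strip().lower()) for cat, vals in reference.items() for val in vals
    )
    # Multiset intersection counts each reference entity at most once.
    true_positives = sum((pred & ref).values())
    precision = true_positives / sum(pred.values()) if pred else 0.0
    recall = true_positives / sum(ref.values()) if ref else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical example values, not taken from the dataset.
reference = {
    "diagnoses": ["Type 2 diabetes", "Hypertension"],
    "treatments": ["Metformin"],
}
predicted = {
    "diagnoses": ["type 2 diabetes"],
    "treatments": ["Metformin", "Insulin"],
}
precision, recall, f1 = entity_prf(predicted, reference)
```

Exact matching is strict: a paraphrased but correct entity counts as an error, which is one reason such scores are usually read alongside ROUGE and BERTScore rather than on their own.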

Code Implementations: 1 repo
