CL AI IR | Jun 22, 2025

Refine Medical Diagnosis Using Generation Augmented Retrieval and Clinical Practice Guidelines

Wenhao Li, Hongkuan Zhang, Hongwei Zhang, Zhengxu Li, Zengjie Dong, Yafan Chen, Niranjan Bidargaddi, Hong Liu

arXiv:2506.21615v1 | 4.91 citations | h-index: 22 | IEEE Journal of Biomedical and Health Informatics

Originality: Incremental advance

AI Analysis

This work addresses the limited clinical utility of medical language models for healthcare practitioners with a scalable, low-cost method that the authors describe as hallucination-free. The advance is incremental because it builds on existing retrieval-augmented approaches.

The paper tackles the misalignment between ICD code-based diagnosis prediction and nuanced clinical reasoning by introducing GARMLE-G, a Generation-Augmented Retrieval framework that grounds medical language models in clinical practice guidelines. In a hypertension diagnosis prototype, it reports superior retrieval precision, semantic relevance, and guideline adherence compared with RAG-based baselines.

Current medical language models, adapted from large language models (LLMs), typically predict ICD code-based diagnoses from electronic health records (EHRs) because these labels are readily available. However, ICD codes do not capture the nuanced, context-rich reasoning clinicians use for diagnosis. Clinicians synthesize diverse patient data and reference clinical practice guidelines (CPGs) to make evidence-based decisions. This misalignment limits the clinical utility of existing models. We introduce GARMLE-G, a Generation-Augmented Retrieval framework that grounds medical language model outputs in authoritative CPGs. Unlike conventional Retrieval-Augmented Generation (RAG) approaches, GARMLE-G enables hallucination-free outputs by directly retrieving authoritative guideline content without relying on model-generated text. It (1) integrates LLM predictions with EHR data to create semantically rich queries, (2) retrieves relevant CPG knowledge snippets via embedding similarity, and (3) fuses guideline content with model output to generate clinically aligned recommendations. A prototype system for hypertension diagnosis was developed and evaluated on multiple metrics, demonstrating superior retrieval precision, semantic relevance, and clinical guideline adherence compared to RAG-based baselines, while maintaining a lightweight architecture suitable for localized healthcare deployment. This work provides a scalable, low-cost, and hallucination-free method for grounding medical language models in evidence-based clinical practice, with strong potential for broader clinical deployment.
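The three steps in the abstract (query construction, embedding-based retrieval, fusion) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy bag-of-words embedding (standing in for a real sentence encoder), and the placeholder snippets (not real guideline text) are all assumptions made for illustration. The point it shows is that the guideline text in the output is retrieved verbatim rather than generated.

```python
import math
import re
from collections import Counter

# Hypothetical placeholder snippets; NOT real clinical guideline content.
CPG_SNIPPETS = [
    "confirm elevated blood pressure with repeated office measurements",
    "assess kidney function in patients with suspected secondary hypertension",
    "screen for diabetes using fasting glucose",
]


def embed(text):
    """Toy bag-of-words vector; an assumed stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_query(llm_prediction, ehr_text):
    """Step 1: combine the model's prediction with EHR text into one query."""
    return f"{llm_prediction} {ehr_text}"


def retrieve(query, snippets, k=1):
    """Step 2: rank guideline snippets by similarity to the query, keep top k."""
    q = embed(query)
    ranked = sorted(snippets, key=lambda s: cosine(q, embed(s)), reverse=True)
    return ranked[:k]


def fuse(llm_prediction, retrieved):
    """Step 3: pair the prediction with verbatim (unmodified) guideline text."""
    return {"prediction": llm_prediction, "guideline_evidence": retrieved}


def recommend(llm_prediction, ehr_text, snippets=CPG_SNIPPETS, k=1):
    """Run the three steps end to end."""
    query = build_query(llm_prediction, ehr_text)
    return fuse(llm_prediction, retrieve(query, snippets, k))


result = recommend("hypertension", "elevated blood pressure on repeated office visits")
```

Because `fuse` only attaches snippets returned by `retrieve`, every string under `guideline_evidence` is an exact member of the snippet collection, which is the property the abstract contrasts with RAG-style generation.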
