CLAIIR | Jun 22, 2025

Refine Medical Diagnosis Using Generation Augmented Retrieval and Clinical Practice Guidelines

arXiv:2506.21615v1 | 1 citation | h-index: 22 | IEEE Journal of Biomedical and Health Informatics
Originality: Incremental advance
AI Analysis

This work addresses the limited clinical utility of medical language models for healthcare practitioners. It offers a scalable, low-cost method that the authors describe as hallucination-free, because the output is retrieved guideline text rather than generated text. The advance is incremental: it builds on existing retrieval-augmented approaches.

The paper tackles the misalignment between ICD code-based diagnosis predictions and nuanced clinical reasoning. It introduces GARMLE-G, a Generation-Augmented Retrieval framework that grounds medical language models in clinical practice guidelines. On a hypertension diagnosis prototype, the authors report better retrieval precision, semantic relevance, and guideline adherence than RAG-based baselines.

Current medical language models, adapted from large language models (LLMs), typically predict ICD code-based diagnoses from electronic health records (EHRs) because these labels are readily available. However, ICD codes do not capture the nuanced, context-rich reasoning clinicians use for diagnosis. Clinicians synthesize diverse patient data and reference clinical practice guidelines (CPGs) to make evidence-based decisions. This misalignment limits the clinical utility of existing models. We introduce GARMLE-G, a Generation-Augmented Retrieval framework that grounds medical language model outputs in authoritative CPGs. Unlike conventional Retrieval-Augmented Generation (RAG) approaches, GARMLE-G enables hallucination-free outputs by directly retrieving authoritative guideline content without relying on model-generated text. It (1) integrates LLM predictions with EHR data to create semantically rich queries, (2) retrieves relevant CPG knowledge snippets via embedding similarity, and (3) fuses guideline content with model output to generate clinically aligned recommendations. A prototype system for hypertension diagnosis was developed and evaluated on multiple metrics, demonstrating superior retrieval precision, semantic relevance, and clinical guideline adherence compared to RAG-based baselines, while maintaining a lightweight architecture suitable for localized healthcare deployment. This work provides a scalable, low-cost, and hallucination-free method for grounding medical language models in evidence-based clinical practice, with strong potential for broader clinical deployment.
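The three steps in the abstract (query construction, embedding-similarity retrieval, fusion) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names (`build_query`, `retrieve`, `fuse`), the bag-of-words `embed` stand-in for a real embedding model, and the guideline snippets are all hypothetical and were invented for illustration. The point it shows is that guideline text reaches the output verbatim instead of being regenerated by a model.

```python
import math
import re
from collections import Counter

# Hypothetical guideline snippets, invented for illustration only;
# they are not quoted from any real clinical practice guideline.
CPG_SNIPPETS = [
    "Confirm hypertension with repeated blood pressure measurements on separate visits.",
    "Assess kidney function in patients with elevated blood pressure and diabetes.",
    "Recommend annual eye examination for patients with diabetes.",
]


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_query(llm_prediction, ehr_summary):
    """Step 1: combine the model's prediction with EHR context into one query."""
    return f"{llm_prediction}. {ehr_summary}"


def retrieve(query, snippets, top_k=2):
    """Step 2: rank guideline snippets by similarity to the query."""
    query_vec = embed(query)
    ranked = sorted(snippets, key=lambda s: cosine(query_vec, embed(s)), reverse=True)
    return ranked[:top_k]


def fuse(llm_prediction, retrieved):
    """Step 3: attach the retrieved guideline text, unmodified, to the prediction."""
    lines = [f"Model prediction: {llm_prediction}", "Supporting guideline text (verbatim):"]
    lines += [f"- {snippet}" for snippet in retrieved]
    return "\n".join(lines)


prediction = "hypertension"  # assumed output of a medical language model
ehr = "repeated elevated blood pressure measurements over two visits"
query = build_query(prediction, ehr)
top = retrieve(query, CPG_SNIPPETS, top_k=1)
report = fuse(prediction, top)
print(report)
```

In a real system, `embed` would presumably be replaced by a learned text-embedding model and the snippets by indexed CPG passages; the sketch keeps everything in the standard library so the control flow stays visible.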

