AI
Feb 26

CXReasonAgent: Evidence-Grounded Diagnostic Reasoning Agent for Chest X-rays

arXiv:2602.23276v1
h-index: 6
Originality: Highly original
AI Analysis

This work matters to clinicians and radiologists because it offers a more reliable and verifiable diagnostic reasoning tool for chest X-rays, addressing the limitations of existing large vision-language models (LVLMs) in safety-critical clinical settings.

This paper introduces CXReasonAgent, a diagnostic agent that integrates an LLM with clinically grounded diagnostic tools to perform evidence-grounded diagnostic reasoning over chest X-rays, using image-derived diagnostic and visual evidence. It targets two weaknesses of large vision-language models (LVLMs): responses that are not faithfully grounded in diagnostic evidence, and costly retraining to support new diagnostic tasks. The authors report that the agent produces faithfully grounded responses, enabling more reliable and verifiable diagnostic reasoning than LVLMs.

Chest X-ray plays a central role in thoracic diagnosis, and its interpretation inherently requires multi-step, evidence-grounded reasoning. However, large vision-language models (LVLMs) often generate plausible responses that are not faithfully grounded in diagnostic evidence and provide limited visual evidence for verification, while also requiring costly retraining to support new diagnostic tasks, limiting their reliability and adaptability in clinical settings. To address these limitations, we present CXReasonAgent, a diagnostic agent that integrates a large language model (LLM) with clinically grounded diagnostic tools to perform evidence-grounded diagnostic reasoning using image-derived diagnostic and visual evidence. To evaluate these capabilities, we introduce CXReasonDial, a multi-turn dialogue benchmark with 1,946 dialogues across 12 diagnostic tasks, and show that CXReasonAgent produces faithfully grounded responses, enabling more reliable and verifiable diagnostic reasoning than LVLMs. These findings highlight the importance of integrating clinically grounded diagnostic tools, particularly in safety-critical clinical settings.
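
The general pattern the abstract describes, a language model that answers only from the outputs of diagnostic tools, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the `Evidence` type, the `cardiothoracic_ratio_tool` function, the keyword-based `plan_tools` stand-in for LLM tool selection, and the toy image dictionary are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Evidence:
    """One piece of tool output that a response can cite."""
    tool: str
    finding: str      # diagnostic evidence, e.g. a measurement
    visual_ref: str   # pointer to visual evidence, e.g. an overlay id


def cardiothoracic_ratio_tool(image: dict) -> Evidence:
    # Hypothetical tool: reads precomputed widths from a toy "image" dict
    # instead of running a real segmentation model.
    ratio = image["cardiac_width"] / image["thoracic_width"]
    return Evidence(
        tool="cardiothoracic_ratio",
        finding=f"CTR={ratio:.2f}",
        visual_ref="overlay:ctr",
    )


# Hypothetical tool registry; supporting a new task means registering a
# new tool, not retraining a model.
TOOLS: Dict[str, Callable[[dict], Evidence]] = {
    "cardiothoracic_ratio": cardiothoracic_ratio_tool,
}


def plan_tools(question: str) -> List[str]:
    # Stand-in for the LLM's tool selection: simple keyword routing.
    if "heart" in question.lower():
        return ["cardiothoracic_ratio"]
    return []


def answer(question: str, image: dict) -> dict:
    """Build a response whose every statement comes from tool evidence."""
    evidence = [TOOLS[name](image) for name in plan_tools(question)]
    if not evidence:
        # Decline rather than produce an ungrounded statement.
        return {"text": "No tool evidence available for this question.",
                "evidence": []}
    text = "; ".join(f"{e.finding} (see {e.visual_ref})" for e in evidence)
    return {"text": text, "evidence": evidence}


if __name__ == "__main__":
    toy_image = {"cardiac_width": 15.0, "thoracic_width": 30.0}
    print(answer("Is the heart enlarged?", toy_image)["text"])
```

The design choice the sketch highlights is that the response text is assembled only from returned `Evidence` objects, so each statement carries a reference a reader could verify, and a question with no matching tool yields an explicit refusal instead of a guess.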
