AI · CL · Nov 20, 2025

CARE-RAG - Clinical Assessment and Reasoning in RAG

Deepthi Potluri, Aby Mammen Mathew, Jeffrey B DeWitt, Alexander L. Rasgon, Yide Hao, Junyuan Hong, Ying Ding

arXiv:2511.15994v1 · 3.3 · h-index: 2

Originality: Incremental advance

AI Analysis

The paper addresses the gap between retrieval and reasoning in clinical AI, which matters because clinical outputs must align with structured protocols. Its contribution is incremental: it offers an evaluation framework rather than a new method.

The paper tackles the problem that large language models (LLMs) often fail to reason correctly with retrieved evidence in clinical settings, using Written Exposure Therapy guidelines as a testbed, and finds persistent errors even with authoritative passages. It proposes an evaluation framework measuring accuracy, consistency, and fidelity of reasoning, highlighting that safe deployment requires assessing reasoning as rigorously as retrieval.

Access to the right evidence does not guarantee that large language models (LLMs) will reason with it correctly. This gap between retrieval and reasoning is especially concerning in clinical settings, where outputs must align with structured protocols. We study this gap using Written Exposure Therapy (WET) guidelines as a testbed. In evaluating model responses to curated clinician-vetted questions, we find that errors persist even when authoritative passages are provided. To address this, we propose an evaluation framework that measures accuracy, consistency, and fidelity of reasoning. Our results highlight both the potential and the risks: retrieval-augmented generation (RAG) can constrain outputs, but safe deployment requires assessing reasoning as rigorously as retrieval.
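To make the three measured dimensions concrete, the following is a minimal sketch of how accuracy, consistency, and fidelity might be scored over repeated model responses. The abstract does not define these metrics, so the definitions here are assumptions chosen for illustration, and the `Trial` record with all its fields is hypothetical rather than the authors' data format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Trial:
    """One model response to one question (hypothetical record layout)."""
    question_id: str
    answer: str              # model's final answer, normalized to a short label
    gold: str                # clinician-vetted reference answer
    cited_passages: set      # ids of passages the model says it relied on
    provided_passages: set   # ids of passages actually supplied to the model


def accuracy(trials):
    """Assumed definition: fraction of responses matching the reference answer."""
    return sum(t.answer == t.gold for t in trials) / len(trials)


def consistency(trials):
    """Assumed definition: per question, the share of repeated runs that agree
    with the most common answer, averaged over questions."""
    answers_by_question = {}
    for t in trials:
        answers_by_question.setdefault(t.question_id, []).append(t.answer)
    shares = [
        Counter(answers).most_common(1)[0][1] / len(answers)
        for answers in answers_by_question.values()
    ]
    return sum(shares) / len(shares)


def fidelity(trials):
    """Assumed definition: fraction of responses that cite at least one passage
    and cite only passages that were actually provided."""
    grounded = sum(
        bool(t.cited_passages) and t.cited_passages <= t.provided_passages
        for t in trials
    )
    return grounded / len(trials)
```

Scoring the three quantities separately reflects the abstract's point that a response can be correct without being stable across runs or grounded in the supplied passages, so a single aggregate number would hide the failure mode of interest.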

