CL · Apr 12

How You Ask Matters! Adaptive RAG Robustness to Query Variations

arXiv:2604.10745 · 81.1 · h-index: 8
Predicted impact: top 67% in CL · last 90 days
Originality: Incremental advance
AI Analysis

The paper identifies a critical robustness gap in Adaptive RAG systems, which matters to practitioners who deploy them in real-world applications with diverse user queries.

This paper introduces the first large-scale benchmark of semantically identical query variations to evaluate Adaptive RAG robustness, revealing that small surface-level changes dramatically alter retrieval behavior and accuracy, with larger models failing to improve robustness.

Adaptive Retrieval-Augmented Generation (RAG) promises accuracy and efficiency by dynamically triggering retrieval only when needed and is widely used in practice. However, real-world queries vary in surface form even with the same intent, and their impact on Adaptive RAG remains under-explored. We introduce the first large-scale benchmark of diverse yet semantically identical query variations, combining human-written and model-generated rewrites. Our benchmark facilitates a systematic evaluation of Adaptive RAG robustness by examining its key components across three dimensions: answer quality, computational cost, and retrieval decisions. We discover a critical robustness gap, where small surface-level changes in queries dramatically alter retrieval behavior and accuracy. Although larger models show better performance, robustness does not improve accordingly. These findings reveal that Adaptive RAG methods are highly vulnerable to query variations that preserve identical semantics, exposing a critical robustness challenge.
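The three evaluation dimensions named in the abstract (answer quality, computational cost, retrieval decisions) can be illustrated with a small sketch. This is not the paper's benchmark code or its metrics: the `Outcome` record, the `robustness_summary` function, and the choice of statistics (accuracy, majority-agreement on the retrieval decision, standard deviation of cost) are all assumptions made here to show what comparing semantically identical query variants might look like.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Outcome:
    """Hypothetical result of running ONE query variant through an Adaptive RAG system."""
    correct: bool    # answer quality: did the system answer correctly?
    retrieved: bool  # retrieval decision: did the system choose to retrieve?
    cost: float      # computational cost in an assumed unit, e.g. number of LLM calls


def robustness_summary(outcomes):
    """Summarize how stable the system is across variants of the same question.

    All variants are assumed to share one intent, so a perfectly robust system
    would give identical correctness, retrieval decisions, and cost for each.
    """
    if not outcomes:
        raise ValueError("need at least one outcome")

    accuracy = mean(1.0 if o.correct else 0.0 for o in outcomes)
    retrieval_rate = mean(1.0 if o.retrieved else 0.0 for o in outcomes)

    # Share of variants that agree with the majority retrieval decision.
    # 1.0 means every variant triggered the same decision.
    decision_consistency = max(retrieval_rate, 1.0 - retrieval_rate)

    # Spread of cost across variants; 0.0 means identical cost for all of them.
    cost_spread = pstdev([o.cost for o in outcomes])

    return {
        "accuracy": accuracy,
        "decision_consistency": decision_consistency,
        "cost_spread": cost_spread,
    }


# Illustrative, made-up outcomes for four rewrites of a single question.
variants = [
    Outcome(correct=True, retrieved=True, cost=3.0),
    Outcome(correct=True, retrieved=False, cost=1.0),
    Outcome(correct=False, retrieved=True, cost=3.0),
    Outcome(correct=True, retrieved=True, cost=3.0),
]
summary = robustness_summary(variants)
```

Under these assumptions, a robustness gap of the kind the abstract describes would show up as a decision consistency below 1.0 or a nonzero cost spread, even though every variant asks the same question.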

