CL · AI · Dec 1, 2025

TempPerturb-Eval: On the Joint Effects of Internal Temperature and External Perturbations in RAG Robustness

arXiv:2512.01183v1 · Has Code
Originality: Incremental advance
AI Analysis

The paper addresses robustness issues in RAG systems and offers AI practitioners diagnostic tools and guidelines, but it is an incremental advance because it builds on existing evaluation methods.

This work investigates how text perturbations in retrieval interact with temperature settings in RAG systems, showing that high temperatures amplify vulnerability to perturbations and revealing distinct sensitivity patterns across experiments on HotpotQA.

The evaluation of Retrieval-Augmented Generation (RAG) systems typically examines retrieval quality and generation parameters like temperature in isolation, overlooking their interaction. This work presents a systematic investigation of how text perturbations (simulating noisy retrieval) interact with temperature settings across multiple LLM runs. We propose a comprehensive RAG Perturbation-Temperature Analysis Framework that subjects retrieved documents to three distinct perturbation types across varying temperature settings. Through extensive experiments on HotpotQA with both open-source and proprietary LLMs, we demonstrate that performance degradation follows distinct patterns: high-temperature settings consistently amplify vulnerability to perturbations, while certain perturbation types exhibit non-linear sensitivity across the temperature range. Our work yields three key contributions: (1) a diagnostic benchmark for assessing RAG robustness, (2) an analytical framework for quantifying perturbation-temperature interactions, and (3) practical guidelines for model selection and parameter tuning under noisy retrieval conditions.
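The evaluation loop the abstract describes (perturb the retrieved text, sweep the temperature, repeat each setting over several runs) might be organized roughly as in the sketch below. This is an illustration under stated assumptions, not the paper's implementation: the abstract does not name the three perturbation types, so `drop_words` and `shuffle_words` are hypothetical placeholders, `toy_generate` is a stand-in for a real LLM call, and exact match is an assumed metric.

```python
import random
from statistics import mean
from typing import Callable, Dict, List, Tuple

# (question, retrieved context, gold answer)
Example = Tuple[str, str, str]
Perturb = Callable[[str, random.Random], str]
Generate = Callable[[str, str, float], str]


def drop_words(text: str, rng: random.Random, rate: float = 0.3) -> str:
    """Hypothetical perturbation: delete each word with probability `rate`."""
    return " ".join(w for w in text.split() if rng.random() > rate)


def shuffle_words(text: str, rng: random.Random) -> str:
    """Hypothetical perturbation: randomly reorder the words."""
    words = text.split()
    rng.shuffle(words)
    return " ".join(words)


def toy_generate(question: str, context: str, temperature: float) -> str:
    """Stand-in for an LLM call. It ignores temperature and simply
    returns the first word of the context, or 'unknown' if it is empty."""
    words = context.split()
    return words[0] if words else "unknown"


def run_grid(
    examples: List[Example],
    generate: Generate,
    perturbations: Dict[str, Perturb],
    temperatures: List[float],
    n_runs: int = 3,
    seed: int = 0,
) -> Dict[Tuple[str, float], float]:
    """Mean exact-match accuracy for every (perturbation, temperature) cell."""
    rng = random.Random(seed)
    scores: Dict[Tuple[str, float], float] = {}
    for name, perturb in perturbations.items():
        for temp in temperatures:
            hits = []
            for question, context, gold in examples:
                noisy = perturb(context, rng)
                # Repeat generation, since sampling at temperature > 0 is stochastic.
                for _ in range(n_runs):
                    answer = generate(question, noisy, temp)
                    hits.append(float(answer.strip().lower() == gold.strip().lower()))
            scores[(name, temp)] = mean(hits)
    return scores


def degradation(
    scores: Dict[Tuple[str, float], float], baseline: str = "clean"
) -> Dict[Tuple[str, float], float]:
    """Accuracy lost relative to the unperturbed baseline at the same temperature."""
    return {
        (name, temp): scores[(baseline, temp)] - score
        for (name, temp), score in scores.items()
        if name != baseline
    }
```

To use the sketch, pass a dictionary that includes an identity entry such as `{"clean": lambda text, rng: text, "drop": drop_words, "shuffle": shuffle_words}` and replace `toy_generate` with a function that calls the model under test. Comparing `degradation` values across temperatures for a fixed perturbation is one simple way to look for the interaction effect the abstract reports.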

