Rhea: Role-aware Heuristic Episodic Attention for Conversational LLMs
This work addresses the challenge of maintaining contextual integrity in multi-turn LLM conversations, proposing a new method for a known bottleneck rather than an incremental improvement.
The paper tackles cumulative contextual decay in multi-turn conversations with LLMs, where performance deteriorates because of attention pollution, dilution, and drift, and proposes the Rhea framework to counter it. Results show that Rhea improves overall accuracy by 1.04 points on a 10-point scale (a 16% relative gain) and maintains high instruction fidelity (IAR > 8.1) across long-horizon interactions.
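The two accuracy figures can be checked against each other. The summary does not report absolute scores, so the back-calculation below assumes the 16% is measured relative to the baseline's score on the same 10-point scale. Because 16% is itself rounded, the implied baseline is approximate: dividing 1.04 by 0.165 and by 0.155 gives a range of roughly 6.3 to 6.7.

```latex
\text{relative gain} = \frac{\Delta}{s_{\text{base}}}
\quad\Rightarrow\quad
s_{\text{base}} \approx \frac{1.04}{0.16} = 6.5,
\qquad
s_{\text{Rhea}} \approx 6.5 + 1.04 = 7.54
```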
Large Language Models (LLMs) have achieved remarkable performance on single-turn tasks, yet their effectiveness deteriorates in multi-turn conversations. We define this phenomenon as cumulative contextual decay: a progressive degradation of contextual integrity caused by attention pollution, dilution, and drift. To address this challenge, we propose Rhea (Role-aware Heuristic Episodic Attention), a novel framework that decouples conversation history into two functionally independent memory modules: (1) an Instructional Memory (IM) that persistently stores high-fidelity global constraints via a structural priority mechanism, and (2) an Episodic Memory (EM) that dynamically manages user-model interactions via asymmetric noise control and heuristic context retrieval. During inference, Rhea constructs a high signal-to-noise context through priority attention, selectively integrating relevant episodic information while always prioritizing global instructions. Experiments on multiple multi-turn conversation benchmarks, including MT-Eval and Long-MT-Bench+, show that Rhea mitigates performance decay and improves overall accuracy by 1.04 points on a 10-point scale (a 16% relative gain over strong baselines). Moreover, Rhea maintains near-perfect instruction fidelity (IAR > 8.1) across long-horizon interactions. These results demonstrate that Rhea provides a principled and effective framework for building more precise, instruction-consistent conversational LLMs.
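The two-memory design can be illustrated with a small sketch of context construction. This is not the authors' implementation. The class `RheaContextSketch`, the word-overlap relevance score, the top-k cutoff, the section markers, and the reading of "asymmetric noise control" as "clip assistant turns, keep user turns verbatim" are all hypothetical stand-ins chosen for illustration. The sketch shows only the structural idea: instructions are stored separately and always placed first, while episodic turns are retrieved selectively.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


def _tokens(text: str) -> set[str]:
    """Lower-cased alphanumeric word set; a crude stand-in for a relevance model."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class Turn:
    role: str  # "user" or "assistant"
    text: str


@dataclass
class RheaContextSketch:
    """Hypothetical two-memory context builder loosely following the abstract."""

    # Instructional Memory (IM): global constraints, never evicted.
    instructions: list[str] = field(default_factory=list)
    # Episodic Memory (EM): user-model turns, retrieved selectively.
    episodes: list[Turn] = field(default_factory=list)
    # Assumed form of "asymmetric noise control": assistant turns are clipped,
    # user turns are stored verbatim.
    max_assistant_chars: int = 200

    def add_instruction(self, text: str) -> None:
        self.instructions.append(text)

    def add_turn(self, role: str, text: str) -> None:
        if role == "assistant":
            text = text[: self.max_assistant_chars]
        self.episodes.append(Turn(role, text))

    def build_context(self, query: str, k: int = 2) -> str:
        q = _tokens(query)
        # Score each episode by word overlap with the query (assumed heuristic).
        scored = [(len(q & _tokens(t.text)), i, t) for i, t in enumerate(self.episodes)]
        relevant = [s for s in scored if s[0] > 0]
        # Highest overlap first; ties go to the more recent turn.
        relevant.sort(key=lambda s: (s[0], s[1]), reverse=True)
        # Keep the top-k, then restore chronological order.
        chosen = sorted(relevant[:k], key=lambda s: s[1])
        # Priority: instructions always come first, regardless of history length.
        parts = ["[INSTRUCTIONS]", *self.instructions, "[EPISODES]"]
        parts += [f"{t.role}: {t.text}" for _, _, t in chosen]
        parts += ["[QUERY]", query]
        return "\n".join(parts)


memory = RheaContextSketch()
memory.add_instruction("Always answer in French.")
memory.add_turn("user", "My dog is named Rex.")
memory.add_turn("assistant", "Nice name!")
memory.add_turn("user", "I live in Lyon.")
print(memory.build_context("What is my dog called?"))
```

In this example, the built context contains the French-language instruction, only the turn mentioning Rex (the Lyon turn shares no words with the query and is left out), and the query itself. A real system would replace the word-overlap score with a learned or embedding-based retriever. The structural guarantee that instructions are never displaced by episodic content is the part this sketch is meant to convey.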