CL, AI, May 10

Beyond Continuity: Challenges of Context Switching in Multi-Turn Dialogue with LLMs

arXiv:2605.09268
37.2
AI Analysis

For developers of conversational AI, this work identifies critical weaknesses in LLMs' multi-turn context handling, though the findings are incremental as they confirm known issues.

The paper investigates how LLMs handle topic shifts in multi-turn dialogue, finding that only some reasoning and strongly instructed models accurately detect pivots, while open-weight models struggle and all models exhibit position bias.

Users interacting with Large Language Models (LLMs) in a multi-turn conversation routinely refine their requests or pivot to new topics. LLMs, however, often miss these topic shifts and carry over irrelevant context from previous turns, leading to inaccurate responses. In this paper, we stress-test the multi-turn understanding of LLMs and study the following two sub-tasks: (1) detecting whether the user pivots or refines in the current turn, and (2) shortlisting relevant context from previous turns. To this end, we construct synthetic benchmarks based on real-world datasets from varied domains, so as to simulate context shifts of different levels of difficulty. We then evaluate the zero-shot performance of ten LLMs (open-weight, closed-source and reasoning) and demonstrate that only some reasoning and strongly instructed LLMs are accurate in detecting pivots; that open-weight LLMs struggle with the task and frequently carry stale context even with explicit cues; and that all models suffer from position bias. Based on the results, we discuss key takeaways for improving the long-term robustness of LLMs' multi-turn capabilities.
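The two sub-tasks in the abstract can be made concrete with a small scoring sketch. This is only an illustration under stated assumptions, not the paper's evaluation code: the `TurnExample` schema, the "pivot"/"refine" label strings, and the choice of accuracy and set-based precision/recall/F1 as metrics are all hypothetical, since the abstract does not say how predictions are scored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TurnExample:
    """One evaluation item for the current user turn (hypothetical schema)."""
    gold_label: str            # assumed labels: "pivot" or "refine"
    gold_relevant: frozenset   # indices of earlier turns that are still relevant
    pred_label: str            # model output for sub-task 1 (pivot detection)
    pred_relevant: frozenset   # model output for sub-task 2 (context shortlist)


def pivot_accuracy(examples):
    """Fraction of items whose predicted pivot/refine label matches the gold label."""
    return sum(e.pred_label == e.gold_label for e in examples) / len(examples)


def shortlist_prf(example):
    """Set-based precision, recall, and F1 over shortlisted turn indices."""
    gold, pred = example.gold_relevant, example.pred_relevant
    if not gold and not pred:
        # Nothing was relevant and nothing was kept: treat as a perfect shortlist.
        return 1.0, 1.0, 1.0
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# A refinement where the model keeps one right and one wrong earlier turn.
refine_item = TurnExample("refine", frozenset({0, 2}), "refine", frozenset({2, 3}))
# A pivot the model misses, carrying over a stale turn.
missed_pivot = TurnExample("pivot", frozenset(), "refine", frozenset({1}))

print(pivot_accuracy([refine_item, missed_pivot]))
print(shortlist_prf(refine_item))
print(shortlist_prf(missed_pivot))
```

In this sketch, a missed pivot shows up twice: as a wrong label in sub-task 1 and as zero precision in sub-task 2, because every turn the model carried over is stale.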

