Contextual Drag: How Errors in the Context Affect LLM Reasoning

Yun Cheng, Xingyu Zhu, Haoyu Zhao, Sanjeev Arora

arXiv:2602.04288

Originality: Incremental advance

AI Analysis

The paper identifies a persistent failure mode in LLM reasoning that affects self-improvement pipelines. It is rated incremental because it builds on existing assumptions about error reflection.

The paper studies contextual drag, in which failed attempts in the context bias large language models toward structurally similar errors. Across 11 models and 8 reasoning tasks, this causes 10-20% performance drops and can lead to self-deterioration during iterative self-refinement.

Central to many self-improvement pipelines for large language models (LLMs) is the assumption that models can improve by reflecting on past mistakes. We study a phenomenon termed contextual drag: the presence of failed attempts in the context biases subsequent generations toward structurally similar errors. Across evaluations of 11 proprietary and open-weight models on 8 reasoning tasks, contextual drag induces 10-20% performance drops, and iterative self-refinement in models with severe contextual drag can collapse into self-deterioration. Structural analysis using tree edit distance reveals that subsequent reasoning trajectories inherit structurally similar error patterns from the context. We demonstrate that neither external feedback nor successful self-verification suffices to eliminate this effect. While mitigation strategies such as fallback-behavior fine-tuning and context denoising yield partial improvements, they fail to fully restore baseline performance, positioning contextual drag as a persistent failure mode in current reasoning architectures.
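The comparison behind the reported performance drop can be sketched in a few lines. This is a minimal illustration and not the authors' evaluation code. The prompt template, the `solve` callable (any function mapping a prompt string to an answer string), the problem dictionary keys, and the choice to report the drop as an absolute accuracy difference are all assumptions made for the example.

```python
from typing import Callable, Dict, Sequence


def build_prompt(question: str, failed_attempts: Sequence[str] = ()) -> str:
    """Prepend earlier failed attempts (if any) to the question.

    The template wording is hypothetical; the abstract does not specify one.
    """
    parts = [f"Previous incorrect attempt:\n{a}" for a in failed_attempts]
    parts.append(f"Question:\n{question}")
    return "\n\n".join(parts)


def accuracy(
    solve: Callable[[str], str],
    problems: Sequence[Dict],
    with_failures: bool,
) -> float:
    """Fraction of problems answered correctly.

    Each problem is a dict with the assumed keys "question", "answer",
    and "failed_attempts".
    """
    correct = 0
    for p in problems:
        attempts = p["failed_attempts"] if with_failures else ()
        answer = solve(build_prompt(p["question"], attempts))
        correct += answer == p["answer"]
    return correct / len(problems)


def contextual_drag(solve: Callable[[str], str], problems: Sequence[Dict]) -> float:
    """Accuracy with a clean context minus accuracy with failed attempts in context.

    Assumption: the drop is measured as an absolute difference in accuracy.
    """
    return accuracy(solve, problems, False) - accuracy(solve, problems, True)
```

A positive return value from `contextual_drag` means the model did worse when its context contained failed attempts. In practice `solve` would wrap a model call plus answer extraction, and exact string matching would likely be replaced by a task-specific grader.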
