CL · AI · NC · Jun 9, 2025

Unable to Forget: Proactive Interference Reveals Working Memory Limits in LLMs Beyond Context Length

arXiv:2506.08184v3 · 12 citations · h-index: 1
Originality: Incremental advance
AI Analysis

The paper reveals a fundamental constraint on LLMs' ability to handle interference, pointing to a working memory bottleneck that goes beyond context access. This matters for researchers and developers aiming to improve LLM retrieval, although the work is an incremental advance that builds on existing cognitive science insights.

The study tackles intra-context interference in large language models (LLMs) by adapting the proactive interference paradigm from cognitive science. It finds that retrieval accuracy declines log-linearly toward zero as interference accumulates, and that the errors come from retrieving previously overwritten values.

Information retrieval in Large Language Models (LLMs) is increasingly recognized as intertwined with generation capabilities rather than mere lookup. While longer contexts are often assumed to improve retrieval, the effects of intra-context interference remain understudied. To address this, we adapt the proactive interference (PI) paradigm from cognitive science, where earlier information disrupts recall of newer updates. In humans, susceptibility to such interference is inversely linked to working memory capacity. We introduce PI-LLM, an evaluation that sequentially streams semantically related key-value updates and queries only the final values. Although these final values are clearly positioned just before the query, LLM retrieval accuracy declines log-linearly toward zero as interference accumulates; errors arise from retrieving previously overwritten values. Attempts to mitigate interference via prompt engineering (e.g., instructing models to ignore earlier input) yield limited success. These findings reveal a fundamental constraint on LLMs' ability to disentangle interference and flexibly manipulate information, suggesting a working memory bottleneck beyond mere context access. This calls for approaches that strengthen models' ability to suppress irrelevant content during retrieval.
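The PI-LLM setup described in the abstract (stream semantically related key-value updates, then query only the final values) can be sketched as follows. This is a minimal illustration, not the paper's released code: it assumes a simple "key: value" line format, round-robin updates with shuffled key order, and made-up key and value vocabularies. The function names `build_pi_stream`, `render_prompt`, and `classify_answer` are hypothetical. The classifier mirrors the paper's observation that errors are mostly previously overwritten values.

```python
import random

# Hypothetical vocabularies for illustration only; the paper's actual
# keys and values are not specified in this summary.
KEYS = ["visual art", "sports", "dining", "music"]
VALUES = ["red", "blue", "green", "amber", "violet", "teal", "coral", "ivory"]


def build_pi_stream(keys, values, n_updates, seed=0):
    """Stream n_updates values per key, shuffling key order within each round.

    Returns (updates, final): updates is the ordered list of (key, value)
    pairs, and final maps each key to the last value assigned to it.
    A larger n_updates means more proactive interference per key.
    """
    rng = random.Random(seed)
    updates = []
    for _ in range(n_updates):
        order = list(keys)
        rng.shuffle(order)
        for key in order:
            updates.append((key, rng.choice(values)))
    final = {}
    for key, value in updates:
        final[key] = value  # later updates overwrite earlier ones
    return updates, final


def render_prompt(updates, keys):
    """Render the stream as 'key: value' lines followed by a final-value query."""
    stream = "\n".join(f"{key}: {value}" for key, value in updates)
    query = "What is the current (last) value of: " + ", ".join(keys) + "?"
    return stream + "\n\n" + query


def classify_answer(key, answer, updates, final):
    """Label an answer as correct, an overwritten (earlier) value, or other."""
    if answer == final[key]:
        return "correct"
    history = [value for k, value in updates if k == key]
    if answer in history[:-1]:
        return "overwritten"
    return "other"


if __name__ == "__main__":
    updates, final = build_pi_stream(KEYS, VALUES, n_updates=5, seed=42)
    print(render_prompt(updates, KEYS))
    print(final)
```

Sweeping `n_updates` upward and plotting accuracy against its logarithm would be one way to look for the log-linear decline the paper reports. The model call itself is deliberately left out, since no particular LLM API is assumed here.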
