MA · Apr 9

MemCoT: Test-Time Scaling through Memory-Driven Chain-of-Thought

Haodong Lei, Junming Liu, Yirong Chen, Ding Wang, Hongsong Wang

arXiv:2604.08216
92.9

Predicted impact: top 5% in MA · last 90 days
Originality: Highly original

AI Analysis

This work addresses a critical bottleneck for LLM applications that require reliable long-context causal reasoning, and it represents a novel method rather than an incremental improvement.

The paper tackles the problem of hallucinations and catastrophic forgetting in LLMs during long-context reasoning by proposing MemCoT, a test-time memory scaling framework that transforms reasoning into an iterative search, achieving state-of-the-art performance on benchmarks like LoCoMo and LongMemEval-S.

Large Language Models (LLMs) still suffer from severe hallucinations and catastrophic forgetting during causal reasoning over massive, fragmented long contexts. Existing memory mechanisms typically treat retrieval as a static, single-step, passive matching process, leading to severe semantic dilution and contextual fragmentation. To overcome these fundamental bottlenecks, we propose MemCoT, a test-time memory scaling framework that transforms long-context reasoning into an iterative, stateful information search. MemCoT introduces a multi-view long-term memory perception module that enables Zoom-In evidence localization and Zoom-Out contextual expansion, allowing the model to first identify where relevant evidence resides and then reconstruct the surrounding causal structure necessary for reasoning. In addition, MemCoT employs a task-conditioned dual short-term memory system composed of semantic state memory and episodic trajectory memory. This short-term memory records historical search decisions and dynamically guides query decomposition and pruning across iterations. Empirical evaluations show that several open- and closed-source models equipped with MemCoT achieve state-of-the-art (SOTA) performance on the LoCoMo and LongMemEval-S benchmarks.
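The loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: every name here (`ShortTermMemory`, `zoom_in`, `zoom_out`, `iterative_search`) is hypothetical, plain word overlap stands in for whatever retrieval the paper actually uses, and "pruning" is reduced to dropping query terms that the collected evidence already covers.

```python
from dataclasses import dataclass, field


@dataclass
class ShortTermMemory:
    # Hypothetical stand-ins for the two short-term memories named in the abstract.
    semantic_state: set = field(default_factory=set)  # words covered by evidence so far
    trajectory: list = field(default_factory=list)    # (pending terms, hit index) per step


def zoom_in(chunks, query_terms, visited):
    """Locate the unvisited chunk sharing the most words with the query (None if no overlap)."""
    best, best_score = None, 0
    for i, chunk in enumerate(chunks):
        if i in visited:
            continue
        score = len(query_terms & set(chunk.lower().split()))
        if score > best_score:
            best, best_score = i, score
    return best


def zoom_out(chunks, index, radius=1):
    """Expand a hit to its neighbouring chunks to recover surrounding context."""
    lo = max(0, index - radius)
    hi = min(len(chunks), index + radius + 1)
    return list(range(lo, hi))


def iterative_search(chunks, question, max_steps=3, radius=1):
    """Repeat zoom-in / zoom-out, letting the recorded state prune the remaining query."""
    memory = ShortTermMemory()
    pending = set(question.lower().split())
    evidence = []
    for _ in range(max_steps):
        if not pending:
            break
        hit = zoom_in(chunks, pending, visited=set(evidence))
        if hit is None:
            break
        memory.trajectory.append((sorted(pending), hit))
        for i in zoom_out(chunks, hit, radius):
            if i not in evidence:
                evidence.append(i)
            memory.semantic_state |= set(chunks[i].lower().split())
        # Assumption: prune query terms that the gathered evidence already covers.
        pending -= memory.semantic_state
    return sorted(evidence), memory
```

With the four chunks "alice moved to paris", "she adopted a cat there", "bob likes tea", and "the cat is named miso", calling `iterative_search(chunks, "paris miso", radius=0)` takes two steps: the first locates chunk 0 and prunes "paris", and the second locates chunk 3 for the remaining term "miso". The trajectory list records both decisions, which is the role the abstract assigns to episodic trajectory memory.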
