CL · AI · May 25

Language Models Need Sleep

arXiv:2605.2609998.1
Predicted impact: top 3% in CL · last 90 days
Originality: Highly original
AI Analysis

This work addresses the context-length scaling problem that large language models face on long-horizon reasoning tasks. It offers a biologically inspired solution that improves performance on tasks requiring deeper reasoning.

The paper proposes a sleep-like consolidation mechanism for transformer-based models that periodically converts recent context into persistent fast weights, improving performance on long-horizon tasks. On synthetic tasks and math reasoning, models with sleep outperform regular transformers and SSM-attention hybrids, with performance gains scaling with sleep duration.

Transformer-based large language models are increasingly used for long-horizon tasks; however, their attention mechanism scales poorly with context length. To handle this, we study a sleep-like consolidation mechanism in which a model periodically converts recent context into persistent fast weights before clearing its key-value cache. During sleep, the model performs $N$ offline recurrent passes over the accumulated context and updates the fast weights in its state-space model (SSM) blocks through a learned local rule. During inference, this shifts extra computation to sleep while preserving the latency of wake-time prediction. We test our method on controlled synthetic tasks, including cellular automata and multi-hop graph retrieval, as well as a realistic math reasoning task, on which a regular transformer as well as SSM-attention hybrid models fail. We then show that increasing sleep duration $N$ for our models improves performance, with the largest gains on examples that require deeper reasoning.
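The sleep cycle described above can be illustrated with a toy sketch. This is not the paper's method: the abstract says the fast-weight update is a learned local rule inside SSM blocks, whereas the sketch below swaps in a fixed delta rule on a single weight matrix, and every name in it (`sleep`, `recall_error`, `demo`, `lr`) is hypothetical. It only shows the overall shape of the idea: make N offline passes over the cached context, write it into persistent weights, clear the cache, and then answer from the weights alone.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sleep(W, kv_cache, n_passes, lr=0.5):
    """Consolidate cached (key, value) pairs into fast weights W, then clear the cache.

    Assumption: a fixed delta rule stands in for the paper's learned local rule.
    """
    for _ in range(n_passes):
        for key, value in kv_cache:
            pred = matvec(W, key)
            err = [v - p for v, p in zip(value, pred)]
            # Local outer-product update: W += lr * err * key^T
            for i, e in enumerate(err):
                for j, kj in enumerate(key):
                    W[i][j] += lr * e * kj
    kv_cache.clear()  # wake-time prediction no longer needs the cached context
    return W


def recall_error(W, pairs):
    """Sum of squared errors when recalling each value from its key via W alone."""
    total = 0.0
    for key, value in pairs:
        pred = matvec(W, key)
        total += sum((v - p) ** 2 for v, p in zip(value, pred))
    return total


def demo(n_passes):
    """Run one sleep cycle of n_passes; return (recall error, remaining cache size)."""
    # Two non-orthogonal unit-norm keys, so a single pass leaves interference.
    pairs = [([1.0, 0.0], [1.0, 0.0]), ([0.6, 0.8], [0.0, 1.0])]
    kv_cache = list(pairs)
    W = [[0.0, 0.0], [0.0, 0.0]]
    sleep(W, kv_cache, n_passes)
    return recall_error(W, pairs), len(kv_cache)


if __name__ == "__main__":
    for n in (1, 5, 20):
        err, cache_len = demo(n)
        print(f"N={n:2d}  recall error={err:.6f}  cache entries left={cache_len}")
```

In this toy setting, more sleep passes reduce recall error because the keys interfere with one another and each pass corrects part of the remaining residual. The cache is empty after every cycle, so wake-time cost does not grow with the amount of consolidated context. Whether the same trend holds for the paper's learned rule is a claim of the paper, not something this sketch demonstrates.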
