LG, Feb 11

Towards Compressive and Scalable Recurrent Memory

arXiv:2602.11212v1
Originality: Highly original
AI Analysis

The paper addresses the scalability problem for long-context AI models and offers a practical solution with significant performance gains, though it is an incremental improvement over existing memory approaches.

The paper tackles the quadratic attention bottleneck in Transformers for long contexts by introducing Elastic Memory, a recurrent memory architecture based on the HiPPO framework. It outperformed baselines such as Memorizing Transformer, with a 16x memory-efficiency advantage, and it outperformed Melodi even when Elastic Memory had fewer parameters.

Transformers face a quadratic bottleneck in attention when scaling to long contexts. Recent approaches introduce recurrent memory to extend context beyond the current window, yet these often face a fundamental trade-off between theoretical principles and practical scalability. To address this, we introduce Elastic Memory, a novel memory architecture grounded in the HiPPO framework for online function approximation. Elastic Memory treats the historical sequence as samples from continuous signals, applying optimal online compression to encode them into a fixed-size memory state. For retrieval, we propose a flexible polynomial sampling mechanism that reconstructs a history summary from this compressed state. Elastic Memory consistently outperformed baselines on long-context (32k+) datasets across three domains. With equal parameters, it beat Memorizing Transformer by 16x memory and outperformed Melodi at all memory sizes, even when Melodi had 30% more parameters. When scaling model size, Elastic Memory stayed ahead of all baselines and was significantly faster than Melodi at 4x size. Furthermore, its decoupled design allows for injecting inductive biases at test-time to boost performance.
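The compress-then-sample idea in the abstract can be illustrated with the textbook HiPPO-LegS recurrence. The sketch below is not the paper's implementation: it assumes a single scalar channel, a backward-Euler discretization, and retrieval on a uniform grid of sample points, and the names `legs_matrices`, `compress`, and `sample_history` are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre


def legs_matrices(n_coeffs):
    """Return the HiPPO-LegS matrices A (lower triangular) and B."""
    scale = np.sqrt(2 * np.arange(n_coeffs) + 1)
    # A[n, k] = sqrt(2n+1) * sqrt(2k+1) for n > k, and n + 1 on the diagonal.
    A = np.tril(np.outer(scale, scale), k=-1) + np.diag(np.arange(n_coeffs) + 1.0)
    return A, scale.copy()


def compress(signal, n_coeffs):
    """Fold a 1-D signal into a fixed-size state, one sample at a time."""
    A, B = legs_matrices(n_coeffs)
    eye = np.eye(n_coeffs)
    state = np.zeros(n_coeffs)
    for k, sample in enumerate(signal, start=1):
        # Backward-Euler step of dc/dt = -(1/t) A c + (1/t) B f at t = k
        # (an assumed discretization, chosen here for stability).
        state = np.linalg.solve(eye + A / k, state + B * sample / k)
    return state


def sample_history(state, num_points):
    """Evaluate the stored polynomial at evenly spaced points of the history."""
    n_coeffs = len(state)
    u = np.linspace(-1.0, 1.0, num_points)  # -1 is the oldest input, +1 the newest
    scale = np.sqrt(2 * np.arange(n_coeffs) + 1)
    basis = legendre.legvander(u, n_coeffs - 1) * scale  # shape (num_points, n_coeffs)
    return basis @ state


if __name__ == "__main__":
    ramp = np.arange(1, 501) / 500.0        # 500 samples of a linear ramp
    state = compress(ramp, n_coeffs=8)      # the state keeps 8 numbers regardless of length
    print(sample_history(state, num_points=5))
```

The state size depends only on `n_coeffs`, not on the sequence length, which is the property that makes this kind of memory fixed-size. The number and placement of sample points at retrieval time are independent of how the state was built, so a different grid (for example, one denser near recent inputs) could be substituted without changing `compress`.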

