LG · Feb 11

Towards Compressive and Scalable Recurrent Memory

Yunchong Song, Jushi Kai, Liming Lu, Kaixi Qiu, Zhouhan Lin

arXiv:2602.11212v1 · 1.4 · h-index: 3

Originality: Highly original

AI Analysis

This paper addresses the scalability problem for long-context models, offering a practical solution with significant performance gains, though it is an incremental improvement on existing recurrent-memory approaches.

The paper tackles the quadratic attention bottleneck that Transformers face on long contexts by introducing Elastic Memory, a recurrent memory architecture based on the HiPPO framework. It reports outperforming Memorizing Transformer with a 16x memory-efficiency advantage, and outperforming Melodi even when Melodi has more parameters.

Transformers face a quadratic bottleneck in attention when scaling to long contexts. Recent approaches introduce recurrent memory to extend context beyond the current window, yet these often face a fundamental trade-off between theoretical principles and practical scalability. To address this, we introduce Elastic Memory, a novel memory architecture grounded in the HiPPO framework for online function approximation. Elastic Memory treats the historical sequence as samples from continuous signals, applying optimal online compression to encode them into a fixed-size memory state. For retrieval, we propose a flexible polynomial sampling mechanism that reconstructs a history summary from this compressed state. Elastic Memory consistently outperformed baselines on long-context (32k+) datasets across three domains. With equal parameters, it beat Memorizing Transformer by 16x memory and outperformed Melodi at all memory sizes, even when Melodi had 30% more parameters. When scaling model size, Elastic Memory stayed ahead of all baselines and was significantly faster than Melodi at 4x size. Furthermore, its decoupled design allows for injecting inductive biases at test time to boost performance.
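As a rough illustration of the idea in the abstract, the sketch below compresses a scalar sequence online into a fixed-size coefficient vector using the standard HiPPO-LegS recurrence (bilinear discretization), then reads back a history summary by evaluating the Legendre expansion at chosen positions. It is not the paper's implementation: the function names `legs_matrices`, `compress`, and `sample_history` are hypothetical, the one-dimensional input is a simplifying assumption (a real model would presumably apply this per feature channel), and the read-out is only a plain stand-in for the polynomial sampling mechanism the abstract mentions.

```python
import numpy as np
from numpy.polynomial.legendre import legvander


def legs_matrices(n):
    """HiPPO-LegS matrices for the ODE dc/dt = (A c + B f) / t."""
    q = np.sqrt(2 * np.arange(n) + 1.0)
    # Strictly lower triangle: -sqrt(2n+1) * sqrt(2k+1); diagonal: -(n+1).
    A = -np.tril(np.outer(q, q), k=-1) - np.diag(np.arange(n) + 1.0)
    return A, q


def compress(signal, n=16):
    """Fold a 1-D sequence into n coefficients, one step per sample.

    The state size stays n no matter how long the sequence is.
    """
    A, B = legs_matrices(n)
    eye = np.eye(n)
    c = np.zeros(n)
    for t, f in enumerate(signal, start=1):
        # Bilinear (trapezoidal) step with the 1/t time scaling of LegS.
        lhs = eye - A / (2 * t)
        rhs = (eye + A / (2 * t)) @ c + B * f / t
        c = np.linalg.solve(lhs, rhs)
    return c


def sample_history(c, positions):
    """Evaluate the reconstructed history at relative positions in [0, 1].

    0 is the oldest point in the history, 1 the most recent.
    """
    n = len(c)
    q = np.sqrt(2 * np.arange(n) + 1.0)
    # Scaled Legendre basis on [-1, 1], shape (len(positions), n).
    basis = legvander(2 * np.asarray(positions, dtype=float) - 1, n - 1) * q
    return basis @ c


if __name__ == "__main__":
    history = np.sin(np.linspace(0, 3, 2000))   # 2000 samples of a toy signal
    state = compress(history, n=16)             # 16 numbers, regardless of length
    summary = sample_history(state, np.linspace(0, 1, 8))
    print(state.shape, summary.shape)
```

Because the sampling positions are chosen at read time rather than baked into the state, a caller could, for example, sample more densely near position 1 to favor recent history. That is one way to picture the decoupling between compression and retrieval that the abstract describes, though the paper's actual mechanism may differ.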
