LG · AI · CL · Oct 17, 2024

An Evolved Universal Transformer Memory

arXiv:2410.13166v45 citations · h-index: 9 · ICLR
Originality: Highly original
AI Analysis

This work addresses the escalating costs of foundation models, a concern for AI researchers and practitioners, by enabling more efficient and better-performing transformers across modalities, though it builds on prior memory-management ideas.

The paper tackles the trade-off between performance and efficiency in transformers by introducing Neural Attention Memory Models (NAMMs): a learned memory-management network, evolved atop pre-trained models, that focuses each layer and attention head on the most relevant information. It reports substantial performance improvements across long-context benchmarks while reducing input contexts to a fraction of their original sizes.

Prior methods propose to offset the escalating costs of modern foundation models by dropping specific parts of their contexts with hand-designed rules, while attempting to preserve their original performance. We overcome this trade-off with Neural Attention Memory Models (NAMMs), introducing a learned network for memory management that improves both the performance and efficiency of transformers. We evolve NAMMs atop pre-trained transformers to provide different latent contexts focusing on the most relevant information for individual layers and attention heads. NAMMs are universally applicable to any model using self-attention as they condition exclusively on the values in the produced attention matrices. Learning NAMMs on a small set of problems, we achieve substantial performance improvements across multiple long-context benchmarks while cutting the model's input contexts up to a fraction of the original sizes. We show the generality of our conditioning enables zero-shot transfer of NAMMs trained only on language to entirely new transformer architectures even across input modalities, with their benefits carrying over to vision and reinforcement learning.
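The idea of managing memory from attention values alone can be illustrated with a minimal sketch. It is not the authors' implementation: the feature set, the linear scorer, the threshold, and every function name below are hypothetical stand-ins. In NAMMs the scorer is a learned network whose parameters are evolved; here it is a fixed weight vector chosen by hand, purely to show the data flow from an attention matrix to a pruned key/value cache for one attention head.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]  # attention[query][key] for one head


def token_features(attn: Matrix, key_idx: int) -> List[float]:
    """Summarize the attention one cached token received (hypothetical features)."""
    column = [row[key_idx] for row in attn]
    return [sum(column) / len(column), max(column), column[-1]]


def score_tokens(attn: Matrix, weights: Sequence[float]) -> List[float]:
    """Linear score per cached token; stands in for the learned network."""
    n_keys = len(attn[0])
    return [
        sum(w * f for w, f in zip(weights, token_features(attn, k)))
        for k in range(n_keys)
    ]


def select_kept_tokens(
    attn: Matrix, weights: Sequence[float], threshold: float
) -> List[int]:
    """Return indices of cached tokens whose score reaches the threshold."""
    scores = score_tokens(attn, weights)
    return [k for k, s in enumerate(scores) if s >= threshold]


def prune_cache(
    keys: Sequence, values: Sequence, kept: Sequence[int]
) -> Tuple[list, list]:
    """Drop evicted tokens from one head's key/value cache."""
    return [keys[i] for i in kept], [values[i] for i in kept]


# Toy example: 3 queries attending over 4 cached tokens.
attn = [
    [0.70, 0.10, 0.10, 0.10],
    [0.60, 0.05, 0.05, 0.30],
    [0.50, 0.05, 0.05, 0.40],
]
weights = [1.0, 0.0, 0.0]  # hand-picked: score = mean attention received
kept = select_kept_tokens(attn, weights, threshold=0.2)
keys, values = prune_cache(["k0", "k1", "k2", "k3"], ["v0", "v1", "v2", "v3"], kept)
print(kept, keys, values)
```

Because the scorer sees only attention values, nothing in it depends on the base model's vocabulary, hidden size, or input modality, which is the property the abstract credits for transfer across architectures. Running the selection separately per layer and per head would give each one its own retained context.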

Code Implementations: 1 repo