CL · AI · Dec 12, 2024

Memory Layers at Scale

Vincent-Pierre Berges, Barlas Oğuz, Daniel Haziza, Wen-tau Yih, Luke Zettlemoyer, Gargi Ghosh

Meta AI

arXiv:2412.09764v2 · 14.423 citations · h-index: 83 · Has Code

Originality: Incremental advance

AI Analysis

This work provides a scalable solution for enhancing language model efficiency and performance, especially for factual tasks, though it builds incrementally on existing memory layer concepts.

The paper tackles the problem of scaling memory layers in language models. Models augmented with the improved memory layer outperform dense models with more than twice the computation budget, and also outperform mixture-of-experts models matched for both compute and parameters, with gains particularly strong on factual tasks.

Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, sparsely activated memory layers complement compute-heavy dense feed-forward layers, providing dedicated capacity to store and retrieve information cheaply. This work takes memory layers beyond proof-of-concept, proving their utility at contemporary scale. On downstream tasks, language models augmented with our improved memory layer outperform dense models with more than twice the computation budget, as well as mixture-of-expert models when matched for both compute and parameters. We find gains are especially pronounced for factual tasks. We provide a fully parallelizable memory layer implementation, demonstrating scaling laws with up to 128B memory parameters, pretrained to 1 trillion tokens, comparing to base models with up to 8B parameters.
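The abstract describes memory layers only as a trainable key-value lookup that adds parameters without increasing FLOPs. The sketch below shows one way such a lookup can stay cheap. It assumes a product-key scheme (the approach of Lample et al., 2019, "Large Memory Layers with Product Keys"), which the abstract itself does not name. The query is split in half, and each half is scored against a small set of sub-keys. Only the top-k x top-k candidate pairs are combined, so roughly sqrt(N) key scores are computed for N value slots. The function names are hypothetical, training and gradients are omitted, and this is not the authors' parallel implementation.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def top_k(scores, k):
    """Indices of the k largest scores, highest first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def product_key_lookup(query, subkeys1, subkeys2, values, k):
    """Hypothetical sparse key-value lookup over len(subkeys1) * len(subkeys2) slots.

    Assumption: slot (i, j) has full key subkeys1[i] ++ subkeys2[j], so its
    score against the query is s1[i] + s2[j]. Only len(subkeys1) + len(subkeys2)
    sub-key scores are computed, not one score per slot.
    """
    half = len(query) // 2
    q1, q2 = query[:half], query[half:]
    s1 = [dot(q1, c) for c in subkeys1]
    s2 = [dot(q2, c) for c in subkeys2]
    top1, top2 = top_k(s1, k), top_k(s2, k)
    # Any slot in the exact global top-k must use an i from top1 and a j from
    # top2, so scoring these k * k pairs recovers the exact top-k slots.
    candidates = [(s1[i] + s2[j], i * len(subkeys2) + j) for i in top1 for j in top2]
    candidates.sort(reverse=True)
    best = candidates[:k]
    weights = softmax([score for score, _ in best])
    # Output is the softmax-weighted sum of the k selected value vectors.
    out = [0.0] * len(values[0])
    for w, (_, slot) in zip(weights, best):
        for t, v in enumerate(values[slot]):
            out[t] += w * v
    return out, [slot for _, slot in best]


if __name__ == "__main__":
    random.seed(0)
    n, d, k = 8, 6, 4  # 8 * 8 = 64 value slots, but only 16 sub-key scores
    sk1 = [[random.gauss(0, 1) for _ in range(d // 2)] for _ in range(n)]
    sk2 = [[random.gauss(0, 1) for _ in range(d // 2)] for _ in range(n)]
    vals = [[random.gauss(0, 1) for _ in range(5)] for _ in range(n * n)]
    q = [random.gauss(0, 1) for _ in range(d)]
    out, slots = product_key_lookup(q, sk1, sk2, vals, k)
    print(slots, out)
```

The design choice this illustrates: the value table (parameters) grows with the product of the two sub-key sets, while the per-token compute grows only with their sum plus k squared. This is why a memory layer can add capacity far more cheaply than a dense feed-forward layer of the same size.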

