LG · AI · CL · Apr 11, 2024

RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

DeepMind
arXiv:2404.07839v2 · 49 citations · h-index: 89
Originality: Incremental advance
AI Analysis

This work addresses the need for more memory-efficient and faster inference in language models, though it appears incremental as it builds on existing architectures like Gemma.

The authors tackled the problem of efficient language modeling by introducing RecurrentGemma, a family of open language models using the Griffin architecture, which combines linear recurrences with local attention to achieve comparable performance to similarly-sized Gemma models while being trained on fewer tokens.

We introduce RecurrentGemma, a family of open language models which uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language. It has a fixed-sized state, which reduces memory use and enables efficient inference on long sequences. We provide two sizes of models, containing 2B and 9B parameters, and provide pre-trained and instruction tuned variants for both. Our models achieve comparable performance to similarly-sized Gemma baselines despite being trained on fewer tokens.
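The memory claim in the abstract can be illustrated with a minimal sketch. This is not the Griffin implementation: the update rule below is a generic elementwise linear recurrence, and the names `linear_recurrence_step`, `run`, `decay`, and `window` are hypothetical, chosen only for illustration. It shows that a recurrent state and a local-attention window stay the same size however long the sequence is, while a global-attention cache grows with every token.

```python
from collections import deque


def linear_recurrence_step(state, x, decay):
    """One step of a generic elementwise linear recurrence (an assumption,
    not Griffin's actual layer): h_t = a * h_{t-1} + (1 - a) * x_t."""
    return [a * h + (1.0 - a) * xi for h, xi, a in zip(state, x, decay)]


def run(sequence, width, window):
    """Process a sequence and report how much memory each mechanism holds."""
    state = [0.0] * width               # fixed-size recurrent state
    decay = [0.9] * width               # hypothetical constant decay per channel
    local_cache = deque(maxlen=window)  # local attention keeps only recent tokens
    global_cache = []                   # global attention would keep every token

    for x in sequence:
        state = linear_recurrence_step(state, x, decay)
        local_cache.append(x)
        global_cache.append(x)

    return state, len(local_cache), len(global_cache)


sequence = [[1.0, 2.0]] * 100
state, local_len, global_len = run(sequence, width=2, window=8)
print(len(state), local_len, global_len)
```

In this toy run the recurrent state keeps `width` entries and the local cache is capped at `window` entries, while the global cache holds one entry per token processed.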

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
