LG AI CL · Apr 11, 2024

RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

Aleksandar Botev, Soham De, Samuel L Smith, Anushan Fernando, George-Cristian Muraru, Ruba Haroun, Leonard Berrada, Razvan Pascanu, Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot, Johan Ferret

DeepMind

arXiv:2404.07839v2 · 29.749 citations · h-index: 89 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for more memory-efficient and faster inference in language models, though it appears incremental because it builds on existing work, namely the Griffin architecture and the Gemma models.

The authors tackle efficient language modeling by introducing RecurrentGemma, a family of open language models built on the Griffin architecture, which combines linear recurrences with local attention. The models achieve performance comparable to similarly sized Gemma models despite being trained on fewer tokens.
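The linear recurrence that gives Griffin its fixed-size state can be sketched as a per-channel gated update. The following is a minimal, simplified sketch assuming a diagonal recurrence of the form h_t = a_t·h_{t-1} + sqrt(1 − a_t²)·(i_t·x_t), modeled loosely on Griffin's RG-LRU layer. The class `GatedLinearRecurrence`, its per-channel (diagonal) gate weights, and the constant `c = 8` are illustrative assumptions, not the authors' implementation; real layers use learned dense projections and pair the recurrence with local attention.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class GatedLinearRecurrence:
    """Hypothetical diagonal gated linear recurrence (simplified, RG-LRU-style sketch)."""

    def __init__(self, dim: int, c: float = 8.0, seed: int = 0):
        rng = random.Random(seed)
        self.dim = dim
        self.c = c
        # Per-channel base decay is sigmoid(lam), which always lies in (0, 1).
        self.lam = [rng.uniform(2.0, 5.0) for _ in range(dim)]
        # Hypothetical per-channel gate weights (real models use dense projections).
        self.w_r = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        self.w_i = [rng.uniform(-1.0, 1.0) for _ in range(dim)]

    def initial_state(self) -> list[float]:
        # The state has `dim` entries no matter how many tokens are processed.
        return [0.0] * self.dim

    def step(self, h: list[float], x: list[float]) -> list[float]:
        new_h = []
        for k in range(self.dim):
            r = sigmoid(self.w_r[k] * x[k])  # recurrence gate
            i = sigmoid(self.w_i[k] * x[k])  # input gate
            a = sigmoid(self.lam[k]) ** (self.c * r)  # gated decay, still in (0, 1)
            # Linear in h: decay the old state, add a normalized, gated input.
            new_h.append(a * h[k] + math.sqrt(1.0 - a * a) * (i * x[k]))
        return new_h


layer = GatedLinearRecurrence(dim=4)
h = layer.initial_state()
data_rng = random.Random(1)
for t in range(1000):
    x = [data_rng.gauss(0.0, 1.0) for _ in range(4)]
    h = layer.step(h, x)
print(len(h))  # 4: the state size does not grow with sequence length
```

Because each step only reads the previous state and the current token, inference cost per token stays constant; the trade-off is that all history must be compressed into those `dim` numbers, which is why Griffin adds local attention for precise short-range recall.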

We introduce RecurrentGemma, a family of open language models which uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language. It has a fixed-sized state, which reduces memory use and enables efficient inference on long sequences. We provide two sizes of models, containing 2B and 9B parameters, and provide pre-trained and instruction tuned variants for both. Our models achieve comparable performance to similarly-sized Gemma baselines despite being trained on fewer tokens.
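The memory claim in the abstract can be illustrated with back-of-the-envelope arithmetic. This sketch assumes a standard per-layer key/value cache and compares full global attention with a sliding-window (local attention) cache; the helper names, `D_MODEL`, and `WINDOW` are hypothetical values chosen for illustration, not RecurrentGemma's actual configuration. Since the local cache never exceeds the window and the recurrent state has a fixed size, per-layer inference memory stops growing once the sequence is longer than the window.

```python
from collections import deque


def global_kv_floats(seq_len: int, d_model: int) -> int:
    """Per-layer KV cache size (in floats) for full global attention."""
    return 2 * seq_len * d_model  # one key and one value vector per past token


def local_kv_floats(seq_len: int, d_model: int, window: int) -> int:
    """Per-layer KV cache size (in floats) for sliding-window local attention."""
    return 2 * min(seq_len, window) * d_model


class SlidingWindowKVCache:
    """Hypothetical cache that keeps only the most recent `window` key/value pairs."""

    def __init__(self, window: int):
        self.keys = deque(maxlen=window)
        self.values = deque(maxlen=window)

    def append(self, k, v) -> None:
        # Once full, deque(maxlen=...) evicts the oldest entry automatically.
        self.keys.append(k)
        self.values.append(v)

    def __len__(self) -> int:
        return len(self.keys)


D_MODEL, WINDOW = 2560, 2048  # illustrative values, not taken from the paper
for seq_len in (1024, 8192, 65536):
    print(seq_len, global_kv_floats(seq_len, D_MODEL), local_kv_floats(seq_len, D_MODEL, WINDOW))

cache = SlidingWindowKVCache(window=4)
for t in range(10):
    cache.append(k=t, v=t)
print(len(cache), list(cache.keys))  # 4 [6, 7, 8, 9]
```

With these assumed sizes, an 8192-token sequence needs 41,943,040 cached floats per layer under global attention but only 10,485,760 under a 2048-token window, and the local figure stays the same for any longer sequence.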
