RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
This work addresses the need for faster, more memory-efficient inference in language models, though it appears incremental, since it builds on existing architectures such as Gemma.
The authors address efficient language modeling by introducing RecurrentGemma, a family of open language models built on the Griffin architecture, which combines linear recurrences with local attention. The models achieve performance comparable to similarly sized Gemma models despite being trained on fewer tokens.
We introduce RecurrentGemma, a family of open language models which uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language. It has a fixed-sized state, which reduces memory use and enables efficient inference on long sequences. We provide two sizes of models, containing 2B and 9B parameters, and provide pre-trained and instruction tuned variants for both. Our models achieve comparable performance to similarly-sized Gemma baselines despite being trained on fewer tokens.
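The fixed-size state claim can be illustrated with a toy sketch. It is not the Griffin or RecurrentGemma implementation: the function names, the recurrence form `h_t = a * h_{t-1} + (1 - a) * x_t`, the decay value, and the one-vector-per-token cache are all assumptions chosen for illustration. The sketch only counts how many floats each mechanism would hold after reading a sequence. A real attention cache stores keys and values per layer and per head, so the absolute numbers here are not meaningful, only how they scale with sequence length.

```python
from collections import deque


def linear_recurrence_step(state, x, decay):
    """One elementwise step of an assumed recurrence: h_t = a * h_{t-1} + (1 - a) * x_t."""
    return [a * h + (1.0 - a) * xi for a, h, xi in zip(decay, state, x)]


def cache_sizes(seq_len, width, window):
    """Count the floats each mechanism holds after processing seq_len tokens."""
    state = [0.0] * width               # recurrent state: size never changes
    local_cache = deque(maxlen=window)  # local attention: keeps at most `window` tokens
    global_cache = []                   # global attention: one entry per token seen
    decay = [0.9] * width               # hypothetical decay, identical for every channel

    for t in range(seq_len):
        x = [float(t)] * width          # stand-in for a token embedding
        state = linear_recurrence_step(state, x, decay)
        local_cache.append(x)           # oldest entry is dropped once the window is full
        global_cache.append(x)

    return {
        "recurrent": len(state),
        "local_attention": len(local_cache) * width,
        "global_attention": len(global_cache) * width,
    }


if __name__ == "__main__":
    for n in (100, 1000):
        print(n, cache_sizes(seq_len=n, width=4, window=16))
```

In this sketch the recurrent state and the local attention window stay the same size when the sequence grows from 100 to 1000 tokens, while the global attention cache grows tenfold. That is the scaling behaviour the abstract refers to when it says a fixed-size state reduces memory use on long sequences.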