CL May 23, 2024

Aya 23: Open Weight Releases to Further Multilingual Progress

arXiv:2405.15032v2
135 citations
h-index: 56
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for high-performing multilingual language models in languages spoken by approximately half of the world's population. It is incremental in that it builds on prior Aya releases.

The paper introduces Aya 23, a family of multilingual language models serving 23 languages. For the languages it covers, Aya 23 outperforms the earlier Aya 101 as well as Gemma, Mistral, and Mixtral on a range of discriminative and generative tasks.

This technical report introduces Aya 23, a family of multilingual language models. Aya 23 builds on the recent release of the Aya model (Üstün et al., 2024), focusing on pairing a highly performant pre-trained model with the recently released Aya collection (Singh et al., 2024). The result is a powerful multilingual large language model serving 23 languages, expanding state-of-the-art language modeling capabilities to approximately half of the world's population. The Aya model covered 101 languages, whereas Aya 23 is an experiment in depth versus breadth, exploring the impact of allocating more capacity to fewer languages that are included during pre-training. For the languages it covers, Aya 23 outperforms both previous massively multilingual models like Aya 101 and widely used models like Gemma, Mistral, and Mixtral on an extensive range of discriminative and generative tasks. We release the open weights for both the 8B and 35B models as part of our continued commitment to expanding access to multilingual progress.

