CL, Feb 4, 2025

Multilingual Machine Translation with Open Large Language Models at Practical Scale: An Empirical Study

Cambridge, CMU, IBM, Meta AI, Oxford
arXiv:2502.02481v4, 52 citations, h-index: 62, Has Code, NAACL
Originality: Incremental advance
AI Analysis

This work addresses efficient, accessible multilingual translation for users who need practical, open-source solutions. The advance is incremental, since it builds on existing LLM capabilities.

The paper studies multilingual machine translation with open large language models under 10 billion parameters. It finds that models such as Gemma2-9B show strong translation capabilities, and it introduces the Parallel-First Monolingual-Second (PFMS) data mixing strategy to build GemmaX2-28, a 9B model that outperforms SOTA models such as TowerInstruct and XALMA and is competitive with Google Translate and GPT-4-turbo across 28 languages.

Large language models (LLMs) have shown continuously improving multilingual capabilities, and even small-scale open-source models have demonstrated rapid performance enhancement. In this paper, we systematically explore the abilities of open LLMs with less than ten billion parameters to handle multilingual machine translation (MT) tasks. We conduct comprehensive evaluations on six popular LLMs and find that models like Gemma2-9B exhibit impressive multilingual translation capabilities. We then introduce the Parallel-First Monolingual-Second (PFMS) data mixing strategy in the continual pretraining stage to further enhance the MT performance and present GemmaX2-28, a 9B model achieving top-tier multilingual translation performance across 28 languages. Specifically, GemmaX2-28 consistently outperforms the state-of-the-art (SOTA) models such as TowerInstruct and XALMA and achieves competitive performance with Google Translate and GPT-4-turbo.

