CL AI | Jul 18, 2025

Seed-X: Building Strong Multilingual Translation LLM with 7B Parameters

Shanbo Cheng, Yu Bao, Qian Cao, Luyang Huang, Liyan Kang, Zhicheng Liu, Yu Lu, Wenhao Zhu, Jingwen Chen, Zhichao Huang, Tao Li, Yifu Li

arXiv:2507.13618v421.817 citations | h-index: 6 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the problem of high-quality multilingual translation for researchers and practitioners, offering an open-source alternative to closed models, though its method appears incremental.

The authors tackled multilingual translation by introducing Seed-X, a family of open-source LLMs with 7B parameters, achieving performance comparable to leading closed-source models like Gemini-2.5 and GPT-4o across 28 languages and significantly outperforming larger open-source models in automatic metrics and human evaluations.

Multilingual translation remains a challenging task for large language models (LLMs), which must handle intricate language patterns and avoid the stilted output common in automated translation. In this paper, we introduce Seed-X, a family of open-source LLMs comprising instruct and reasoning models, pushing the limits of translation capability at the 7B-parameter scale. The base model is pre-trained on a diverse, high-quality dataset encompassing both monolingual and bilingual content across 28 languages, harnessing the full potential of multilingual data. The instruct model is then fine-tuned to translate with Chain-of-Thought (CoT) reasoning and further enhanced through reinforcement learning (RL) to achieve better generalization across diverse language pairs. Seed-X achieves performance comparable to leading closed-source models, including Gemini-2.5 and GPT-4o, across 28 languages, and significantly outperforms larger open-source models in both automatic metrics and human evaluations. We share the best practices from our optimization process and make the parameters publicly available to advance translation research and applications.
