CL · AI · May 20, 2025

FuxiMT: Sparsifying Large Language Models for Chinese-Centric Multilingual Machine Translation

arXiv:2505.14256v1 · 1 citation · h-index: 8
Originality: Incremental advance
AI Analysis

This work addresses Chinese-centric machine translation across 65 languages, particularly in low-resource settings. Its combination of model sparsification and curriculum learning represents an incremental improvement.

The paper tackles Chinese-centric multilingual machine translation by developing FuxiMT, a sparsified large language model. According to the authors, FuxiMT significantly outperforms state-of-the-art baselines, especially in low-resource scenarios, and shows strong zero-shot translation capabilities for unseen language pairs.

In this paper, we present FuxiMT, a novel Chinese-centric multilingual machine translation model powered by a sparsified large language model (LLM). We adopt a two-stage strategy to train FuxiMT. We first pre-train the model on a massive Chinese corpus and then conduct multilingual fine-tuning on a large parallel dataset encompassing 65 languages. FuxiMT incorporates Mixture-of-Experts (MoEs) and employs a curriculum learning strategy for robust performance across various resource levels. Experimental results demonstrate that FuxiMT significantly outperforms strong baselines, including state-of-the-art LLMs and machine translation models, particularly under low-resource scenarios. Furthermore, FuxiMT exhibits remarkable zero-shot translation capabilities for unseen language pairs, indicating its potential to bridge communication gaps where parallel data are scarce or unavailable.
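The abstract says FuxiMT incorporates Mixture-of-Experts layers but does not describe the routing scheme here. As a generic illustration of what "sparsified" usually means in this setting, the sketch below shows top-k expert routing in plain Python. The function names (`top_k_route`, `moe_forward`), the choice of k = 2, and the toy gate and experts are assumptions for illustration only, not FuxiMT's actual design.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_logits, k=2):
    """Pick the k highest-scoring experts and renormalize their weights.

    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    """
    top = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:k]
    weights = softmax([gate_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, gate, experts, k=2):
    """Sparse MoE layer for one token (hypothetical, illustrative only).

    x:       input vector as a list of floats
    gate:    one weight vector per expert (linear gating, no bias)
    experts: one callable per expert, each mapping a vector to a vector
    Only the k selected experts are evaluated; the rest are skipped,
    which is where the compute savings of sparsity come from.
    """
    logits = [sum(w * v for w, v in zip(row, x)) for row in gate]
    routes = top_k_route(logits, k)
    out = [0.0] * len(x)
    for idx, weight in routes:
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, routes
```

With three experts and k = 2, each token activates only two of them, so total parameter count can grow with the number of experts while per-token compute stays roughly constant.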

