CL · Sep 29, 2025

AlignX: Advancing Multilingual Large Language Models with Multilingual Representation Alignment

arXiv:2509.24338v1 · 6 citations · h-index: 21 · EMNLP
Originality Highly original
AI Analysis

This paper addresses the performance gap that multilingual LLMs show on non-dominant languages. It is an incremental improvement: a novel method applied to a known bottleneck.

The paper tackles the underperformance of multilingual large language models (LLMs) on non-dominant languages. It proposes AlignX, a two-stage framework that first aligns multilingual representations and then applies multilingual instruction fine-tuning. In the reported experiments, this improves both general multilingual capability and cross-lingual generation.

Multilingual large language models (LLMs) possess impressive multilingual understanding and generation capabilities. However, their performance and cross-lingual alignment often lag for non-dominant languages. A common solution is to fine-tune LLMs on large-scale, more balanced multilingual corpora, but such approaches often lead to imprecise alignment and suboptimal knowledge transfer, yielding only limited improvements across languages. In this paper, we propose AlignX, a two-stage representation-level framework for enhancing the multilingual performance of pre-trained LLMs, to bridge the multilingual performance gap. In the first stage, we align multilingual representations with multilingual semantic alignment and language feature integration. In the second stage, we stimulate the multilingual capability of LLMs via multilingual instruction fine-tuning. Experimental results on several pre-trained LLMs demonstrate that our approach enhances LLMs' general multilingual and cross-lingual generation capabilities. Further analysis indicates that AlignX brings the multilingual representations closer and improves cross-lingual alignment.
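
The abstract does not state which objective the first stage uses. As an illustration only, one common way to pull representations of parallel sentences together is an InfoNCE-style contrastive loss. The sketch below assumes that choice; the function name, the temperature value, and the use of pooled sentence embeddings are all hypothetical and are not taken from the paper.

```python
import numpy as np


def info_nce_alignment_loss(src, tgt, temperature=0.1):
    """Contrastive loss over a batch of parallel sentence embeddings.

    Assumption (not from the paper): src[i] and tgt[i] are pooled
    embeddings of the same sentence in two languages, shape (batch, dim).
    Every other row in the batch is treated as a negative example.
    """
    # L2-normalise so the dot product below is cosine similarity.
    src = src / np.linalg.norm(src, axis=1, keepdims=True)
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)

    # (batch, batch) matrix of temperature-scaled similarities.
    logits = src @ tgt.T / temperature

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The matching translation for row i sits in column i.
    return float(-np.mean(np.diag(log_probs)))
```

Under this assumed objective, the loss falls as each sentence's embedding moves closer to its translation than to the other sentences in the batch, which is one concrete reading of "bringing the multilingual representations closer".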
