CL · Mar 31, 2025

CONGRAD: Conflicting Gradient Filtering for Multilingual Preference Alignment

arXiv:2503.23777v2 · 1 citation · h-index: 44
Originality: Incremental advance
AI Analysis

This paper addresses negative interference in multilingual preference alignment for LLMs, offering an incremental improvement in alignment efficiency across languages.

The paper tackles the problem of negative interference in multilingual preference alignment for large language models by proposing CONGRAD, a filtering method that selects high-quality samples with minimal gradient conflicts, resulting in consistent performance improvements across 10 languages with minimal alignment tax.

Naive joint training of large language models (LLMs) for multilingual preference alignment can suffer from negative interference. This is a known issue in multilingual training, where conflicting objectives degrade overall performance. However, the impact of this phenomenon in the context of multilingual preference alignment remains largely underexplored. To address this issue, we propose CONGRAD, a scalable and effective filtering method that selects high-quality preference samples with minimal gradient conflicts across languages. Our method leverages gradient surgery to retain samples aligned with an aggregated multilingual update direction. Additionally, we incorporate a sublinear gradient compression strategy that reduces memory overhead during gradient accumulation. We integrate CONGRAD into a self-rewarding framework and evaluate it on LLaMA3-8B and Gemma2-2B across 10 languages. Results show that CONGRAD consistently outperforms strong baselines in both seen and unseen languages, with minimal alignment tax.
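The abstract does not spell out the selection rule, so the following is only a minimal sketch of the general idea, under several assumptions. A PCGrad-style projection stands in for the gradient-surgery step. Samples are ranked by cosine similarity to the aggregated direction. A fixed fraction is kept per language. The function names (`project_conflicts`, `filter_samples`) and the `keep_ratio` parameter are hypothetical, and the gradient compression step is not modelled.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    # Cosine similarity, defined as 0.0 when either vector is all zeros.
    denom = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / denom if denom > 0 else 0.0


def mean_vector(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def project_conflicts(lang_grads):
    """Assumed PCGrad-style surgery: for each language gradient, remove the
    component that conflicts (negative dot product) with every other
    language's original gradient, then sum the adjusted gradients."""
    adjusted = []
    for i, grad in enumerate(lang_grads):
        g = list(grad)
        for j, other in enumerate(lang_grads):
            if i == j:
                continue
            d = dot(g, other)
            if d < 0:  # conflict: project g onto the normal plane of `other`
                scale = d / dot(other, other)
                g = [a - scale * b for a, b in zip(g, other)]
        adjusted.append(g)
    return [sum(col) for col in zip(*adjusted)]


def filter_samples(sample_grads, keep_ratio=0.5):
    """sample_grads maps language -> list of (sample_id, gradient vector).
    Returns, per language, the ids of the samples whose gradients are most
    aligned with the aggregated multilingual direction (an assumed rule)."""
    lang_means = [mean_vector([g for _, g in items])
                  for items in sample_grads.values()]
    direction = project_conflicts(lang_means)
    kept = {}
    for lang, items in sample_grads.items():
        ranked = sorted(items, key=lambda it: cosine(it[1], direction),
                        reverse=True)
        k = max(1, int(len(items) * keep_ratio))
        kept[lang] = [sample_id for sample_id, _ in ranked[:k]]
    return kept
```

Calling `filter_samples` on a dictionary of per-language `(id, gradient)` pairs returns the ids to retain for training. In practice the gradients would come from the model's preference loss and would be far too large to store uncompressed, which is presumably what the paper's compression strategy addresses.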

