CL | AI | May 27, 2025

M-Wanda: Improving One-Shot Pruning for Multilingual LLMs

arXiv:2505.21171v1 | 2 citations | h-index: 13 | EMNLP
Originality: Incremental advance
AI Analysis

This paper addresses the efficiency-performance trade-off for multilingual LLMs, though the contribution appears incremental: it builds on existing pruning methods and adds language-specific optimizations.

The paper tackles the performance loss that multilingual large language models suffer under one-shot pruning, showing that even moderate sparsity ratios substantially harm multilingual performance. It proposes M-Wanda, a pruning method that incorporates language-aware activation statistics and dynamic layerwise sparsity adjustment, and reports consistent performance improvements at minimal additional cost.

Multilingual LLM performance is often critically dependent on model size. With an eye on efficiency, this has led to a surge in interest in one-shot pruning methods that retain the benefits of large-scale pretraining while shrinking the model size. However, as pruning tends to come with performance loss, it is important to understand the trade-offs between multilinguality and sparsification. In this work, we study multilingual performance under different sparsity constraints and show that moderate ratios already substantially harm performance. To help bridge this gap, we propose M-Wanda, a pruning method that models cross-lingual variation by incorporating language-aware activation statistics into its pruning criterion and dynamically adjusts layerwise sparsity based on cross-lingual importance. We show that M-Wanda consistently improves performance at minimal additional costs. We are the first to explicitly optimize pruning to retain multilingual performance, and hope to inspire future advances in multilingual pruning.
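
To make the idea of a language-aware pruning criterion concrete, here is a minimal NumPy sketch. The `wanda_scores` function follows the standard Wanda criterion (weight magnitude times the L2 norm of the corresponding input activation, pruned per output row). The `language_aware_norm` function is only an assumption for illustration: the abstract does not give M-Wanda's formula, so the mean-plus-spread combination, the `lam` parameter, and the toy per-language norms are hypothetical. Dynamic layerwise sparsity is not covered here.

```python
import numpy as np

def wanda_scores(weight, act_norm):
    """Wanda importance: |W_ij| * ||X_j||_2, one score per weight."""
    return np.abs(weight) * act_norm[None, :]

def prune_rows(weight, scores, sparsity):
    """Zero the lowest-scoring fraction of weights in each output row."""
    n_prune = int(weight.shape[1] * sparsity)
    pruned = weight.copy()
    if n_prune == 0:
        return pruned
    # Column indices of the n_prune smallest scores in every row.
    idx = np.argsort(scores, axis=1)[:, :n_prune]
    np.put_along_axis(pruned, idx, 0.0, axis=1)
    return pruned

def language_aware_norm(norms_per_lang, lam=0.5):
    """Hypothetical combination (an assumption, not the paper's formula):
    mean activation norm across languages plus lam times their spread."""
    norms = np.stack(list(norms_per_lang.values()))
    return norms.mean(axis=0) + lam * norms.std(axis=0)

# Toy data: a 4x8 linear layer and made-up per-language activation norms.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))
norms_per_lang = {
    lang: np.abs(rng.normal(size=8)) + 0.1 for lang in ("en", "de", "sw")
}

act = language_aware_norm(norms_per_lang)
W_pruned = prune_rows(W, wanda_scores(W, act), sparsity=0.5)
```

In this sketch, input features whose activations are large on average, or vary strongly between languages, receive higher scores and are more likely to survive pruning. Each row ends up with the same number of zeros, which matches Wanda's per-output comparison group.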

