CL · Apr 6, 2022

Probing Structured Pruning on Multilingual Pre-trained Models: Settings, Algorithms, and Efficiency

arXiv:2204.02601v1639 citations · h-index: 48
Originality Synthesis-oriented
AI Analysis

The paper addresses a gap in the study of structured pruning for multilingual models. The contribution is incremental: it extends existing monolingual pruning methods to a new context.

This work investigates structured pruning of multilingual pre-trained language models along three axes: settings, algorithms, and efficiency. Experiments on nine downstream tasks yield counter-intuitive results, for example that pruning individually for each language does not improve performance and that the simplest pruning method performs best.

Structured pruning has been extensively studied on monolingual pre-trained language models and is yet to be fully evaluated on their multilingual counterparts. This work investigates three aspects of structured pruning on multilingual pre-trained language models: settings, algorithms, and efficiency. Experiments on nine downstream tasks show several counter-intuitive phenomena: for settings, individually pruning for each language does not induce a better result; for algorithms, the simplest method performs the best; for efficiency, a fast model does not imply that it is also small. To facilitate the comparison on all sparsity levels, we present Dynamic Sparsification, a simple approach that allows training the model once and adapting to different model sizes at inference. We hope this work fills the gap in the study of structured pruning on multilingual pre-trained models and sheds light on future research.
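To make the terminology concrete, the following is a minimal sketch of what "structured" pruning at several sparsity levels can look like for a single feed-forward block. It is not the paper's Dynamic Sparsification method and not any of the algorithms it compares. The function names, the ReLU activation, and the weight-norm importance score are all assumptions chosen for illustration. The only point is that whole hidden units are removed (so the weight matrices actually shrink) and that a single importance ranking, computed once, can be reused to produce models of different sizes.

```python
import math
import random


def rank_units(w_in, w_out):
    """Return hidden-unit indices, most important first.

    w_in is d_model x d_ff (list of rows); w_out is d_ff x d_model.
    The score (product of weight norms) is a hypothetical stand-in
    for whatever importance criterion a real pruning method uses.
    """
    def score(j):
        in_norm = math.sqrt(sum(row[j] ** 2 for row in w_in))
        out_norm = math.sqrt(sum(v ** 2 for v in w_out[j]))
        return in_norm * out_norm

    return sorted(range(len(w_out)), key=score, reverse=True)


def prune_ffn(w_in, b_in, w_out, order, sparsity):
    """Drop whole hidden units, keeping the top (1 - sparsity) fraction."""
    keep = max(1, round(len(order) * (1.0 - sparsity)))
    idx = sorted(order[:keep])  # preserve the original unit order
    return (
        [[row[j] for j in idx] for row in w_in],
        [b_in[j] for j in idx],
        [w_out[j] for j in idx],
    )


def ffn_forward(x, w_in, b_in, w_out, b_out):
    """Two-layer feed-forward block with ReLU, for one input vector x."""
    hidden = [
        max(0.0, sum(x[i] * w_in[i][j] for i in range(len(x))) + b_in[j])
        for j in range(len(b_in))
    ]
    return [
        sum(hidden[j] * w_out[j][k] for j in range(len(hidden))) + b_out[k]
        for k in range(len(b_out))
    ]


# Toy dimensions and random weights (illustrative only).
random.seed(0)
d_model, d_ff = 4, 16
w_in = [[random.gauss(0, 1) for _ in range(d_ff)] for _ in range(d_model)]
b_in = [0.0] * d_ff
w_out = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(d_ff)]
b_out = [0.0] * d_model
x = [random.gauss(0, 1) for _ in range(d_model)]

order = rank_units(w_in, w_out)  # computed once, reused for every size
for sparsity in (0.0, 0.5, 0.75):
    p_in, p_b, p_out = prune_ffn(w_in, b_in, w_out, order, sparsity)
    y = ffn_forward(x, p_in, p_b, p_out, b_out)
    print(sparsity, len(p_b), len(y))
```

Because entire units are dropped rather than individual weights, the pruned block is a smaller dense computation and needs no sparse kernels. How much that reduces latency versus parameter count depends on which components are pruned, which this toy example does not attempt to measure.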

