Intriguing Properties of Compression on Multilingual Models
This work addresses the trade-offs among model scale, multilingualism, and compression when deploying models under resource constraints, though its exploration of specific sparsification regimes is incremental.
The study investigated how compression affects multilingual models. Experiments on mBERT named entity recognition models across 40 languages showed that sparsification during fine-tuning can improve robustness and, under certain sparsification regimes, benefit low-resource languages.
Multilingual models are often particularly dependent on scaling to generalize to a growing number of languages. Compression techniques are widely relied upon to reconcile the growth in model size with real-world resource constraints, but compression can have a disparate effect on model performance for low-resource languages. It is thus crucial to understand the trade-offs between scale, multilingualism, and compression. In this work, we propose an experimental framework to characterize the impact of sparsifying multilingual pre-trained language models during fine-tuning. Applying this framework to mBERT named entity recognition models across 40 languages, we find that compression confers several intriguing and previously unknown generalization properties. In contrast to prior findings, we find that compression may improve model robustness over dense models. We additionally observe that under certain sparsification regimes compression may aid, rather than disproportionately impact, the performance of low-resource languages.
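The abstract does not specify how the models are sparsified. As a hedged illustration of what "sparsifying during fine-tuning" under a chosen "sparsification regime" can look like, the sketch below assumes gradual magnitude pruning: a target sparsity ramps up over training along a cubic schedule, and at each step the smallest-magnitude weights are zeroed. The schedule shape, the function names (`target_sparsity`, `magnitude_mask`, `apply_mask`), and the toy weights are all illustrative assumptions, not the paper's implementation. A "regime" here would correspond to choices such as the final sparsity and the steps over which it is reached.

```python
"""Hypothetical sketch of gradual magnitude pruning during fine-tuning.

Assumption: the cubic schedule and every name here are illustrative only;
they are not taken from the paper.
"""
from typing import List


def target_sparsity(step: int, begin: int, end: int,
                    final: float, initial: float = 0.0) -> float:
    """Cubic ramp from `initial` sparsity at `begin` to `final` at `end`."""
    if step <= begin:
        return initial
    if step >= end:
        return final
    progress = (step - begin) / (end - begin)
    # Sparsity rises quickly early on and flattens as it nears `final`.
    return final + (initial - final) * (1.0 - progress) ** 3


def magnitude_mask(weights: List[float], sparsity: float) -> List[int]:
    """Return a 0/1 mask that zeros the `sparsity` fraction of smallest-|w| weights."""
    n_prune = int(round(sparsity * len(weights)))
    # Indices ordered by absolute magnitude, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in order[:n_prune]:
        mask[i] = 0
    return mask


def apply_mask(weights: List[float], mask: List[int]) -> List[float]:
    """Zero out pruned weights; in a real fine-tuning loop this would run after each optimizer step."""
    return [w * m for w, m in zip(weights, mask)]


if __name__ == "__main__":
    # Toy stand-in for one layer's weights (hypothetical values).
    weights = [0.5, -0.1, 0.05, -0.9, 0.3, 0.02, -0.4, 0.7]
    for step in (0, 50, 100):
        s = target_sparsity(step, begin=0, end=100, final=0.75)
        pruned = apply_mask(weights, magnitude_mask(weights, s))
        kept = sum(1 for w in pruned if w != 0)
        print(f"step={step} sparsity={s:.3f} nonzero={kept}/{len(weights)}")
```

Under this assumption, comparing dense and sparse models at several `final` values would be one way to probe how sparsity level interacts with per-language performance; the paper's actual protocol may differ.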