CL AI Nov 4, 2022

Intriguing Properties of Compression on Multilingual Models

Kelechi Ogueji, Orevaoghene Ahia, Gbemileke Onilude, Sebastian Gehrmann, Sara Hooker, Julia Kreutzer

DeepMind

arXiv:2211.02738v2

Originality: Incremental advance

AI Analysis

This work addresses the trade-offs between model scale, multilingualism, and compression when deploying models under resource constraints, though its contribution is incremental because it explores only specific sparsification regimes.

The study investigates how compression affects multilingual models. In experiments with mBERT named entity recognition models across 40 languages, sparsification during fine-tuning can improve robustness and, under certain sparsification regimes, benefit low-resource languages.

Multilingual models are often particularly dependent on scaling to generalize to a growing number of languages. Compression techniques are widely relied upon to reconcile the growth in model size with real-world resource constraints, but compression can have a disparate effect on model performance for low-resource languages. It is thus crucial to understand the trade-offs between scale, multilingualism, and compression. In this work, we propose an experimental framework to characterize the impact of sparsifying multilingual pre-trained language models during fine-tuning. Applying this framework to mBERT named entity recognition models across 40 languages, we find that compression confers several intriguing and previously unknown generalization properties. In contrast to prior findings, we find that compression may improve model robustness over dense models. We additionally observe that, under certain sparsification regimes, compression may aid, rather than disproportionately impact, the performance of low-resource languages.
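The abstract does not say how the models are sparsified during fine-tuning. As an illustration only, the sketch below assumes unstructured magnitude pruning with a gradually increasing sparsity target, which is one common way to sparsify a model while it trains. The function names `target_sparsity` and `magnitude_prune`, the cubic schedule, and the toy weight list are all hypothetical and are not taken from the paper.

```python
def target_sparsity(step, total_steps, final_sparsity):
    """Assumed cubic ramp: sparsity grows from 0 to final_sparsity over training."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return final_sparsity * (1.0 - (1.0 - progress) ** 3)


def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude.

    Returns the pruned weights and a 0/1 mask (1 = weight kept).
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    num_to_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_indices = set(order[:num_to_prune])
    mask = [0 if i in pruned_indices else 1 for i in range(len(weights))]
    pruned = [w if m else 0.0 for w, m in zip(weights, mask)]
    return pruned, mask


# Toy stand-in for one layer's weights (hypothetical values).
layer = [0.5, -0.1, 0.03, -0.9, 0.2, 0.0]
for step in (0, 50, 100):
    sparsity = target_sparsity(step, total_steps=100, final_sparsity=0.5)
    pruned, mask = magnitude_prune(layer, sparsity)
    print(step, round(sparsity, 3), pruned)
```

In an actual fine-tuning loop, the mask would typically be recomputed at intervals and reapplied after each optimizer update so that pruned weights stay at zero. That detail is an assumption about common practice, not a description of the paper's setup.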
