Mitigating Language-Level Performance Disparity in mPLMs via Teacher Language Selection and Cross-lingual Self-Distillation
This work addresses language-level performance gaps in mPLMs and is aimed at researchers and practitioners in multilingual NLP; it represents an incremental improvement over previous fine-tuning methods.
The paper tackles performance disparities across languages in multilingual pretrained language models (mPLMs) by introducing ALSACE, which uses knowledge from well-performing languages to guide under-performing ones without requiring additional labeled multilingual data. The approach mitigates these disparities and achieves competitive performance on various multilingual NLU tasks.
Large-scale multilingual Pretrained Language Models (mPLMs) yield impressive performance on cross-language tasks, yet significant performance disparities exist across different languages within the same mPLM. Previous studies endeavored to narrow these disparities through supervised fine-tuning of the mPLMs with multilingual data. However, obtaining labeled multilingual data is time-consuming, and fine-tuning an mPLM with limited labeled multilingual data captures only the knowledge specific to the labeled data. Therefore, we introduce ALSACE to leverage the learned knowledge from the well-performing languages to guide under-performing ones within the same mPLM, eliminating the need for additional labeled multilingual data. Experiments show that ALSACE effectively mitigates language-level performance disparity across various mPLMs while showing competitive performance on different multilingual NLU tasks, in settings ranging from full-resource to limited-resource. The code for our approach is available at https://github.com/pkunlp-icler/ALSACE.
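To make the idea of guiding under-performing languages with well-performing ones more concrete, the following is a minimal sketch and not the released ALSACE implementation. It assumes the model's class logits are available for the same unlabeled inputs in several languages. The selection rule (agreement with a majority vote), the KL-divergence consistency loss, and all function names (`select_teacher_languages`, `self_distillation_loss`) are illustrative assumptions; the abstract does not specify them.

```python
import math
from collections import Counter


def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def select_teacher_languages(probs_by_lang, top_k=1):
    """Hypothetical selection rule: rank languages by how often their
    predicted label matches the majority vote across all languages."""
    langs = list(probs_by_lang)
    n_examples = len(probs_by_lang[langs[0]])
    agreement = {lang: 0 for lang in langs}
    for i in range(n_examples):
        # Predicted label (argmax) of each language for example i.
        votes = {}
        for lang in langs:
            dist = probs_by_lang[lang][i]
            votes[lang] = max(range(len(dist)), key=dist.__getitem__)
        majority, _ = Counter(votes.values()).most_common(1)[0]
        for lang, label in votes.items():
            if label == majority:
                agreement[lang] += 1
    ranked = sorted(langs, key=lambda lang: agreement[lang], reverse=True)
    return ranked[:top_k]


def self_distillation_loss(probs_by_lang, teachers, student):
    """Assumed consistency loss: mean KL divergence from the averaged
    teacher-language distribution to the student-language distribution."""
    n_examples = len(probs_by_lang[student])
    total = 0.0
    for i in range(n_examples):
        n_classes = len(probs_by_lang[student][i])
        target = [
            sum(probs_by_lang[t][i][c] for t in teachers) / len(teachers)
            for c in range(n_classes)
        ]
        total += kl_divergence(target, probs_by_lang[student][i])
    return total / n_examples


# Toy logits for three parallel inputs and two classes (made-up numbers).
toy_logits = {
    "en": [[4.0, 0.0], [0.0, 3.0], [2.5, 0.0]],
    "de": [[3.0, 0.5], [0.2, 2.0], [2.0, 0.1]],
    "sw": [[0.2, 0.0], [1.0, 0.5], [0.0, 0.3]],
}
toy_probs = {lang: [softmax(row) for row in rows] for lang, rows in toy_logits.items()}
toy_teachers = select_teacher_languages(toy_probs, top_k=2)
toy_loss = self_distillation_loss(toy_probs, toy_teachers, "sw")
print(toy_teachers, toy_loss)
```

In a real training loop, a loss of this kind would be minimized with respect to the model parameters for the student-language inputs only, with the teacher distributions treated as fixed targets. No labels are involved at any point, which matches the claim that no additional labeled multilingual data is needed.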