CL Oct 15, 2023

Empirical study of pretrained multilingual language models for zero-shot cross-lingual knowledge transfer in generation

arXiv:2310.09917v3
h-index: 15
Originality: Synthesis-oriented
AI Analysis

This work addresses the understudied issue of cross-lingual generation for multilingual models, but it is incremental as it compares existing models and methods without introducing new paradigms.

The study tackles zero-shot cross-lingual knowledge transfer in generation tasks by testing multilingual pretrained language models such as mBART and NLLB-200 with both full and parameter-efficient finetuning. It finds that mBART with adapters performs similarly to mT5 of the same size, that NLLB-200 can be competitive in some cases, and that tuning the learning rate helps reduce wrong-language generation.

Zero-shot cross-lingual knowledge transfer enables a multilingual pretrained language model (mPLM), finetuned on a task in one language, to make predictions for this task in other languages. While broadly studied for natural language understanding tasks, this setting is understudied for generation. Previous works note a frequent problem of generation in the wrong language and propose approaches to address it, usually using mT5 as a backbone model. In this work, we test alternative mPLMs, such as mBART and NLLB-200, considering full finetuning and parameter-efficient finetuning with adapters. We find that mBART with adapters performs similarly to mT5 of the same size, and NLLB-200 can be competitive in some cases. We also underline the importance of tuning the learning rate used for finetuning, which helps to alleviate the problem of generation in the wrong language.
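To make the wrong-language problem concrete, here is a minimal sketch of how a wrong-language rate could be computed and used to compare learning rates. It is not the paper's evaluation code. The function names, the `runs` structure, and the toy labels are all hypothetical, and the sketch assumes the language of each generated output has already been detected by some external language-identification tool.

```python
def wrong_language_rate(detected_langs, target_lang):
    """Fraction of generated outputs whose detected language is not the target.

    detected_langs: list of language codes, one per generated output
    (assumed to come from an external language-identification step).
    """
    if not detected_langs:
        raise ValueError("no outputs to score")
    wrong = sum(1 for lang in detected_langs if lang != target_lang)
    return wrong / len(detected_langs)


def pick_learning_rate(runs, target_lang):
    """Return the learning rate whose outputs have the lowest wrong-language rate.

    runs: dict mapping a learning rate to the detected language codes of
    the outputs produced by the model finetuned with that learning rate.
    """
    return min(runs, key=lambda lr: wrong_language_rate(runs[lr], target_lang))


# Toy, made-up labels purely for illustration; these are not results from the paper.
toy_runs = {
    1e-3: ["en", "en", "fr", "en"],
    1e-4: ["fr", "fr", "en", "fr"],
}
best_lr = pick_learning_rate(toy_runs, target_lang="fr")
```

In practice the choice would presumably also weigh task quality (for example, a summarization or QA metric) rather than the wrong-language rate alone.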
