CL · Dec 19, 2022

Cross-Lingual Retrieval Augmented Prompt for Low-Resource Languages

Ercong Nie, Sheng Liang, Helmut Schmid, Hinrich Schütze

arXiv:2212.09651v422.1232 citations · h-index: 70 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of improving NLP task performance for low-resource languages. The contribution is incremental: it builds on existing multilingual models by adding retrieval-based prompting.

The authors tackle low zero-shot performance on low-resource languages by proposing PARC, a pipeline that augments prompts with semantically similar sentences retrieved from a high-resource language. PARC improves zero-shot performance by +5.1% in the unlabeled setting and +16.3% in the labeled setting across 10 languages.

Multilingual Pretrained Language Models (MPLMs) have shown their strong multilinguality in recent empirical cross-lingual transfer studies. In this paper, we propose the Prompts Augmented by Retrieval Crosslingually (PARC) pipeline to improve the zero-shot performance on low-resource languages (LRLs) by augmenting the context with semantically similar sentences retrieved from a high-resource language (HRL) as prompts. PARC improves the zero-shot performance on three downstream tasks (binary sentiment classification, topic categorization and natural language inference) with multilingual parallel test sets across 10 LRLs covering 6 language families in both unlabeled settings (+5.1%) and labeled settings (+16.3%). PARC-labeled also outperforms the finetuning baseline by 3.7%. We find a significant positive correlation between cross-lingual transfer performance on one side, and the similarity between the high- and low-resource languages as well as the amount of low-resource pretraining data on the other side. A robustness analysis suggests that PARC has the potential to achieve even stronger performance with more powerful MPLMs.
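The retrieve-then-prompt idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the character n-gram `embed` function is a toy stand-in for a multilingual sentence encoder, the prompt pattern is hypothetical, and the handling of entries without a label (prepending the retrieved sentence alone) is an assumption made only for illustration.

```python
import math
from collections import Counter

def embed(text, n=3):
    """Toy character n-gram counts; a stand-in for a multilingual sentence encoder."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, pool, k=1):
    """Return the k high-resource-language entries most similar to the query."""
    q = embed(query)
    ranked = sorted(pool, key=lambda item: cosine(q, embed(item["text"])), reverse=True)
    return ranked[:k]

def build_prompt(query, retrieved, pattern="{text}. In summary, it was {label}.", mask="[MASK]"):
    """Prepend retrieved sentences to the query; the pattern is hypothetical."""
    parts = []
    for item in retrieved:
        label = item.get("label")
        # Labeled entries fill the pattern with their label; entries without
        # a label are prepended as plain context (an assumption of this sketch).
        parts.append(pattern.format(text=item["text"], label=label) if label else item["text"] + ".")
    # The query itself keeps a mask slot for the language model to fill.
    parts.append(pattern.format(text=query, label=mask))
    return " ".join(parts)

# Hypothetical high-resource (English) pool and a query in another language.
pool = [
    {"text": "The battery is great", "label": "good"},
    {"text": "I hated the screen", "label": "bad"},
]
query = "Die Batterie ist super"
neighbors = retrieve(query, pool, k=1)
prompt = build_prompt(query, neighbors)
print(prompt)
```

In a real pipeline the resulting prompt would be passed to a multilingual masked language model, which predicts the token at the mask position. The toy embedding only matches on shared character sequences, so it works here because the two sentences share a cognate; a genuine cross-lingual encoder would be needed for unrelated languages.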
