CL · Dec 19, 2022

Cross-Lingual Retrieval Augmented Prompt for Low-Resource Languages

arXiv:2212.09651v4232 citations · h-index: 70
Originality: Incremental advance
AI Analysis

This work addresses the challenge of improving NLP task performance for low-resource languages. The advance is incremental: it builds on existing multilingual models by adding retrieval-based prompting.

The authors tackle low zero-shot performance on low-resource languages by proposing PARC, a pipeline that augments prompts with semantically similar sentences retrieved from a high-resource language. Across 10 languages, performance improves by +5.1% in the unlabeled setting and +16.3% in the labeled setting.

Multilingual Pretrained Language Models (MPLMs) have shown their strong multilinguality in recent empirical cross-lingual transfer studies. In this paper, we propose the Prompts Augmented by Retrieval Crosslingually (PARC) pipeline to improve the zero-shot performance on low-resource languages (LRLs) by augmenting the context with semantically similar sentences retrieved from a high-resource language (HRL) as prompts. PARC improves the zero-shot performance on three downstream tasks (binary sentiment classification, topic categorization and natural language inference) with multilingual parallel test sets across 10 LRLs covering 6 language families in both unlabeled settings (+5.1%) and labeled settings (+16.3%). PARC-labeled also outperforms the finetuning baseline by 3.7%. We find a significant positive correlation between cross-lingual transfer performance on one side, and the similarity between the high- and low-resource languages as well as the amount of low-resource pretraining data on the other side. A robustness analysis suggests that PARC has the potential to achieve even stronger performance with more powerful MPLMs.
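
The retrieve-then-prompt idea described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the character n-gram `embed` function is a toy stand-in for a multilingual sentence encoder, the prompt template and label words are hypothetical, and the two-sentence English pool and the Afrikaans query were chosen only because they share surface n-grams, which is what the toy similarity relies on. When a retrieved sentence has no label (`None`), the sketch simply prepends the sentence alone.

```python
import math
from collections import Counter


def embed(text, n=3):
    """Toy stand-in for a multilingual sentence encoder: character n-gram counts."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, pool, k=1):
    """Return the k (sentence, label) pairs from the pool most similar to the query."""
    q = embed(query)
    ranked = sorted(pool, key=lambda item: cosine(q, embed(item[0])), reverse=True)
    return ranked[:k]


def build_prompt(query, retrieved, template="{text} In summary, it was {label}."):
    """Prepend retrieved sentences to the query; the query's label slot is masked."""
    parts = []
    for text, label in retrieved:
        if label is None:
            # No label available: use the retrieved sentence alone as context.
            parts.append(text)
        else:
            # Label available: fill the hypothetical template with it.
            parts.append(template.format(text=text, label=label))
    parts.append(template.format(text=query, label="[MASK]"))
    return " ".join(parts)


# Hypothetical high-resource-language pool with sentiment label words.
HRL_POOL = [
    ("The film was wonderful and moving.", "great"),
    ("The battery died after one day.", "terrible"),
]

query = "Die film was wonderlik."
prompt = build_prompt(query, retrieve(query, HRL_POOL, k=1))
print(prompt)
```

In a real setup the masked prompt would be passed to a multilingual masked language model, and the predicted label word at `[MASK]` would be mapped back to a class. That step is omitted here because it depends on the model and verbalizer choices.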

Code Implementations: 1 repo
