CL · Oct 7, 2023

Zero-shot Cross-lingual Transfer without Parallel Corpus

arXiv:2310.04726v1 · 1 citation · h-index: 7
Originality: Highly original
AI Analysis

This work addresses data scarcity in multilingual NLP for low-resource languages. Because it does not require the parallel corpora or translation models that previous methods depend on, it is a more accessible solution.

The paper tackles the problem of cross-lingual transfer for low-resource languages without relying on parallel corpora or translation models, achieving new state-of-the-art results on various tasks.

Abstract

Although pre-trained language models have recently achieved great success on multilingual NLP (Natural Language Processing) tasks, the lack of training data for many tasks in low-resource languages still limits their performance. One effective way to address this problem is to transfer knowledge from high-resource languages to low-resource languages. However, many previous cross-lingual transfer methods rely heavily on parallel corpora or translation models, which are often difficult to obtain. We propose a novel approach to zero-shot cross-lingual transfer with a pre-trained model. It consists of two modules: a Bilingual Task Fitting module, which applies task-related bilingual information alignment, and a self-training module, which generates pseudo soft and hard labels for unlabeled data and uses them for self-training. The approach achieves new state-of-the-art results on several tasks without any dependence on parallel corpora or translation models.
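The abstract does not say how the pseudo soft and hard labels are built or combined. What follows is a generic teacher-student self-training step, not the paper's method. A teacher's class distribution on an unlabeled target-language example serves as the soft label and its argmax as the hard label. The student is then trained on a weighted mix of both cross-entropies. The confidence `threshold`, `temperature`, mixing weight `alpha`, and all function names are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax; temperature > 1 flattens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    return [math.exp(lp) for lp in log_softmax(logits, temperature)]


def make_pseudo_labels(
    teacher_logits: Sequence[Sequence[float]],
    threshold: float = 0.9,
    temperature: float = 1.0,
) -> List[Tuple[int, List[float], int]]:
    """Return (index, soft_label, hard_label) for each confidently predicted example.

    The soft label is the teacher's full class distribution; the hard label is
    its argmax. Examples whose top probability is below `threshold` are skipped
    (a hypothetical filtering rule, not taken from the paper).
    """
    pseudo = []
    for i, logits in enumerate(teacher_logits):
        soft = softmax(logits, temperature)
        hard = max(range(len(soft)), key=soft.__getitem__)
        if soft[hard] >= threshold:
            pseudo.append((i, soft, hard))
    return pseudo


def self_training_loss(
    student_logits: Sequence[float],
    soft_label: Sequence[float],
    hard_label: int,
    alpha: float = 0.5,
) -> float:
    """alpha * CE(hard label) + (1 - alpha) * CE(soft label), on student log-probs."""
    log_probs = log_softmax(student_logits)
    hard_ce = -log_probs[hard_label]
    soft_ce = -sum(q * lp for q, lp in zip(soft_label, log_probs))
    return alpha * hard_ce + (1.0 - alpha) * soft_ce


if __name__ == "__main__":
    # Toy teacher logits for two unlabeled examples over three classes.
    teacher = [[4.0, 0.0, 0.0], [0.2, 0.1, 0.0]]
    for idx, soft, hard in make_pseudo_labels(teacher, threshold=0.9):
        loss = self_training_loss([2.0, 0.0, 0.0], soft, hard)
        print(idx, hard, round(loss, 4))
```

With these toy inputs only the first example is kept, since the second teacher distribution has a top probability of about 0.37, well under the 0.9 threshold. Mixing in the soft label lets the student learn from the teacher's uncertainty rather than only its top guess. In a real setup the logits would presumably come from the pre-trained multilingual model rather than hand-written lists.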
