CL · LG · Jun 19, 2024

Improving Zero-Shot Cross-Lingual Transfer via Progressive Code-Switching

arXiv:2406.13361v1 · 8 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of noisy ("dirty") samples produced by code-switching data augmentation in multilingual NLP, which matters to both researchers and practitioners. The advance is incremental, since it builds on existing code-switching techniques.

The paper tackles the problem of uncontrolled code-switching in cross-lingual transfer by proposing a Progressive Code-Switching method to generate moderately difficult examples, achieving state-of-the-art results on three zero-shot cross-lingual transfer tasks across ten languages.

Code-switching is a data augmentation scheme that mixes words from multiple languages into source-language text. By aligning cross-lingual contextual word representations, it has achieved considerable generalization performance on cross-lingual transfer tasks. However, uncontrolled and excessive word replacement adds dirty samples to model training. In other words, excessively code-switched text samples hurt the model's cross-lingual transferability. To this end, we propose a Progressive Code-Switching (PCS) method that gradually generates moderately difficult code-switching examples for the model to discriminate, from easy to hard. The idea is to progressively incorporate the multilingual knowledge learned from easier code-switching data to guide model optimization on the harder code-switching data that follows. Specifically, we first design a difficulty measurer that measures the impact of replacing each word in a sentence based on its word relevance score. A code-switcher then generates code-switching data of increasing difficulty via a controllable temperature variable. In addition, a training scheduler decides when to sample harder code-switching data for model training. Experiments show that our model achieves state-of-the-art results on three different zero-shot cross-lingual transfer tasks across ten languages.
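
The three components named in the abstract (difficulty measurer, code-switcher, training scheduler) can be illustrated with a small Python sketch. This is not the authors' implementation. It assumes that per-word relevance scores are already available, that "easier" means replacing the least relevant words first, that the temperature is a value in [0, 1] setting the fraction of replaceable words that get switched, and that the scheduler raises the temperature linearly over epochs. The names code_switch and linear_temperature, the toy dictionary, and the scores are all hypothetical.

```python
import math


def code_switch(tokens, relevance, dictionary, temperature):
    """Replace a temperature-controlled fraction of words with translations.

    Assumption (not from the paper): the least relevant words are
    replaced first, so a low temperature yields "easy" examples.
    """
    if len(tokens) != len(relevance):
        raise ValueError("tokens and relevance must have the same length")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must lie in [0, 1]")

    # Only words with a dictionary entry can be switched.
    candidates = [i for i, tok in enumerate(tokens) if tok in dictionary]
    # Order candidates from least to most relevant (hypothetical difficulty order).
    candidates.sort(key=lambda i: relevance[i])

    # The temperature sets how many of the candidates are replaced.
    budget = math.floor(temperature * len(candidates))
    chosen = set(candidates[:budget])

    return [dictionary[tok] if i in chosen else tok
            for i, tok in enumerate(tokens)]


def linear_temperature(epoch, total_epochs, start=0.0, end=1.0):
    """Hypothetical scheduler: raise the temperature linearly over epochs."""
    if total_epochs <= 1:
        return end
    fraction = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return start + fraction * (end - start)


# Toy data, invented for illustration only.
tokens = ["the", "movie", "was", "great"]
relevance = [0.05, 0.30, 0.10, 0.55]
en_de = {"the": "der", "movie": "Film", "was": "war", "great": "großartig"}

for epoch in range(5):
    tau = linear_temperature(epoch, total_epochs=5)
    print(epoch, tau, code_switch(tokens, relevance, en_de, tau))
```

In this sketch the early epochs leave the sentence nearly untouched and the final epoch switches every word that has a dictionary entry. A real scheduler might instead raise the temperature only when a validation signal plateaus; the linear ramp is used here only because it is the simplest choice.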

