CL · Oct 30, 2020

VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation

arXiv:2010.16046v2 · 738 citations
Originality: Highly original
AI Analysis

This work addresses cross-lingual transferability for language understanding and generation tasks, offering a flexible method that benefits a wide range of applications, though it is incremental in improving existing multilingual pre-training approaches.

The paper tackles the loose, implicit cross-lingual alignment of multilingual pre-training by adding a cross-attention module that explicitly builds interdependence between languages. It reports new state-of-the-art results on cross-lingual understanding tasks and gains of up to 1-2 BLEU on translation tasks.

Existing work in multilingual pre-training has demonstrated the potential of cross-lingual transferability by training a unified Transformer encoder for multiple languages. However, much of this work relies only on the shared vocabulary and bilingual contexts to encourage correlation across languages, which is a loose and implicit way of aligning the contextual representations between languages. In this paper, we plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages. It can effectively avoid the degeneration of predicting masked words conditioned only on the context in their own language. More importantly, when fine-tuning on downstream tasks, the cross-attention module can be plugged in or out on demand, thus naturally benefiting a wider range of cross-lingual tasks, from language understanding to generation. As a result, the proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark, covering text classification, sequence labeling, question answering, and sentence retrieval. For cross-lingual generation tasks, it also outperforms all existing cross-lingual models and state-of-the-art Transformer variants on the WMT14 English-to-German and English-to-French translation datasets, with gains of up to 1-2 BLEU.
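
The "plug in or out" idea from the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`attend`, `encoder_layer`), the single attention head, the absence of learned projections, layer normalization, and a feed-forward block, and the toy vectors are all assumptions made for illustration. It only shows a layer that always applies self-attention and applies cross-attention to a paired-language sequence when one is supplied.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys, values):
    """Single-head scaled dot-product attention, no learned projections (simplification)."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

def add(xs, ys):
    """Element-wise sum of two equally shaped lists of vectors (residual connection)."""
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]

def encoder_layer(x, other=None):
    """Hypothetical layer: self-attention, then optional cross-attention.

    x:     token vectors for one language
    other: token vectors for the paired language, or None when the
           cross-attention module is plugged out
    """
    h = add(x, attend(x, x, x))              # self-attention + residual
    if other is not None:                    # cross-attention plugged in
        h = add(h, attend(h, other, other))  # queries from x, keys/values from other
    return h

# Toy inputs (made up): three "English" tokens and two "German" tokens, dimension 2.
en = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
de = [[0.5, 0.5], [1.0, -1.0]]

mono = encoder_layer(en)             # module plugged out: monolingual context only
cross = encoder_layer(en, other=de)  # module plugged in: also conditions on `de`
```

With `other=None` the layer behaves like a plain self-attention encoder layer, which corresponds to plugging the module out for tasks that have only a single-language input. Passing the paired sequence makes each token's representation depend on the other language as well.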

Code Implementations: 1 repo