CL · Jul 15, 2020

InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training

arXiv:2007.07834v2 · 841 citations
AI Analysis

This paper addresses cross-lingual representation learning for multilingual NLP applications, offering incremental improvements over existing methods.

The authors tackle cross-lingual language model pre-training by proposing an information-theoretic framework and a new contrastive pre-training task that improves cross-lingual transferability. They report considerably better performance on several benchmarks.

In this work, we present an information-theoretic framework that formulates cross-lingual language model pre-training as maximizing mutual information between multilingual-multi-granularity texts. The unified view helps us to better understand the existing methods for learning cross-lingual representations. More importantly, inspired by the framework, we propose a new pre-training task based on contrastive learning. Specifically, we regard a bilingual sentence pair as two views of the same meaning and encourage their encoded representations to be more similar than the negative examples. By leveraging both monolingual and parallel corpora, we jointly train the pretext tasks to improve the cross-lingual transferability of pre-trained models. Experimental results on several benchmarks show that our approach achieves considerably better performance. The code and pre-trained models are available at https://aka.ms/infoxlm.
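The contrastive idea in the abstract (a translation pair should be encoded more similarly than negative examples) can be illustrated with a minimal InfoNCE-style loss. This is a sketch under stated assumptions, not the authors' implementation: it uses in-batch negatives, plain Python lists in place of encoder outputs, and a hypothetical function name `info_nce_loss` with an illustrative `temperature` value. The released code at https://aka.ms/infoxlm is the reference for the actual method.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def info_nce_loss(src_vecs, tgt_vecs, temperature=0.1):
    """Mean InfoNCE loss over a batch of sentence vectors.

    Assumption: src_vecs[i] and tgt_vecs[i] encode a translation pair,
    and every other target in the batch serves as a negative example.
    """
    src = [l2_normalize(v) for v in src_vecs]
    tgt = [l2_normalize(v) for v in tgt_vecs]
    total = 0.0
    for i, query in enumerate(src):
        # Temperature-scaled cosine similarity to every candidate.
        logits = [dot(query, key) / temperature for key in tgt]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log-probability of picking the true translation.
        total += log_denom - logits[i]
    return total / len(src)


# Toy "sentence embeddings" (made-up numbers for illustration only).
en = [[1.0, 0.0], [0.0, 1.0]]
fr_aligned = [[0.9, 0.1], [0.1, 0.9]]   # row i translates en[i]
fr_shuffled = [[0.1, 0.9], [0.9, 0.1]]  # pairs deliberately mismatched

aligned_loss = info_nce_loss(en, fr_aligned)
shuffled_loss = info_nce_loss(en, fr_shuffled)
print(aligned_loss < shuffled_loss)
```

Minimizing a loss of this form pushes each sentence toward its translation and away from the other sentences in the batch. With N candidates per query, log N minus this loss is a known lower bound on the mutual information between the two views, which is how a contrastive task fits a mutual-information framing.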

Code Implementations: 4 repos
