CL · Apr 19, 2019

Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT

arXiv:1904.09077v2 · 1264 citations
AI Analysis

This work addresses cross-lingual NLP by evaluating how well a pretrained model generalizes across languages; it is incremental in that it builds on existing mBERT capabilities.

The paper explores mBERT's effectiveness for zero-shot cross-lingual transfer on 5 NLP tasks covering 39 languages, finds it competitive with the best published methods, and analyzes factors that influence transfer.

Pretrained contextual representation models (Peters et al., 2018; Devlin et al., 2018) have pushed forward the state of the art on many NLP tasks. A new release of BERT (Devlin, 2018) includes a model simultaneously pretrained on 104 languages with impressive performance for zero-shot cross-lingual transfer on a natural language inference task. This paper explores the broader cross-lingual potential of mBERT (multilingual BERT) as a zero-shot language transfer model on 5 NLP tasks covering a total of 39 languages from various language families: NLI, document classification, NER, POS tagging, and dependency parsing. We compare mBERT with the best published methods for zero-shot cross-lingual transfer and find mBERT competitive on each task. Additionally, we investigate the most effective strategy for utilizing mBERT in this manner, determine to what extent mBERT generalizes away from language-specific features, and measure factors that influence cross-lingual transfer.
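The zero-shot transfer protocol described in the abstract (train on labeled data in one language, evaluate on another with no target-language labels) can be illustrated with a toy sketch. This is not the paper's code and does not use mBERT: the hand-written `SHARED_EMBEDDINGS` table is a hypothetical stand-in for a shared multilingual encoder, the perceptron is a swapped-in classifier chosen for brevity, and the tiny sentiment datasets are invented for illustration.

```python
# Toy illustration of zero-shot cross-lingual transfer.
# ASSUMPTION: SHARED_EMBEDDINGS is a hand-made stand-in for a multilingual
# encoder such as mBERT; translation pairs are simply placed near each other.
SHARED_EMBEDDINGS = {
    # English
    "good": (1.0, 0.2), "great": (0.9, 0.1),
    "bad": (-1.0, 0.1), "awful": (-0.9, -0.2),
    "movie": (0.0, 1.0),
    # Spanish
    "buena": (0.95, 0.15), "genial": (0.85, 0.2),
    "mala": (-0.95, 0.0), "horrible": (-0.85, -0.1),
    "película": (0.05, 0.95),
}


def encode(sentence):
    """Mean-pool the vectors of known tokens into one sentence vector."""
    vecs = [SHARED_EMBEDDINGS[t] for t in sentence.lower().split()
            if t in SHARED_EMBEDDINGS]
    if not vecs:
        return (0.0, 0.0)
    n = len(vecs)
    return (sum(v[0] for v in vecs) / n, sum(v[1] for v in vecs) / n)


def train_perceptron(examples, epochs=10):
    """Fit a linear classifier on (sentence, label) pairs, label in {+1, -1}."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for sentence, label in examples:
            x = encode(sentence)
            score = w[0] * x[0] + w[1] * x[1] + b
            if label * score <= 0:  # misclassified: nudge toward the label
                w[0] += label * x[0]
                w[1] += label * x[1]
                b += label
    return w, b


def predict(model, sentence):
    w, b = model
    x = encode(sentence)
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else -1


def accuracy(model, examples):
    return sum(predict(model, s) == y for s, y in examples) / len(examples)


# Source language: labels are available only here.
ENGLISH_TRAIN = [("good movie", 1), ("great movie", 1),
                 ("bad movie", -1), ("awful movie", -1)]
# Target language: used for evaluation only, never for training.
SPANISH_TEST = [("buena película", 1), ("película genial", 1),
                ("mala película", -1), ("película horrible", -1)]

model = train_perceptron(ENGLISH_TRAIN)
print("Spanish zero-shot accuracy:", accuracy(model, SPANISH_TEST))
```

The classifier transfers only because both languages share one embedding space. In the setting the abstract describes, that shared space would come from the pretrained multilingual model, not from a hand-built table.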

Code Implementations · 2 repos
