CL AI LG | May 27, 2019

XLDA: Cross-Lingual Data Augmentation for Natural Language Inference and Question Answering

Jasdeep Singh, Bryan McCann, Nitish Shirish Keskar, Caiming Xiong, Richard Socher

arXiv:1905.11471v16.484 citations

Originality: Incremental advance

AI Analysis

XLDA addresses performance gaps in low-resource languages on tasks such as natural language inference and question answering. It represents an incremental advance in data augmentation techniques.

The paper aims to improve multilingual NLP performance by introducing XLDA, a cross-lingual data augmentation method that replaces a segment of the input text with its translation in another language. It reports improvements of up to 4.8% on the XNLI benchmark, state-of-the-art results for Greek, Turkish, and Urdu, and a 1.0% increase on the English SQuAD evaluation set.

While natural language processing systems often focus on a single language, multilingual transfer learning has the potential to improve performance, especially for low-resource languages. We introduce XLDA, cross-lingual data augmentation, a method that replaces a segment of the input text with its translation in another language. XLDA enhances performance of all 14 tested languages of the cross-lingual natural language inference (XNLI) benchmark. With improvements of up to $4.8\%$, training with XLDA achieves state-of-the-art performance for Greek, Turkish, and Urdu. XLDA is in contrast to, and performs markedly better than, a more naive approach that aggregates examples in various languages in a way that each example is solely in one language. On the SQuAD question answering task, we see that XLDA provides a $1.0\%$ performance increase on the English evaluation set. Comprehensive experiments suggest that most languages are effective as cross-lingual augmentors, that XLDA is robust to a wide range of translation quality, and that XLDA is even more effective for randomly initialized models than for pretrained models.
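The contrast between XLDA and the naive aggregation baseline can be illustrated with a minimal sketch. It assumes that, for an NLI example, the "segment" being replaced is the whole premise or the whole hypothesis, and that translations of both are already available in a dictionary keyed by language code. The function names, the data layout, and the sample sentences are hypothetical and are not taken from the paper's code.

```python
import random

# Hypothetical parallel data: one NLI example with translations per language.
PREMISES = {
    "en": "A man is playing a guitar.",
    "de": "Ein Mann spielt Gitarre.",
    "es": "Un hombre toca la guitarra.",
}
HYPOTHESES = {
    "en": "A person is making music.",
    "de": "Eine Person macht Musik.",
    "es": "Una persona hace música.",
}


def naive_aggregate(premises, hypotheses, label):
    """Baseline: one training example per language, each entirely in that language."""
    return [
        (lang, lang, premises[lang], hypotheses[lang], label)
        for lang in sorted(premises)
    ]


def xlda_example(premises, hypotheses, label, rng):
    """Cross-lingual sketch: sample the premise and hypothesis languages
    independently, so a single example may mix two languages."""
    langs = sorted(premises)
    premise_lang = rng.choice(langs)
    hypothesis_lang = rng.choice(langs)
    return (
        premise_lang,
        hypothesis_lang,
        premises[premise_lang],
        hypotheses[hypothesis_lang],
        label,
    )


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only for reproducibility of the demo
    for _ in range(3):
        print(xlda_example(PREMISES, HYPOTHESES, "entailment", rng))
```

In the baseline every example stays within one language, whereas the cross-lingual variant can pair, for instance, a German premise with a Spanish hypothesis under the same label. How the languages are sampled in practice (uniformly, per batch, or restricted to a subset of augmentor languages) is a design choice this sketch does not settle.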

