CL

May 8, 2021

Improving Cross-Lingual Reading Comprehension with Self-Training

Wei-Cheng Huang, Chien-yu Huang, Hung-yi Lee

arXiv:2105.03627v1

0.51 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of applying reading comprehension models across languages. The contribution is incremental, since it builds on existing multilingual pre-training methods.

The paper tackles cross-lingual reading comprehension by self-training on unlabeled target-language data, and reports improvements for all languages tested.

Substantial improvements have been made in machine reading comprehension, where the machine answers questions based on a given context. Current state-of-the-art models even surpass human performance on several benchmarks. However, their abilities in the cross-lingual scenario are still to be explored. Previous works have revealed the abilities of pre-trained multilingual models for zero-shot cross-lingual reading comprehension. In this paper, we further utilize unlabeled data to improve performance. The model is first trained with supervision on a source-language corpus and then self-trained on unlabeled target-language data. The experimental results show improvements for all languages, and we also analyze qualitatively how self-training benefits cross-lingual reading comprehension.
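The two-stage recipe in the abstract (supervised training on the source language, then self-training on unlabeled target-language data) can be illustrated with a generic pseudo-labeling loop. The sketch below is not the authors' implementation. It swaps the multilingual reading comprehension model for a toy nearest-centroid classifier on one-dimensional features, and the names `fit_centroids`, `predict`, `self_train`, the confidence threshold, and the choice to retrain on source data plus pseudo-labels are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# (feature, label): a stand-in for (question + context, answer span).
Example = Tuple[float, int]


def fit_centroids(data: List[Example]) -> Dict[int, float]:
    """Toy 'model': the mean feature value of each label."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for x, y in data:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(model: Dict[int, float], x: float) -> Tuple[int, float]:
    """Return (label, confidence) for x; needs at least two labels.

    Confidence is a margin score in [0.5, 1]: it grows as x gets closer
    to the nearest centroid relative to the second nearest.
    """
    dists = sorted((abs(x - c), y) for y, c in model.items())
    (d0, y0), (d1, _) = dists[0], dists[1]
    conf = 1.0 if d0 + d1 == 0 else d1 / (d0 + d1)
    return y0, conf


def self_train(
    source_labeled: List[Example],
    target_unlabeled: List[float],
    rounds: int = 3,        # hypothetical setting
    threshold: float = 0.7,  # hypothetical confidence cutoff
) -> Dict[int, float]:
    # Stage 1: supervised training on labeled source-language data.
    model = fit_centroids(source_labeled)
    # Stage 2: repeatedly pseudo-label the target data and retrain.
    for _ in range(rounds):
        pseudo: List[Example] = []
        for x in target_unlabeled:
            y, conf = predict(model, x)
            if conf >= threshold:  # keep only confident pseudo-labels
                pseudo.append((x, y))
        if not pseudo:
            break
        # Assumption: retrain on source data plus pseudo-labeled target data.
        model = fit_centroids(source_labeled + pseudo)
    return model


source = [(0.0, 0), (1.0, 0), (9.0, 1), (10.0, 1)]
target = [2.0, 3.0, 7.0, 8.0, 5.2]
model = self_train(source, target)
```

In this toy run the ambiguous point `5.2` never clears the threshold, so it is never used as a pseudo-label, while the confident target points pull each centroid toward the target distribution. In a real reading comprehension setting the confidence would presumably come from the model's answer-span probabilities instead of a distance margin.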
