Large-scale Pretraining for Neural Machine Translation with Tens of Billions of Sentence Pairs
This work addresses the problem of scaling neural machine translation to massive datasets, a concern for both researchers and practitioners. Its contribution is incremental in that it builds on existing pretraining methods.
The paper tackles training neural machine translation systems on more than 40 billion bilingual sentence pairs, addressing challenges such as data noise and long training time. It reports a BLEU score of 32.3 on WMT17 Chinese-English, a +3.2 improvement over the previous state of the art.
In this paper, we investigate the problem of training neural machine translation (NMT) systems on a dataset of more than 40 billion bilingual sentence pairs, which is orders of magnitude larger than the largest dataset used to date. This setting raises challenges not encountered in previous NMT work, including severe noise in the data and prohibitively long training time. We propose practical solutions to these issues and demonstrate that large-scale pretraining significantly improves NMT performance. We push the BLEU score on the WMT17 Chinese-English dataset to 32.3, a significant boost of +3.2 over existing state-of-the-art results.