CL · Jul 6, 2019

Exploiting Out-of-Domain Parallel Data through Multilingual Transfer Learning for Low-Resource Neural Machine Translation

arXiv:1907.03060v11098 citations
Originality: Incremental advance
AI Analysis

This paper addresses translation quality for extremely low-resource language pairs. The advance is incremental, since it combines existing techniques such as domain adaptation and back-translation.

The paper tackles low-resource neural machine translation for Japanese-Russian by exploiting out-of-domain data through multilingual transfer learning, improving translation quality by more than 3.7 BLEU points over a strong baseline.

This paper proposes a novel multilingual multistage fine-tuning approach for low-resource neural machine translation (NMT), taking a challenging Japanese-Russian pair for benchmarking. Although there are many solutions for low-resource scenarios, such as multilingual NMT and back-translation, we have empirically confirmed their limited success when restricted to in-domain data. We therefore propose to exploit out-of-domain data through transfer learning, by using it to first train a multilingual NMT model followed by multistage fine-tuning on in-domain parallel and back-translated pseudo-parallel data. Our approach, which combines domain adaptation, multilingualism, and back-translation, helps improve the translation quality by more than 3.7 BLEU points, over a strong baseline, for this extremely low-resource scenario.
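The staged training order described in the abstract can be illustrated with a minimal sketch. It only shows how corpora might be grouped into stages; it is not the authors' implementation. The corpus names, the use of English as the other language in the out-of-domain data, the `<2xx>` target-language tag, and the exact split into three stages are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Corpus:
    name: str                      # hypothetical label, not from the paper
    src: str                       # source language code
    tgt: str                       # target language code
    in_domain: bool                # True if the data matches the test domain
    back_translated: bool = False  # True for pseudo-parallel data


def add_target_tag(sentence: str, tgt: str) -> str:
    # Assumed convention: a multilingual model is told the target
    # language through a token prepended to the source sentence.
    return f"<2{tgt}> {sentence}"


def build_stages(corpora: List[Corpus]) -> List[Tuple[str, List[str]]]:
    # Group corpora into an ordered training schedule (assumed split):
    # 1) out-of-domain data, 2) in-domain parallel data,
    # 3) in-domain parallel plus back-translated data.
    out_dom = [c.name for c in corpora if not c.in_domain]
    in_par = [c.name for c in corpora if c.in_domain and not c.back_translated]
    in_bt = [c.name for c in corpora if c.in_domain and c.back_translated]
    return [
        ("train_multilingual_out_of_domain", out_dom),
        ("fine_tune_in_domain_parallel", in_par),
        ("fine_tune_in_domain_plus_pseudo_parallel", in_par + in_bt),
    ]


# Illustrative corpora only; names and language pairs are hypothetical.
corpora = [
    Corpus("out_ja_en", "ja", "en", in_domain=False),
    Corpus("out_ru_en", "ru", "en", in_domain=False),
    Corpus("in_ja_ru", "ja", "ru", in_domain=True),
    Corpus("bt_ja_ru", "ja", "ru", in_domain=True, back_translated=True),
]

for stage, names in build_stages(corpora):
    print(stage, names)
```

Each stage would start from the checkpoint produced by the previous one, which is what makes the later stages fine-tuning rather than training from scratch.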

Code Implementations: 1 repo
