CL · LG · Feb 28, 2020

Do all Roads Lead to Rome? Understanding the Role of Initialization in Iterative Back-Translation

arXiv:2002.12867v1 · 6 citations
AI Analysis

This work examines the role of initialization in iterative back-translation for unsupervised machine translation, showing that it has only a limited impact on final performance.

The paper studies how initialization affects iterative back-translation in unsupervised neural machine translation. The quality of the initial system does affect final performance, but the effect is small because the procedure tends to converge to similar solutions, leaving only a narrow margin for better initialization to help.

Back-translation provides a simple yet effective approach to exploit monolingual corpora in Neural Machine Translation (NMT). Its iterative variant, where two opposite NMT models are jointly trained by alternately using a synthetic parallel corpus generated by the reverse model, plays a central role in unsupervised machine translation. In order to start producing sound translations and provide a meaningful training signal to each other, existing approaches rely on either a separate machine translation system to warm up the iterative procedure, or some form of pre-training to initialize the weights of the model. In this paper, we analyze the role that such initialization plays in iterative back-translation. Is the behavior of the final system heavily dependent on it? Or does iterative back-translation converge to a similar solution given any reasonable initialization? Through a series of empirical experiments over a diverse set of warmup systems, we show that, although the quality of the initial system does affect final performance, its effect is relatively small, as iterative back-translation has a strong tendency to converge to a similar solution. As such, the margin of improvement left for the initialization method is narrow, suggesting that future research should focus more on improving the iterative mechanism itself.
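
The alternating structure described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not the paper's implementation: `WordModel` is a hypothetical word-for-word lookup table standing in for an NMT model, the position-wise alignment in `train` only works because the toy translates word by word, and the warmup dictionaries and sentences are invented.

```python
from collections import Counter, defaultdict


class WordModel:
    """Toy word-for-word translator (hypothetical stand-in for an NMT model)."""

    def __init__(self, table=None):
        # counts[src_word][tgt_word] = number of times the pair was observed
        self.counts = defaultdict(Counter)
        for src_word, tgt_word in (table or {}).items():
            self.counts[src_word][tgt_word] += 1

    def translate(self, sentence):
        # Most frequent known translation; unknown words are copied unchanged.
        return [
            self.counts[w].most_common(1)[0][0] if self.counts[w] else w
            for w in sentence
        ]

    def train(self, pairs):
        # Assumption: sentences align position by position (toy setting only).
        for src, tgt in pairs:
            for s, t in zip(src, tgt):
                self.counts[s][t] += 1


def iterative_back_translation(fwd, bwd, mono_src, mono_tgt, rounds=3):
    """Alternately train each direction on synthetic data from the other."""
    for _ in range(rounds):
        # Real target text, back-translated to synthetic source, trains fwd.
        fwd.train([(bwd.translate(t), t) for t in mono_tgt])
        # Real source text, translated to synthetic target, trains bwd.
        bwd.train([(fwd.translate(s), s) for s in mono_src])
    return fwd, bwd


# Invented warmup: the forward model starts out knowing less than the backward one.
fwd = WordModel({"chat": "cat"})
bwd = WordModel({"cat": "chat", "dog": "chien"})
mono_src = [["chat", "chien"]]
mono_tgt = [["cat", "dog"]]

fwd, bwd = iterative_back_translation(fwd, bwd, mono_src, mono_tgt)
print(fwd.translate(["chien"]))
```

In this toy, the forward model starts without an entry for "chien" and acquires one purely from the synthetic pairs produced by the backward model, which is the mutual training signal the abstract refers to. It says nothing about the convergence behavior the paper measures on real NMT systems.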

