CL · Sep 9, 2019

Combining SMT and NMT Back-Translated Data for Efficient NMT

Alberto Poncelas, Maja Popovic, Dimitar Shterionov, Gideon Maillette de Buy Wenniger, Andy Way

arXiv:1909.03750v11.721 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for efficient data augmentation in machine translation, though it appears incremental by building on existing back-translation methods.

The paper tackled the problem of improving Neural Machine Translation (NMT) performance by augmenting training data with back-translated synthetic sentences, finding that combining data from both NMT and Statistical Machine Translation (SMT) models yields the best results.

Neural Machine Translation (NMT) models achieve their best performance when large sets of parallel data are used for training. Consequently, techniques for augmenting the training set have become popular recently. One of these methods is back-translation (Sennrich et al., 2016), which consists of generating synthetic sentences by translating a set of monolingual, target-language sentences using a Machine Translation (MT) model. Generally, NMT models are used for back-translation. In this work, we analyze the performance of models when the training data is extended with synthetic data using different MT approaches. In particular, we investigate back-translated data generated not only by NMT but also by Statistical Machine Translation (SMT) models and combinations of both. The results reveal that the models achieve the best performance when the training set is augmented with back-translated data created by merging different MT approaches.
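
The data flow described in the abstract can be illustrated with a minimal Python sketch. Everything named here is hypothetical: `toy_smt` and `toy_nmt` are string-tagging stand-ins for real target-to-source MT systems, and merging by plain concatenation is an assumption, since the abstract does not say how the SMT and NMT outputs are combined or in what proportions.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def back_translate(
    target_sentences: List[str],
    translate_to_source: Callable[[str], str],
) -> List[Pair]:
    """Pair each authentic target sentence with a synthetic source sentence."""
    return [(translate_to_source(t), t) for t in target_sentences]


def build_training_set(
    authentic: List[Pair],
    monolingual_target: List[str],
    backward_models: List[Callable[[str], str]],
) -> List[Pair]:
    """Authentic pairs followed by synthetic pairs from each backward model.

    Assumption: the synthetic sets are merged by simple concatenation.
    """
    combined = list(authentic)
    for model in backward_models:
        combined.extend(back_translate(monolingual_target, model))
    return combined


# Hypothetical stand-ins for trained target-to-source systems.
def toy_smt(sentence: str) -> str:
    return "smt(" + sentence + ")"


def toy_nmt(sentence: str) -> str:
    return "nmt(" + sentence + ")"


authentic = [("ein Haus", "a house")]
monolingual = ["a cat", "a dog"]
training = build_training_set(authentic, monolingual, [toy_smt, toy_nmt])
```

In this sketch the target side of every synthetic pair is an authentic sentence; only the source side is machine-generated, which is the defining property of back-translation. Swapping the list of backward models between `[toy_nmt]`, `[toy_smt]`, and both gives the three kinds of augmentation the abstract compares.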
