CL, Sep 9, 2019

Combining SMT and NMT Back-Translated Data for Efficient NMT

arXiv:1909.03750v1, 21 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient data augmentation in machine translation, though it appears incremental by building on existing back-translation methods.

The paper tackles the problem of improving Neural Machine Translation (NMT) performance by augmenting the training data with back-translated synthetic sentences, and finds that combining data from both NMT and Statistical Machine Translation (SMT) models yields the best results.

Neural Machine Translation (NMT) models achieve their best performance when large sets of parallel data are used for training. Consequently, techniques for augmenting the training set have become popular recently. One of these methods is back-translation (Sennrich et al., 2016), which consists of generating synthetic sentences by translating a set of monolingual, target-language sentences using a Machine Translation (MT) model. Generally, NMT models are used for back-translation. In this work, we analyze the performance of models when the training data is extended with synthetic data using different MT approaches. In particular, we investigate back-translated data generated not only by NMT but also by Statistical Machine Translation (SMT) models and combinations of both. The results reveal that the models achieve the best performance when the training set is augmented with back-translated data created by merging different MT approaches.
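
The augmentation described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy translators, and the example sentences are all hypothetical, and the sketch assumes that the same monolingual target sentences are back-translated by each system and that the resulting pairs are simply concatenated with the authentic data. The abstract does not state how the paper actually merges the two sources.

```python
from typing import Callable, List, Tuple

# A training example is a (source sentence, target sentence) pair.
SentencePair = Tuple[str, str]
# A back-translator maps a target-language sentence to a source-language one.
Translator = Callable[[str], str]


def back_translate(target_sentences: List[str],
                   translate_to_source: Translator) -> List[SentencePair]:
    """Pair each authentic target sentence with a synthetic source sentence."""
    return [(translate_to_source(t), t) for t in target_sentences]


def build_training_set(authentic: List[SentencePair],
                       monolingual_target: List[str],
                       back_translators: List[Translator]) -> List[SentencePair]:
    """Extend authentic parallel data with synthetic pairs from each system."""
    augmented = list(authentic)
    for translate in back_translators:
        augmented.extend(back_translate(monolingual_target, translate))
    return augmented


# Hypothetical stand-ins for trained SMT and NMT target-to-source models.
def toy_smt(sentence: str) -> str:
    return "smt<" + sentence + ">"


def toy_nmt(sentence: str) -> str:
    return "nmt<" + sentence + ">"


authentic = [("ein Haus", "a house")]
monolingual = ["a cat", "a dog"]
combined = build_training_set(authentic, monolingual, [toy_smt, toy_nmt])
```

In this sketch the target side of every synthetic pair is human-written text and only the source side is machine-generated, which is the defining property of back-translation. Passing a single translator in `back_translators` would give the NMT-only or SMT-only setting that the abstract compares against.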

