CL, Sep 26, 2019

Selecting Artificially-Generated Sentences for Fine-Tuning Neural Machine Translation

arXiv:1909.12016v1 30.11004 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of data scarcity in machine translation for researchers and practitioners, though it appears incremental as it builds on existing data augmentation and selection techniques.

The paper tackles German-to-English neural machine translation, using artificially-generated sentences together with data-selection algorithms. It reports that the generated sentences can be more beneficial than authentic pairs and that they offer further advantages when combined with selection methods.

Neural Machine Translation (NMT) models tend to achieve best performance when larger sets of parallel sentences are provided for training. For this reason, augmenting the training set with artificially-generated sentence pairs can boost performance. Nonetheless, the performance can also be improved with a small number of sentences if they are in the same domain as the test set. Accordingly, we want to explore the use of artificially-generated sentences along with data-selection algorithms to improve German-to-English NMT models trained solely with authentic data. In this work, we show how artificially-generated sentences can be more beneficial than authentic pairs, and demonstrate their advantages when used in combination with data-selection algorithms.
