CL · Aug 19, 2018

Neural Machine Translation of Text from Non-Native Speakers

Antonios Anastasopoulos, Alison Lui, Toan Nguyen, David Chiang

arXiv:1808.06267v232.11102 citations · Has Code

Originality: Incremental advance

AI Analysis

This paper addresses translation quality on text written by non-native speakers, but the advance is incremental: it builds on existing data augmentation and error correction methods.

The paper addresses the degradation of neural machine translation systems on text containing grammatical errors from non-native speakers. It shows that augmenting training data with artificially introduced errors makes the system more robust, and that combining this augmentation with an automatic grammar error correction system recovers 1.5 of the 2.4 BLEU lost to such errors.

Neural Machine Translation (NMT) systems are known to degrade when confronted with noisy data, especially when the system is trained only on clean data. In this paper, we show that augmenting training data with sentences containing artificially-introduced grammatical errors can make the system more robust to such errors. In combination with an automatic grammar error correction system, we can recover 1.5 BLEU out of 2.4 BLEU lost due to grammatical errors. We also present a set of Spanish translations of the JFLEG grammar error correction corpus, which allows for testing NMT robustness to real grammatical errors.
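
The augmentation idea in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two error types (article deletion and preposition substitution), the word lists, the probability `p`, and the function names `inject_errors` and `augment` are all hypothetical choices made for this example.

```python
import random

# Hypothetical confusion sets, chosen for illustration only.
ARTICLES = {"a", "an", "the"}
PREPOSITIONS = ["in", "on", "at", "for", "to", "of", "with"]


def inject_errors(sentence, rng, p=0.3):
    """Return a copy of `sentence` with synthetic article/preposition errors."""
    out = []
    for tok in sentence.split():
        low = tok.lower()
        if low in ARTICLES and rng.random() < p:
            continue  # drop the article entirely
        if low in PREPOSITIONS and rng.random() < p:
            # substitute a different preposition from the confusion set
            out.append(rng.choice([w for w in PREPOSITIONS if w != low]))
            continue
        out.append(tok)
    return " ".join(out)


def augment(pairs, seed=0, p=0.3):
    """Keep every clean (source, target) pair and add a noisy-source copy.

    The target side is left untouched, so the model sees an erroneous
    source paired with the reference translation of the clean source.
    """
    rng = random.Random(seed)
    augmented = list(pairs)
    for src, tgt in pairs:
        noisy = inject_errors(src, rng, p)
        if noisy != src:  # skip sentences where no error was introduced
            augmented.append((noisy, tgt))
    return augmented
```

Calling `augment([("She went to the store", "Ella fue a la tienda")])` returns the clean pair plus, when at least one error was sampled, a second pair whose source contains the injected errors and whose target is unchanged. Keeping the clean pairs alongside the noisy ones is a design choice of this sketch, intended to avoid hurting performance on clean input.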
