CL · May 31, 2018

On the Impact of Various Types of Noise on Neural Machine Translation

arXiv:1805.12282v11141 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses training-data quality issues relevant to machine translation researchers, but it is incremental: it analyzes known problems without introducing new solutions.

The study investigated how different types of noise in parallel training data affect neural machine translation quality, finding that neural models are generally more harmed by noise than statistical models, with one type causing them to simply copy input sentences.

We examine how various types of noise in the parallel training data impact the quality of neural machine translation systems. We create five types of artificial noise and analyze how they degrade performance in neural and statistical machine translation. We find that neural models are generally more harmed by noise than statistical models. For one especially egregious type of noise they learn to just copy the input sentence.
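To make the idea of artificial noise concrete, here is a minimal Python sketch of injecting one kind of noise into a parallel corpus. It assumes, purely for illustration, a noise type in which the target side of a sentence pair is replaced by a copy of the source side. The text above does not name the five noise types, so this should not be read as the authors' procedure. The function names `inject_copy_noise` and `copy_rate`, the `fraction` parameter, and the toy sentence pairs are all hypothetical.

```python
import random


def inject_copy_noise(pairs, fraction, seed=0):
    """Return a new corpus in which a fraction of the targets are source copies.

    pairs    -- list of (source, target) sentence pairs
    fraction -- share of pairs to corrupt, between 0.0 and 1.0
    seed     -- seed for the random generator, so runs are reproducible

    Illustrative assumption: "noise" here means the target is replaced by the
    source sentence. This is a sketch, not the paper's actual procedure.
    """
    rng = random.Random(seed)
    n_noisy = round(len(pairs) * fraction)
    noisy_indices = set(rng.sample(range(len(pairs)), n_noisy))
    return [
        (src, src) if i in noisy_indices else (src, tgt)
        for i, (src, tgt) in enumerate(pairs)
    ]


def copy_rate(pairs):
    """Share of pairs whose target is identical to the source."""
    if not pairs:
        return 0.0
    return sum(1 for src, tgt in pairs if src == tgt) / len(pairs)


# Toy corpus (hypothetical data, for illustration only).
clean = [
    ("das ist gut", "this is good"),
    ("wo bist du", "where are you"),
    ("ich habe hunger", "i am hungry"),
    ("guten morgen", "good morning"),
]
noisy = inject_copy_noise(clean, fraction=0.5, seed=1)
print(copy_rate(clean), copy_rate(noisy))
```

An experiment of the kind described in the abstract would train one system on the clean corpus and another on a corrupted one, then compare translation quality. Applying `copy_rate` to a system's output pairs is one simple way to check whether the system has started copying its input.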

Code Implementations: 1 repo