CL · Dec 24, 2020

Why Neural Machine Translation Prefers Empty Outputs

arXiv:2012.13454v1 · 9 citations
AI Analysis

This work addresses the problem of erroneous empty outputs in neural machine translation systems, which matters to practitioners and researchers working on NMT quality.

This paper investigates why neural machine translation (NMT) systems assign high probability to empty translations, and identifies two causes. First, label smoothing lowers the model's confidence in correct-length translations. Second, a single, high-frequency End-of-Sentence (EoS) token ends every target sentence regardless of length, which implicitly smooths probability towards zero-length translations.

We investigate why neural machine translation (NMT) systems assign high probability to empty translations. We find two explanations. First, label smoothing makes correct-length translations less confident, making it easier for the empty translation to finally outscore them. Second, NMT systems use the same, high-frequency EoS word to end all target sentences, regardless of length. This creates an implicit smoothing that increases zero-length translations. Using different EoS types in target sentences of different lengths exposes and eliminates this implicit smoothing.
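Both explanations can be illustrated with a small sketch. It is not the authors' code. The first function assumes an idealized model that fits its label-smoothed targets exactly, so each correct token receives at most 1 - eps + eps/V probability; the smoothing weight `eps` and vocabulary size `vocab_size` are illustrative values. The second part shows one hypothetical way to assign length-specific EoS types by bucketing target length; the token names (`</s_0>`, `</s_1>`, ...) and the bucket parameters are assumptions, not the scheme used in the paper.

```python
import math


def max_sequence_logprob(length, eps=0.1, vocab_size=32000):
    """Upper bound on the log-probability of a `length`-token output
    (plus its closing EoS) when every step matches the label-smoothed
    target exactly. This is an idealization, not a trained model."""
    peak = (1.0 - eps) + eps / vocab_size  # mass left on the correct token
    return (length + 1) * math.log(peak)   # +1 for the EoS step


def eos_for_length(length, bucket_size=10, max_bucket=5):
    """Hypothetical length-specific EoS token: length 0 gets its own type,
    lengths 1..bucket_size share the next one, and so on up to max_bucket."""
    bucket = min(math.ceil(length / bucket_size), max_bucket)
    return f"</s_{bucket}>"


def tag_target(tokens, bucket_size=10, max_bucket=5):
    """Append the length-specific EoS token to a target sentence."""
    return list(tokens) + [eos_for_length(len(tokens), bucket_size, max_bucket)]


if __name__ == "__main__":
    # The bound shrinks linearly with length, so long correct outputs
    # lose ground to the empty output (length 0).
    for n in (0, 10, 20, 40):
        print(n, round(max_sequence_logprob(n), 3))

    # Under this scheme, only a zero-length target would end in </s_0>.
    print(tag_target(["guten", "Morgen"]))
```

Under this bucketing, the EoS type that closes an empty output never appears after an ordinary sentence in the training data, so the frequency of the shared EoS word no longer lends probability to zero-length translations.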

