CL · AI · Jan 4, 2016

Mutual Information and Diverse Decoding Improve Neural Machine Translation

arXiv:1601.00372v2 · 126 citations
Originality: Incremental advance
AI Analysis

This work addresses translation quality in neural machine translation systems, offering incremental improvements over standard likelihood-based training and decoding.

The authors address a limitation of the standard neural machine translation objective by proposing one that maximizes the mutual information between source and target sentences, together with a diverse decoding algorithm. The combination yields consistent performance improvements on the WMT German/English and French/English tasks.
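The link between mutual information and the two translation directions follows from Bayes' rule. The identity below is standard; the λ-weighted decision rule after it is a hedged sketch of how such an objective is commonly applied in re-ranking, with λ an assumed tunable weight rather than a value or formula taken from the paper.

$$
\log \frac{p(x,y)}{p(x)\,p(y)} = \log p(y \mid x) - \log p(y) = \log p(x \mid y) - \log p(x)
$$

For a fixed source $x$, $\log p(x)$ is constant, so ranking candidate translations $y$ by pointwise mutual information is equivalent to ranking them by $\log p(x \mid y)$. Interpolating with the forward model gives:

$$
\hat{y} = \arg\max_{y} \; \big[\log p(y \mid x) + \lambda \log p(x \mid y)\big]
$$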

Sequence-to-sequence neural translation models learn semantic and syntactic relations between sentence pairs by optimizing the likelihood of the target given the source, i.e., $p(y|x)$, an objective that ignores other potentially useful sources of information. We introduce an alternative objective function for neural MT that maximizes the mutual information between the source and target sentences, modeling the bi-directional dependency of sources and targets. We implement the model with a simple re-ranking method, and also introduce a decoding algorithm that increases diversity in the N-best list produced by the first pass. Applied to the WMT German/English and French/English tasks, the proposed models offer a consistent performance boost on both standard LSTM and attention-based neural MT architectures.
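A minimal Python sketch of the two-pass pipeline the abstract describes, under stated assumptions. The first pass is beam search with a sibling-rank penalty: the k-th best expansion of each parent hypothesis loses `gamma * k` when candidates compete for the beam, which is our assumed reading of "diverse decoding". The second pass re-ranks the resulting N-best list by log p(y|x) + λ log p(x|y). The names `diverse_beam_search`, `mmi_rerank`, `toy_forward`, `toy_backward`, `gamma`, and `lam` are hypothetical, the toy probability tables are invented stand-ins for trained forward and backward NMT models, and the paper's exact scoring may include additional terms.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

EOS = "</s>"
Hyp = Tuple[str, ...]
# Hypothetical forward interface: (source tokens, target prefix) -> {next token: log p}.
NextTokenFn = Callable[[Sequence[str], Hyp], Dict[str, float]]
# Hypothetical backward interface: (source tokens, hypothesis) -> log p(x|y).
BackwardFn = Callable[[Sequence[str], Hyp], float]


def diverse_beam_search(src: Sequence[str], next_logprobs: NextTokenFn,
                        beam_size: int, max_len: int,
                        gamma: float) -> List[Tuple[Hyp, float]]:
    """Beam search with an assumed sibling-rank penalty: the k-th best child
    (k = 0, 1, ...) of a parent competes with score - gamma * k. The penalty
    only decides which children survive; returned scores are unpenalized."""
    beam: List[Tuple[Hyp, float]] = [((), 0.0)]
    finished: List[Tuple[Hyp, float]] = []
    for _ in range(max_len):
        candidates = []
        for prefix, logp in beam:
            # Rank this parent's children so each one knows its sibling rank k.
            children = sorted(next_logprobs(src, prefix).items(),
                              key=lambda kv: kv[1], reverse=True)
            for k, (tok, tok_logp) in enumerate(children):
                total = logp + tok_logp
                candidates.append((total - gamma * k, prefix + (tok,), total))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = []
        for _, hyp, total in candidates[:beam_size]:
            if hyp[-1] == EOS:
                finished.append((hyp, total))
            else:
                beam.append((hyp, total))
        if not beam:
            break
    return finished + beam  # keep unfinished hypotheses if max_len is reached


def mmi_rerank(src: Sequence[str], nbest: List[Tuple[Hyp, float]],
               backward_logprob: BackwardFn, lam: float) -> List[Hyp]:
    """Re-rank an N-best list by log p(y|x) + lam * log p(x|y)."""
    scored = sorted(nbest,
                    key=lambda h: h[1] + lam * backward_logprob(src, h[0]),
                    reverse=True)
    return [hyp for hyp, _ in scored]


# Toy forward model p(y|x) for a two-word source; all numbers are invented.
SRC = ["das", "haus"]
TOY_FORWARD = {
    (): {"the": 0.6, "a": 0.4},
    ("the",): {"thing": 0.5, "house": 0.45, EOS: 0.05},
    ("a",): {"house": 0.6, "thing": 0.4},
}


def toy_forward(src: Sequence[str], prefix: Hyp) -> Dict[str, float]:
    # Any prefix missing from the table (one ending in a noun) must emit EOS.
    probs = TOY_FORWARD.get(prefix, {EOS: 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}


def toy_backward(src: Sequence[str], hyp: Hyp) -> float:
    # Toy log p(x|y): "house" explains the source "haus" well, "thing" does not.
    recover = {"house": 0.9, "thing": 0.1}
    return sum(math.log(recover[t]) for t in hyp if t in recover)


if __name__ == "__main__":
    for gamma in (0.0, 1.0):
        nbest = diverse_beam_search(SRC, toy_forward, beam_size=2,
                                    max_len=3, gamma=gamma)
        print("gamma", gamma, "N-best:", [" ".join(h) for h, _ in nbest])
        best = mmi_rerank(SRC, nbest, toy_backward, lam=1.0)[0]
        print("  MMI top (lam=1.0):", " ".join(best))
```

With these invented numbers, plain beam search (`gamma=0.0`) keeps two children of the same parent, "the thing" and "the house"; `gamma=1.0` replaces "the house" with "a house". Re-ranking the `gamma=0.0` list with `lam=1.0` promotes "the house", because the toy backward model gives it a much higher log p(x|y). Keeping the penalty out of the returned scores is a design choice so that the re-ranker sees the forward model's actual log-probabilities.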

Code Implementations: 1 repo
