CL · Oct 21, 2020

Token Drop mechanism for Neural Machine Translation

arXiv:2010.11018v1 · 994 citations
Originality: Incremental advance
AI Analysis

This work addresses generalization and overfitting in neural machine translation, but it appears incremental because it builds on existing dropout techniques.

The paper tackles the vulnerability of neural machine translation models to unfamiliar inputs. It proposes Token Drop, which replaces dropped tokens with a special token, together with two self-supervised objectives. The method achieves significant improvements over a strong Transformer baseline on Chinese-English and English-Romanian benchmarks.
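The main difference from word dropout is where the drop happens: word dropout zeroes a word's embedding vector, while Token Drop swaps the token itself for a special token before the embedding lookup. A minimal sketch in plain Python, assuming a hypothetical special-token id `DROP_ID = 3` and a per-token drop probability `p` (the listing specifies neither):

```python
import random
from typing import List, Sequence

# Hypothetical id of the special token that replaces dropped tokens (an assumption).
DROP_ID = 3


def word_dropout(embeddings: List[List[float]], p: float, rng: random.Random) -> List[List[float]]:
    """Word dropout: zero out the whole embedding vector of each dropped word."""
    return [[0.0] * len(vec) if rng.random() < p else list(vec) for vec in embeddings]


def token_drop(token_ids: Sequence[int], p: float, rng: random.Random) -> List[int]:
    """Token Drop (sketch): replace each dropped token id with DROP_ID before the
    embedding lookup, so the position is embedded like any other token."""
    return [DROP_ID if rng.random() < p else tok for tok in token_ids]


rng = random.Random(0)
print(token_drop([5, 6, 7], p=1.0, rng=rng))          # → [3, 3, 3]
print(word_dropout([[1.0, 2.0]], p=1.0, rng=rng))     # → [[0.0, 0.0]]
```

Because `token_drop` works on ids, the dropped position receives the learned embedding of the special token rather than an all-zero vector.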

Neural machine translation (NMT) models with millions of parameters are vulnerable to unfamiliar inputs. We propose Token Drop to improve generalization and avoid overfitting in NMT models. It is similar to word dropout, except that we replace dropped tokens with a special token instead of zeroing out their word embeddings. We further introduce two self-supervised objectives: Replaced Token Detection and Dropped Token Prediction. Our method forces the model to generate the target translation from less information, so that it learns better textual representations. Experiments on Chinese-English and English-Romanian benchmarks demonstrate the effectiveness of our approach, and our model achieves significant improvements over a strong Transformer baseline.
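The two self-supervised objectives can be sketched as per-position training signals. This is a minimal plain-Python sketch under stated assumptions: Replaced Token Detection is treated as binary classification of whether each input position differs from the original, and Dropped Token Prediction as recovering the original token at positions holding the special token. `DROP_ID`, `IGNORE`, and the helper names are hypothetical, the model probabilities are assumed to come from some encoder head, and the listing does not say how the corrupted input for RTD is produced or how these losses are weighted against the translation loss.

```python
import math
from typing import List, Sequence, Tuple

DROP_ID = 3     # hypothetical id of the special token (assumption)
IGNORE = -100   # hypothetical marker for positions with no DTP target


def self_supervised_targets(original: Sequence[int],
                            corrupted: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Build targets for both auxiliary objectives from aligned sequences.

    rtd_labels[i]  = 1 if position i was changed, else 0.
    dtp_targets[i] = original token id where the input holds DROP_ID, else IGNORE.
    """
    if len(original) != len(corrupted):
        raise ValueError("sequences must be aligned position by position")
    rtd_labels = [int(o != c) for o, c in zip(original, corrupted)]
    dtp_targets = [o if c == DROP_ID else IGNORE for o, c in zip(original, corrupted)]
    return rtd_labels, dtp_targets


def rtd_loss(p_replaced: Sequence[float], labels: Sequence[int]) -> float:
    """Mean binary cross-entropy of predicted 'was replaced' probabilities."""
    eps = 1e-12
    total = 0.0
    for p, y in zip(p_replaced, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def dtp_loss(token_probs: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Mean cross-entropy of the original token, over dropped positions only."""
    losses = [-math.log(max(probs[t], 1e-12))
              for probs, t in zip(token_probs, targets) if t != IGNORE]
    return sum(losses) / len(losses) if losses else 0.0


rtd, dtp = self_supervised_targets([5, 6, 7, 8], [5, DROP_ID, 7, DROP_ID])
print(rtd)  # → [0, 1, 0, 1]
print(dtp)  # → [-100, 6, -100, 8]
```

Masking non-dropped positions with `IGNORE` keeps the prediction loss focused on the information the model was deprived of.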

Code Implementations: 1 repo