Mar 6, 2022

Focus on the Target's Vocabulary: Masked Label Smoothing for Machine Translation

Peking U
arXiv:2203.02889v2 | 639 citations | h-index: 20 | Has Code
AI Analysis

This work addresses a specific bottleneck in neural machine translation, the interaction between label smoothing and a shared source-target vocabulary, and offers an incremental improvement for practitioners who use both techniques.

The paper tackles the conflict between label smoothing and vocabulary sharing in neural machine translation by proposing Masked Label Smoothing (MLS), which masks source-side word probabilities to reduce bias, resulting in consistent improvements in translation quality and model calibration across bilingual and multilingual datasets.

Label smoothing and vocabulary sharing are two widely used techniques in neural machine translation models. However, we argue that simply applying both techniques can be conflicting and even lead to sub-optimal performance. When allocating smoothed probability, original label smoothing treats the source-side words that would never appear in the target language equally to the real target-side words, which could bias the translation model. To address this issue, we propose Masked Label Smoothing (MLS), a new mechanism that masks the soft label probability of source-side words to zero. Simple yet effective, MLS manages to better integrate label smoothing with vocabulary sharing. Our extensive experiments show that MLS consistently yields improvement over original label smoothing on different datasets, including bilingual and multilingual translation, in terms of both translation quality and model calibration. Our code is released at https://github.com/PKUnlp-icler/MLS
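
The masking idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' released implementation. It assumes the smoothing mass epsilon is spread uniformly over the non-gold tokens, and that the set of target-side token ids is known in advance. The function names, the toy six-token vocabulary, and the value epsilon = 0.1 are all hypothetical and chosen only for illustration.

```python
from typing import List, Set


def label_smoothing(gold: int, vocab_size: int, epsilon: float = 0.1) -> List[float]:
    """Original label smoothing: every non-gold token gets an equal share of epsilon."""
    share = epsilon / (vocab_size - 1)
    return [1.0 - epsilon if i == gold else share for i in range(vocab_size)]


def masked_label_smoothing(
    gold: int, vocab_size: int, target_ids: Set[int], epsilon: float = 0.1
) -> List[float]:
    """Sketch of masked smoothing: tokens outside the target-side set get probability 0.

    Assumption: epsilon is split uniformly over the non-gold target-side tokens.
    """
    if gold not in target_ids:
        raise ValueError("gold token must belong to the target-side vocabulary")
    others = len(target_ids) - 1
    if others == 0:
        # Nothing to smooth over, so fall back to a one-hot label.
        return [1.0 if i == gold else 0.0 for i in range(vocab_size)]
    share = epsilon / others
    soft = []
    for i in range(vocab_size):
        if i == gold:
            soft.append(1.0 - epsilon)
        elif i in target_ids:
            soft.append(share)
        else:
            soft.append(0.0)  # source-only token: masked out
    return soft


# Hypothetical shared vocabulary of 6 ids: 0-1 source-only, 2-3 shared, 4-5 target-only.
TARGET_IDS = {2, 3, 4, 5}
ls = label_smoothing(gold=4, vocab_size=6)
mls = masked_label_smoothing(gold=4, vocab_size=6, target_ids=TARGET_IDS)
```

In this toy setting, original label smoothing assigns a nonzero soft probability to ids 0 and 1 even though they can never appear in a target sentence, whereas the masked variant sets them to zero and gives their share to the remaining target-side tokens. Both distributions still sum to one.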
