CL Jul 22, 2021

Confidence-Aware Scheduled Sampling for Neural Machine Translation

Yijin Liu, Fandong Meng, Yufeng Chen, Jinan Xu, Jie Zhou

arXiv:2107.10427v131.6714 citations Has Code

Originality: Incremental advance

AI Analysis

This work addresses a specific bottleneck in neural machine translation training, offering an incremental improvement over existing scheduled sampling methods.

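The paper tackles the exposure bias problem in neural machine translation by proposing confidence-aware scheduled sampling, which uses the confidence of model predictions to decide, position by position, whether the decoder sees a ground-truth token or a predicted (or noisy) token during training. On the large-scale WMT 2014 English-German, WMT 2014 English-French, and WMT 2019 Chinese-English datasets, the authors report significant improvements over the Transformer and vanilla scheduled sampling in both translation quality and convergence speed.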

Scheduled sampling is an effective method to alleviate the exposure bias problem in neural machine translation. It simulates the inference scenario by randomly replacing ground-truth target input tokens with predicted ones during training. Despite its success, its critical schedule strategies are based merely on training steps and ignore real-time model competence, which limits its potential performance and convergence speed. To address this issue, we propose confidence-aware scheduled sampling. Specifically, we quantify real-time model competence by the confidence of model predictions, based on which we design fine-grained schedule strategies. In this way, the model is exposed to predicted tokens at high-confidence positions and to ground-truth tokens at low-confidence positions. Moreover, we observe that vanilla scheduled sampling tends to degenerate into the original teacher forcing mode, since most predicted tokens are the same as the ground-truth tokens. Therefore, under the above confidence-aware strategy, we further expose noisier tokens (e.g., redundant words and incorrect word order) instead of predicted ones at high-confidence token positions. We evaluate our approach on the Transformer and conduct experiments on the large-scale WMT 2014 English-German, WMT 2014 English-French, and WMT 2019 Chinese-English translation tasks. Results show that our approach significantly outperforms the Transformer and vanilla scheduled sampling in both translation quality and convergence speed.
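The per-position decision described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `confidence_aware_inputs`, the thresholds `t_pred` and `t_noise`, and the choice to draw noisy tokens at random from the same target sentence are all assumptions made here for illustration. The sketch also assumes that the predicted tokens and their probabilities come from a first decoding pass with ground-truth inputs.

```python
import random
from typing import List, Optional, Sequence


def confidence_aware_inputs(
    gold: Sequence[str],
    predicted: Sequence[str],
    confidence: Sequence[float],
    t_pred: float = 0.9,              # hypothetical threshold, not from the source
    t_noise: Optional[float] = None,  # hypothetical threshold, not from the source
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Choose the decoder input token for each target position.

    confidence[i] is the probability the model assigned to predicted[i].
    """
    if not (len(gold) == len(predicted) == len(confidence)):
        raise ValueError("gold, predicted and confidence must have equal length")
    if rng is None:
        rng = random.Random(0)

    mixed = []
    for g, p, c in zip(gold, predicted, confidence):
        if c < t_pred:
            # Low confidence: keep the ground-truth token (teacher forcing).
            mixed.append(g)
        elif t_noise is None or c < t_noise:
            # High confidence: expose the model to its own prediction.
            mixed.append(p)
        else:
            # Very high confidence: inject a noisy token. Sampling from the
            # same sentence is an assumed way to mimic wrong word order.
            mixed.append(rng.choice(list(gold)))
    return mixed


gold = ["the", "cat", "sat", "down"]
predicted = ["the", "dog", "sat", "up"]
confidence = [0.99, 0.95, 0.92, 0.40]

print(confidence_aware_inputs(gold, predicted, confidence))
# → ['the', 'dog', 'sat', 'down']
```

In this toy example, only the last position falls below `t_pred`, so it keeps the ground-truth token, while the other three positions receive predicted tokens. Passing a value for `t_noise` (for example 0.98) would additionally replace the most confident position with a randomly drawn token.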

