CL · Jul 29, 2021

Using Perturbed Length-aware Positional Encoding for Non-autoregressive Neural Machine Translation

Yui Oka, Katsuhito Sudoh, Satoshi Nakamura

arXiv:2107.13689v1 · 0.74 citations

Originality: Incremental advance

AI Analysis

This work addresses a specific issue in machine translation, the tendency of NAT models to produce overly short output, and offers an incremental improvement over existing NAT methods.

The paper tackles the tendency of non-autoregressive neural machine translation (NAT) models to output shorter sentences than autoregressive models. It proposes sequence-level knowledge distillation with perturbed length-aware positional encoding, applied to a Levenshtein Transformer student. The method improved BLEU by up to 2.5 points on WMT14 German-to-English translation and produced longer output sentences.

Non-autoregressive neural machine translation (NAT) usually employs sequence-level knowledge distillation using autoregressive neural machine translation (AT) as its teacher model. However, a NAT model often outputs shorter sentences than an AT model. In this work, we propose sequence-level knowledge distillation (SKD) using perturbed length-aware positional encoding and apply it to a student model, the Levenshtein Transformer. Our method outperformed a standard Levenshtein Transformer by up to 2.5 bilingual evaluation understudy (BLEU) points on WMT14 German-to-English translation. The resulting NAT model also output longer sentences than the baseline NAT models.
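
The abstract does not spell out the encoding itself, so the following is only a rough sketch of what a perturbed length-aware positional encoding might look like. It assumes a length-difference form, in which each position is encoded from the remaining length rather than the absolute position, and it assumes the perturbation is a uniform integer offset added to the target length. The function name `length_aware_pe` and the parameters `perturb` and `rng` are hypothetical and do not come from the paper.

```python
import math
import random


def length_aware_pe(length, d_model, perturb=0, rng=None):
    """Sinusoidal encoding of the remaining length at each position.

    Assumptions (not stated in the abstract):
      * position `pos` is encoded from (noisy_len - pos);
      * noisy_len = length + delta, with delta drawn uniformly
        from the integers in [-perturb, perturb].
    Returns a list of `length` rows, each with `d_model` floats.
    """
    rng = rng or random.Random()
    delta = rng.randint(-perturb, perturb) if perturb > 0 else 0
    noisy_len = length + delta

    table = []
    for pos in range(length):
        remaining = noisy_len - pos
        row = []
        for i in range(d_model):
            # Dimensions 2k and 2k+1 share one frequency, as in the
            # standard Transformer sinusoidal encoding.
            angle = remaining / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


# Unperturbed encoding for a 5-token target, then a perturbed one.
clean = length_aware_pe(5, 8)
noisy = length_aware_pe(5, 8, perturb=2, rng=random.Random(0))
```

With `perturb=0` the encoding depends only on the true length. With a positive `perturb`, the model sees a slightly wrong length signal, which under these assumptions is the kind of noise the title refers to.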
