CL · Jul 29, 2021

Using Perturbed Length-aware Positional Encoding for Non-autoregressive Neural Machine Translation

arXiv:2107.13689v1 · 4 citations
Originality: Incremental advance
AI Analysis

This work targets a specific weakness of non-autoregressive machine translation and offers researchers and practitioners an incremental improvement over existing NAT methods.

The paper tackles the tendency of non-autoregressive neural machine translation (NAT) models to output shorter sentences than autoregressive (AT) models. It proposes sequence-level knowledge distillation with perturbed length-aware positional encoding and applies it to a Levenshtein Transformer. On WMT14 German-to-English translation, this improves BLEU by up to 2.5 points and produces longer sentences.

Non-autoregressive neural machine translation (NAT) usually employs sequence-level knowledge distillation with an autoregressive neural machine translation (AT) model as its teacher. However, a NAT model often outputs shorter sentences than an AT model. In this work, we propose sequence-level knowledge distillation (SKD) using perturbed length-aware positional encoding and apply it to a student model, the Levenshtein Transformer. Our method outperformed a standard Levenshtein Transformer by up to 2.5 bilingual evaluation understudy (BLEU) points on WMT14 German-to-English translation, and our NAT model output longer sentences than the baseline NAT models.
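The abstract does not spell out how a length-aware positional encoding works or how it is perturbed. As a minimal sketch, the code below assumes the length-difference positional encoding of Takase and Okazaki (2019), in which each target position is encoded by the number of tokens still remaining, (length - pos), instead of by its absolute index. It models "perturbation" as a hypothetical random integer shift of the length before encoding. The names `length_difference_pe` and `perturb_length` and the `max_shift` parameter are illustrative assumptions, not the authors' code or scheme.

```python
import math
import random


def length_difference_pe(num_positions: int, length: int, d_model: int) -> list[list[float]]:
    """Length-difference positional encoding (assumed, in the style of
    Takase and Okazaki, 2019): position `pos` is encoded by the number of
    tokens still to be generated, (length - pos), rather than by `pos`."""
    table = []
    for pos in range(num_positions):
        remaining = length - pos
        row = []
        for i in range(0, d_model, 2):
            # Same frequency schedule as the sinusoidal Transformer encoding.
            angle = remaining / (10000 ** (i / d_model))
            row.append(math.sin(angle))
            if i + 1 < d_model:
                row.append(math.cos(angle))
        table.append(row)
    return table


def perturb_length(true_length: int, max_shift: int, rng: random.Random) -> int:
    """Hypothetical perturbation: shift the length fed to the encoding by a
    random integer in [-max_shift, max_shift], never going below 1."""
    return max(1, true_length + rng.randint(-max_shift, max_shift))


if __name__ == "__main__":
    rng = random.Random(0)
    target_len = 6
    # Encode the real target positions against a noisy length signal.
    noisy_len = perturb_length(target_len, max_shift=2, rng=rng)
    pe = length_difference_pe(target_len, noisy_len, d_model=8)
    print(noisy_len, len(pe), len(pe[0]))
```

Under this encoding, the final token of any sequence whose length is passed in exactly sees `remaining == 1`, so the length signal tells the model how close it is to the end. Shifting that signal during training is one plausible way to influence output length, but the paper's actual perturbation may differ.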
