CL | AI | Jun 2, 2021

Rejuvenating Low-Frequency Words: Making the Most of Parallel Data in Non-Autoregressive Translation

arXiv:2106.00903v2726 citations | Has Code
Originality: Incremental advance
AI Analysis

The paper addresses translation errors on low-frequency words in non-autoregressive translation (NAT) models, offering an incremental improvement over existing methods.

The paper tackles the problem of low-frequency word errors in non-autoregressive translation by proposing reverse knowledge distillation and a combined training strategy, achieving BLEU scores of 28.2 on WMT14 English-German and 33.9 on WMT16 Romanian-English.

Knowledge distillation (KD) is commonly used to construct synthetic data for training non-autoregressive translation (NAT) models. However, there is a discrepancy in low-frequency words between the distilled and the original data, leading to more errors when predicting low-frequency words. To alleviate the problem, we directly expose the raw data to NAT by leveraging pretraining. By analyzing directed alignments, we found that KD makes low-frequency source words align with targets more deterministically but fails to align sufficient low-frequency words from target to source. Accordingly, we propose reverse KD to rejuvenate more alignments for low-frequency target words. To make the most of authentic and synthetic data, we combine these complementary approaches into a new training strategy that further boosts NAT performance. We conduct experiments on five translation benchmarks over two advanced architectures. Results demonstrate that the proposed approach can significantly and universally improve translation quality by reducing translation errors on low-frequency words. Encouragingly, our approach achieves 28.2 and 33.9 BLEU points on the WMT14 English-German and WMT16 Romanian-English datasets, respectively. Our code, data, and trained models are available at https://github.com/alphadl/RLFW-NAT.
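The low-frequency discrepancy between raw and distilled data that the abstract describes can be illustrated with a small sketch. This is not the authors' analysis (they study directed word alignments); it is a simplified, hypothetical proxy that measures how many rare target words from the raw data still appear in the distilled targets. The function names, the frequency threshold, and the toy sentences are all assumptions made for illustration.

```python
from collections import Counter


def low_freq_words(corpus, threshold):
    """Return the set of whitespace tokens occurring at most `threshold` times."""
    counts = Counter(tok for sent in corpus for tok in sent.split())
    return {word for word, count in counts.items() if count <= threshold}


def low_freq_coverage(raw_targets, distilled_targets, threshold=1):
    """Fraction of the raw data's rare target words that survive in the distilled targets."""
    rare = low_freq_words(raw_targets, threshold)
    if not rare:
        return 1.0  # nothing rare to lose
    distilled_vocab = {tok for sent in distilled_targets for tok in sent.split()}
    return len(rare & distilled_vocab) / len(rare)


# Toy data (hypothetical): the distilled side replaces rare words with common ones.
raw = ["the cat sat on the mat", "the ocelot sat on the rug"]
distilled = ["the cat sat on the mat", "the cat sat on the mat"]

coverage = low_freq_coverage(raw, distilled, threshold=1)
print(f"rare-word coverage after distillation: {coverage:.2f}")
```

On this toy input, half of the rare target words ("ocelot" and "rug") vanish from the distilled side, which is the kind of loss the paper aims to counter by exposing the model to raw data and reverse-distilled data.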

Code Implementations: 1 repo