LG, ML | Nov 20, 2019

Fine-Tuning by Curriculum Learning for Non-Autoregressive Neural Machine Translation

arXiv:1911.08717v2 | 92 citations
Originality: Incremental advance
AI Analysis

This work addresses the trade-off between speed and accuracy in machine translation for applications requiring fast inference, though it is incremental as it builds on existing fine-tuning and curriculum learning ideas.

The paper tackles the lower translation accuracy of non-autoregressive neural machine translation (NAT) models relative to autoregressive (AT) models by fine-tuning a well-trained AT model into a NAT model under a curriculum. It reports an improvement of more than 1 BLEU over previous NAT baselines and inference more than 10 times faster than AT baselines.

Non-autoregressive translation (NAT) models remove the dependence on previous target tokens and generate all target tokens in parallel, resulting in significant inference speedup but at the cost of inferior translation accuracy compared to autoregressive translation (AT) models. Considering that AT models have higher accuracy and are easier to train than NAT models, and that both share the same model configurations, a natural idea to improve the accuracy of NAT models is to transfer a well-trained AT model to an NAT model through fine-tuning. However, since AT and NAT models differ greatly in training strategy, straightforward fine-tuning does not work well. In this work, we introduce curriculum learning into fine-tuning for NAT. Specifically, we design a curriculum in the fine-tuning process to progressively switch the training from autoregressive generation to non-autoregressive generation. Experiments on four benchmark translation datasets show that the proposed method achieves a good improvement (more than 1 BLEU score) over previous NAT baselines in terms of translation accuracy, and greatly speeds up (more than 10 times) the inference process over AT baselines.
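
The progressive switch described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the schedule or what is switched, so the linear pacing function, the per-token mixing rule, and every name below (`nat_fraction`, `mix_decoder_inputs`, the placeholder tokens) are assumptions made for illustration only.

```python
import random


def nat_fraction(step: int, total_steps: int) -> float:
    """Hypothetical linear pacing: 0.0 means fully AT-style training,
    1.0 means fully NAT-style training. The linear shape is an assumption."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    return min(1.0, max(0.0, step / total_steps))


def mix_decoder_inputs(at_inputs, nat_inputs, fraction, rng):
    """Swap each AT-style decoder input for its NAT-style counterpart
    with probability `fraction` (an assumed mixing rule)."""
    if len(at_inputs) != len(nat_inputs):
        raise ValueError("input sequences must have the same length")
    return [
        nat if rng.random() < fraction else at
        for at, nat in zip(at_inputs, nat_inputs)
    ]


# Illustrative placeholder tokens, not real model inputs.
at_inputs = ["<s>", "the", "cat", "sat"]            # shifted target tokens (AT style)
nat_inputs = ["<src0>", "<src1>", "<src2>", "<src3>"]  # inputs independent of the target (NAT style)

rng = random.Random(0)
total_steps = 4
for step in range(total_steps + 1):
    fraction = nat_fraction(step, total_steps)
    mixed = mix_decoder_inputs(at_inputs, nat_inputs, fraction, rng)
    print(step, fraction, mixed)
```

At step 0 every decoder input is AT-style and at the final step every input is NAT-style; the intermediate steps expose the model to a growing share of NAT-style inputs, which is the sense in which the curriculum moves from the easier task to the harder one.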

Code Implementations: 2 repos