CL, LG, ML · Apr 3, 2020

Aligned Cross Entropy for Non-Autoregressive Machine Translation

arXiv:2004.01655v1 · 122 citations
AI Analysis

This work addresses slow decoding in machine translation, which matters for applications that require real-time processing, though it is an incremental improvement on existing non-autoregressive methods.

The paper tackles the challenge of modeling word order in non-autoregressive machine translation, where cross entropy loss heavily penalizes even small shifts in word order, by proposing aligned cross entropy (AXE) as an alternative loss function. AXE-based training of conditional masked language models substantially improves performance on major WMT benchmarks, setting a new state of the art for non-autoregressive models.

Non-autoregressive machine translation models significantly speed up decoding by allowing for parallel prediction of the entire target sequence. However, modeling word order is more challenging due to the lack of autoregressive factors in the model. This difficulty is compounded during training with cross entropy loss, which can heavily penalize small shifts in word order. In this paper, we propose aligned cross entropy (AXE) as an alternative loss function for training non-autoregressive models. AXE uses a differentiable dynamic program to assign loss based on the best possible monotonic alignment between target tokens and model predictions. AXE-based training of conditional masked language models (CMLMs) substantially improves performance on major WMT benchmarks, while setting a new state of the art for non-autoregressive models.
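To make the idea of a loss based on the best monotonic alignment more concrete, here is a minimal sketch in plain Python. It is not the paper's implementation. The abstract does not spell out the dynamic program, so the three moves used here (align, skip a prediction, skip a target token), the special `blank` token, the `delta` penalty, and the names `axe_loss`, `cross_entropy`, and `peaked` are all assumptions made for illustration. The sketch only computes the loss value; a trainable version would express the same recursion in a framework with automatic differentiation.

```python
import math

def cross_entropy(log_probs, target):
    """Position-wise cross entropy: prediction j is scored against target j."""
    return -sum(log_probs[j][tok] for j, tok in enumerate(target))

def axe_loss(log_probs, target, blank, delta=1.0):
    """Illustrative aligned cross entropy (assumed formulation, see lead-in).

    log_probs[j][v] is the log-probability of vocabulary item v at
    prediction position j. `blank` is a hypothetical "empty" token id and
    `delta` a hypothetical penalty weight for skipping a target token.
    """
    n, m = len(target), len(log_probs)
    inf = float("inf")
    # A[i][j]: smallest loss for aligning the first i target tokens
    # with the first j predictions, under a monotonic alignment.
    A = [[inf] * (m + 1) for _ in range(n + 1)]
    A[0][0] = 0.0
    for j in range(1, m + 1):
        # Leading predictions with no target token must predict blank.
        A[0][j] = A[0][j - 1] - log_probs[j - 1][blank]
    for i in range(1, n + 1):
        tok = target[i - 1]
        for j in range(1, m + 1):
            lp = log_probs[j - 1]
            align = A[i - 1][j - 1] - lp[tok]          # match target i to prediction j
            skip_pred = A[i][j - 1] - lp[blank]        # prediction j emits blank
            skip_tgt = A[i - 1][j] - delta * lp[tok]   # target i shares prediction j
            A[i][j] = min(align, skip_pred, skip_tgt)
    return A[n][m]

def peaked(tokens, vocab_size, p=0.96):
    """Toy predictions: probability p on one token per position, rest uniform."""
    rest = (1.0 - p) / (vocab_size - 1)
    return [[math.log(p if v == t else rest) for v in range(vocab_size)]
            for t in tokens]

if __name__ == "__main__":
    target = [1, 2, 3, 4]
    shifted = peaked([0, 1, 2, 3], vocab_size=5)  # right words, one slot late
    print("cross entropy:", cross_entropy(shifted, target))
    print("aligned loss: ", axe_loss(shifted, target, blank=0))
```

In this toy case the predictions contain the right tokens shifted by one position. Position-wise cross entropy treats every position as wrong, while the aligned loss pays for one blank and one skipped target token and scores the remaining tokens as matches. When the predictions already line up with the target, the two losses coincide in this sketch.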

Code Implementations: 1 repo