CL · AI · LG · ML · Apr 19, 2019

Mask-Predict: Parallel Decoding of Conditional Masked Language Models

arXiv:1904.09324v2 · 1242 citations
Originality: Highly original
AI Analysis

The paper addresses the decoding-speed bottleneck that limits machine translation in real-time applications, and it is a strong incremental advance over existing non-autoregressive methods.

The paper tackles slow autoregressive decoding in machine translation by introducing a non-autoregressive method trained with a masked language modeling objective. It improves on earlier non-autoregressive and parallel decoding models by over 4 BLEU on average, and comes within about 1 BLEU of a standard left-to-right transformer while decoding faster.

Most machine translation systems generate text autoregressively from left to right. We, instead, use a masked language modeling objective to train a model to predict any subset of the target words, conditioned on both the input text and a partially masked target translation. This approach allows for efficient iterative decoding, where we first predict all of the target words non-autoregressively, and then repeatedly mask out and regenerate the subset of words that the model is least confident about. By applying this strategy for a constant number of iterations, our model improves state-of-the-art performance levels for non-autoregressive and parallel decoding translation models by over 4 BLEU on average. It is also able to reach within about 1 BLEU point of a typical left-to-right transformer model, while decoding significantly faster.
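
The decoding loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Predictor` interface, the `MASK` symbol, and `toy_predict` are hypothetical stand-ins for a trained conditional masked language model, the target length is assumed to be given, and the linear schedule for how many tokens to re-mask on each pass is an assumption that the abstract does not spell out.

```python
from typing import Callable, List, Sequence, Tuple

MASK = "<mask>"  # hypothetical placeholder for a masked target slot

# Hypothetical model interface: given the source and a partially masked
# target, return one (token, probability) guess per target position.
Predictor = Callable[[Sequence[str], Sequence[str]], List[Tuple[str, float]]]


def mask_predict(source: Sequence[str], length: int, predict: Predictor,
                 iterations: int = 4) -> List[str]:
    """Refine a fixed-length target by re-predicting low-confidence tokens."""
    tokens = [MASK] * length   # first pass starts fully masked
    probs = [0.0] * length     # confidence of the token kept at each slot
    for t in range(iterations):
        if t > 0:
            # Assumed linear decay: re-mask fewer positions on each pass.
            n_mask = int(length * (iterations - t) / iterations)
            if n_mask == 0:
                break
            # Re-mask the n_mask positions the model is least confident about.
            worst = sorted(range(length), key=lambda i: probs[i])[:n_mask]
            for i in worst:
                tokens[i] = MASK
        # One parallel prediction over all positions.
        guesses = predict(source, tokens)
        # Only masked slots are overwritten; the rest keep token and score.
        for i, (tok, p) in enumerate(guesses):
            if tokens[i] == MASK:
                tokens[i], probs[i] = tok, p
    return tokens


def toy_predict(source: Sequence[str],
                target: Sequence[str]) -> List[Tuple[str, float]]:
    """Toy stand-in for a model: position i is only guessed correctly
    once at least i target tokens are visible as context."""
    reference = [w.upper() for w in source]  # pretend "translation"
    visible = sum(tok != MASK for tok in target)
    guesses = []
    for i in range(len(target)):
        if visible >= i:
            guesses.append((reference[i], 0.5 + 0.5 * visible / len(target)))
        else:
            guesses.append(("?", 0.1))
    return guesses
```

With this toy predictor and the source `["a", "b", "c", "d"]`, a single pass (`iterations=1`) returns `['A', '?', '?', '?']`, while four passes return `['A', 'B', 'C', 'D']`. The number of model calls is fixed by `iterations` rather than by the target length, which is the source of the speedup over left-to-right decoding.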

Code Implementations: 2 repos
