LG, CL, ML · Oct 25, 2019

Fast Structured Decoding for Sequence Models

arXiv:1910.11555v2 · 133 citations
Originality: Incremental advance
AI Analysis

This work addresses both inference speed and output accuracy of sequence models for machine translation, and represents an incremental improvement over existing non-autoregressive methods.

The paper tackles slow inference in autoregressive sequence models and inconsistent outputs in non-autoregressive models by adding a structured inference module to non-autoregressive models, built on an efficient approximation of a Conditional Random Field (CRF) and a dynamic transition technique. It reaches a BLEU score of 26.80 on WMT14 En-De while adding only 8 to 14 ms of latency.

Autoregressive sequence models achieve state-of-the-art performance in domains such as machine translation. However, because of their autoregressive factorization, these models suffer from high latency during inference. Non-autoregressive sequence models were recently proposed to reduce inference time. However, these models assume that each token is decoded conditionally independently of the others. This generation process sometimes makes the output sentence inconsistent, so the learned non-autoregressive models achieve lower accuracy than their autoregressive counterparts. To improve decoding consistency and reduce inference cost at the same time, we propose incorporating a structured inference module into non-autoregressive models. Specifically, we design an efficient approximation of Conditional Random Fields (CRF) for non-autoregressive sequence models, and further propose a dynamic transition technique to model positional contexts in the CRF. Experiments in machine translation show that, while adding little latency (8 to 14 ms), our model achieves significantly better translation performance than previous non-autoregressive models on several translation datasets. In particular, on the WMT14 En-De dataset, our model obtains a BLEU score of 26.80, which substantially outperforms previous non-autoregressive baselines and is only 0.61 BLEU lower than purely autoregressive models.
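To make the idea of "structured inference on top of a non-autoregressive decoder" concrete, here is a minimal sketch of Viterbi decoding in a linear-chain CRF. The abstract does not spell out the approximation, so the following are assumptions for illustration only: transition scores are factorized as a dot product of two low-dimensional token embeddings (a low-rank stand-in for the full vocabulary-by-vocabulary matrix), and the search is restricted to the top-k tokens at each position. The transition callable also receives the position `i`, as a hypothetical hook for position-dependent (dynamic) transitions. The names `make_low_rank`, `top_k`, and `beam_viterbi` are hypothetical and not the authors' code.

```python
from typing import Callable, List, Sequence

# Hypothetical signature: trans(i, a, b) scores moving from token a at
# position i - 1 to token b at position i.
Transition = Callable[[int, int, int], float]


def make_low_rank(e1: Sequence[Sequence[float]],
                  e2: Sequence[Sequence[float]]) -> Transition:
    """Assumed static low-rank transition: score(a, b) = e1[a] . e2[b].

    The full V x V transition matrix is never built; each score is a
    d-dimensional dot product. Position i is ignored here; a dynamic
    (position-dependent) variant could use it.
    """
    def trans(i: int, a: int, b: int) -> float:
        return sum(x * y for x, y in zip(e1[a], e2[b]))
    return trans


def top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest unary scores at one position."""
    return sorted(range(len(scores)), key=lambda t: scores[t], reverse=True)[:k]


def beam_viterbi(emissions: Sequence[Sequence[float]],
                 trans: Transition, k: int) -> List[int]:
    """Viterbi decoding of a linear-chain CRF restricted to the top-k
    candidate tokens at each position (assumes at least one position).

    emissions[i][t] is the unary score of token t at position i, e.g. the
    log-probabilities a non-autoregressive decoder produces in parallel.
    """
    cands = [top_k(row, k) for row in emissions]
    # best[j]: score of the best partial path ending in cands[i][j].
    best = [emissions[0][t] for t in cands[0]]
    # back[i - 1][j]: index of the best predecessor of cands[i][j].
    back: List[List[int]] = []
    for i in range(1, len(emissions)):
        new_best, ptrs = [], []
        for b in cands[i]:
            scores = [best[j] + trans(i, a, b) for j, a in enumerate(cands[i - 1])]
            j_star = max(range(len(scores)), key=scores.__getitem__)
            new_best.append(scores[j_star] + emissions[i][b])
            ptrs.append(j_star)
        best = new_best
        back.append(ptrs)
    # Backtrack from the best final candidate.
    j = max(range(len(best)), key=best.__getitem__)
    path = [cands[-1][j]]
    for i in range(len(back) - 1, -1, -1):
        j = back[i][j]
        path.append(cands[i][j])
    return path[::-1]


if __name__ == "__main__":
    # Independent per-position argmax would pick token 0 twice.
    emissions = [[2.0, 1.5, 0.0], [2.0, 1.0, 0.0]]
    # Rank-1 transitions that only penalize token 0 followed by token 0.
    trans = make_low_rank([[1.0], [0.0], [0.0]], [[-5.0], [0.0], [0.0]])
    print(beam_viterbi(emissions, trans, k=2))  # [1, 0]
```

In this toy example the transition penalty steers the decoder away from the repeated token that independent per-position decoding would output, which is the kind of inconsistency the abstract describes. Restricting the search to k candidates makes each step cost k² transition scores (each a d-dimensional dot product) instead of V² over the whole vocabulary; the trade-off is that a token outside the top-k at some position can never be selected.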

Code Implementations: 1 repo