LG · CL · ML · Jun 29, 2020

An EM Approach to Non-autoregressive Conditional Sequence Generation

arXiv:2006.16378v1 · 43 citations
Originality: Incremental advance
AI Analysis

This work addresses the latency issue in sequence generation for applications like machine translation, but it is incremental as it builds on existing non-autoregressive methods.

The paper tackles high inference latency in autoregressive sequence generation by proposing a unified EM framework that jointly optimizes autoregressive and non-autoregressive models to reduce multi-modality. On machine translation benchmarks, it reports performance competitive with existing non-autoregressive models together with significantly lower inference latency.

Autoregressive (AR) models have been the dominant approach to conditional sequence generation, but they suffer from high inference latency. Non-autoregressive (NAR) models have recently been proposed to reduce latency by generating all output tokens in parallel, but they achieve lower accuracy than their autoregressive counterparts, primarily because of the difficulty of handling multi-modality in sequence generation. This paper proposes a new approach that jointly optimizes both AR and NAR models in a unified Expectation-Maximization (EM) framework. In the E-step, an AR model learns to approximate the regularized posterior of the NAR model. In the M-step, the NAR model is updated on the new posterior and selects the training examples for the next AR model. This iterative process can effectively guide the system to remove the multi-modality in the output sequences. To our knowledge, this is the first EM approach to NAR sequence generation. We evaluate our method on the task of machine translation. Experimental results on benchmark data sets show that the proposed approach achieves performance competitive with, if not better than, existing NAR models and significantly reduces the inference latency.
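To make the alternating structure in the abstract concrete, here is a minimal toy sketch in Python. It is not the paper's implementation or objective: the "AR model" is replaced by a hypothetical table of whole-sequence counts, the "NAR model" by independent per-position token distributions, and the example-selection rule (keep whichever candidate the NAR stand-in scores higher) is an assumption made for illustration. All names (`fit_nar`, `fit_teacher`, `alternate`, the toy corpus) are invented here. The sketch only shows how repeatedly distilling from a sequence-level model into a factorized one can remove probability mass from mixed-mode outputs.

```python
import math
from collections import Counter, defaultdict

# Toy corpus (hypothetical): one source with two valid translations, i.e. two modes.
DATA = [
    ("thank you", ("danke", "sehr")),
    ("thank you", ("danke", "sehr")),
    ("thank you", ("vielen", "dank")),
]

def fit_nar(pairs):
    """Stand-in NAR model: independent per-position token distributions p(y_t | x)."""
    counts = defaultdict(Counter)
    for src, tgt in pairs:
        for pos, tok in enumerate(tgt):
            counts[(src, pos)][tok] += 1
    return {
        key: {tok: n / sum(ctr.values()) for tok, n in ctr.items()}
        for key, ctr in counts.items()
    }

def nar_prob(nar, src, tgt):
    """Probability of a whole sequence under the factorized stand-in."""
    return math.prod(nar[(src, pos)].get(tok, 0.0) for pos, tok in enumerate(tgt))

def fit_teacher(pairs):
    """Stand-in AR model: a table of whole-sequence counts per source."""
    table = defaultdict(Counter)
    for src, tgt in pairs:
        table[src][tgt] += 1
    return table

def teacher_decode(teacher, src):
    """Most frequent whole sequence the teacher has seen for this source."""
    return teacher[src].most_common(1)[0][0]

def alternate(data, rounds=3):
    """Alternate teacher fitting and NAR fitting (a loose analogue of E/M steps)."""
    teacher_data = list(data)
    for _ in range(rounds):
        # Analogue of the E-step: refit the sequence-level teacher.
        teacher = fit_teacher(teacher_data)
        # Analogue of the M-step: refit the NAR stand-in on teacher outputs.
        distilled = [(src, teacher_decode(teacher, src)) for src, _ in data]
        nar = fit_nar(distilled)
        # Assumed selection rule: keep the candidate the NAR stand-in scores higher.
        teacher_data = [
            (src, max((ref, hyp), key=lambda y: nar_prob(nar, src, y)))
            for (src, ref), (_, hyp) in zip(data, distilled)
        ]
    return nar

baseline = fit_nar(DATA)   # factorized model fit directly on multi-modal data
refined = alternate(DATA)  # factorized model after the alternating loop
mixed = ("danke", "dank")  # incoherent blend of the two modes
print(nar_prob(baseline, "thank you", mixed))
print(nar_prob(refined, "thank you", mixed))
```

In this toy setting the factorized model fit directly on the raw data assigns nonzero probability to the blended output, while the model produced by the alternating loop concentrates on a single mode. Real AR and NAR models are neural networks trained with gradient-based objectives, so this table-based version should be read only as an intuition aid.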

Code Implementations: 1 repo
