CL Dec 31, 2020

BANG: Bridging Autoregressive and Non-autoregressive Generation with Large Scale Pretraining

arXiv:2012.15525v3 (54 citations)
AI Analysis

This work addresses the performance gap between autoregressive (AR) and non-autoregressive (NAR) generation. It offers researchers and practitioners in natural language generation a single pretrained model that supports AR, NAR and semi-NAR generation.

This paper introduces BANG, a pretraining model that unifies autoregressive (AR) and non-autoregressive (NAR) generation by controlling attention to previous tokens. BANG significantly improves NAR and semi-NAR performance, achieving absolute gains of 14.01 and 5.24 in overall scores on SQuAD 1.1 and XSum respectively, compared to semi-NAR baselines.

In this paper, we propose BANG, a new pretraining model to Bridge the gap between Autoregressive (AR) and Non-autoregressive (NAR) Generation. AR and NAR generation can be viewed uniformly as differing in the extent to which previous tokens can be attended to, and BANG bridges AR and NAR generation by designing a novel model structure for large-scale pretraining. The pretrained BANG model can simultaneously support AR, NAR and semi-NAR generation to meet different requirements. Experiments on question generation (SQuAD 1.1), summarization (XSum) and dialogue generation (PersonaChat) show that BANG improves NAR and semi-NAR performance significantly while attaining performance comparable to strong AR pretrained models. Compared with strong semi-NAR baselines, BANG achieves absolute improvements of 14.01 and 5.24 in the overall scores of SQuAD 1.1 and XSum, respectively. In addition, compared with strong NAR baselines, BANG achieves absolute improvements of 10.73, 6.39 and 5.90 in the overall scores of SQuAD 1.1, XSum and PersonaChat, respectively.
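The abstract's framing, that AR and NAR generation differ only in how many previous tokens a position may attend to, can be illustrated with a toy visibility mask. This is a minimal sketch and not BANG's actual model structure: the function names `visibility_mask` and `describe`, and the `visible_prefix` parameterisation of the semi-NAR case, are assumptions made here purely for illustration.

```python
from typing import List


def visibility_mask(length: int, visible_prefix: int) -> List[List[bool]]:
    """Return mask[i][j]: may target position i attend to target token j?

    Hypothetical parameterisation (an assumption, not taken from the paper):
    position i may see a previous token j (j < i) only if j lies within the
    first `visible_prefix` target positions.
      visible_prefix >= length - 1  -> every previous token is visible (AR-like)
      visible_prefix == 0           -> no previous token is visible (NAR-like)
      anything in between           -> a semi-NAR-like middle ground
    """
    if length < 0 or visible_prefix < 0:
        raise ValueError("length and visible_prefix must be non-negative")
    return [
        [j < i and j < visible_prefix for j in range(length)]
        for i in range(length)
    ]


def describe(mask: List[List[bool]]) -> List[str]:
    """Render each row as a string: 'x' = visible, '.' = hidden."""
    return ["".join("x" if cell else "." for cell in row) for row in mask]


if __name__ == "__main__":
    for name, prefix in [("AR", 4), ("semi-NAR", 2), ("NAR", 0)]:
        print(name)
        for row in describe(visibility_mask(4, prefix)):
            print("  " + row)
```

Under this assumed parameterisation, the three generation modes differ only in one integer, which is the sense in which a single attention-control scheme can cover all of them.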

Code Implementations: 1 repo