CL, Apr 24, 2023

Directed Acyclic Transformer Pre-training for High-quality Non-autoregressive Text Generation

Tsinghua
arXiv:2304.11791v1, 137 citations, h-index: 74
Originality: Incremental advance
AI Analysis

This work narrows the performance gap between non-autoregressive (NAR) and pre-trained autoregressive models in text generation, offering faster decoding with competitive quality. The advance is incremental, since it builds on existing pre-training and NAR approaches.

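The paper tackles the lack of proper pre-training for non-autoregressive text generation models, which limits their performance outside machine translation. It proposes PreDAT, a pre-trained Directed Acyclic Transformer, together with a pre-training task that promotes prediction consistency. Across five text generation tasks, PreDAT improves on existing pre-trained NAR models by 4.2 points on average, outperforms pre-trained autoregressive baselines on n-gram-based metrics, and reports a 17-times speedup in throughput.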

Non-AutoRegressive (NAR) text generation models have drawn much attention because of their significantly faster decoding speed and good generation quality in machine translation. However, in a wider range of text generation tasks, existing NAR models lack proper pre-training, making them still far behind the pre-trained autoregressive models. In this paper, we propose Pre-trained Directed Acyclic Transformer (PreDAT) and a novel pre-training task to promote prediction consistency in NAR generation. Experiments on five text generation tasks show that our PreDAT remarkably outperforms existing pre-trained NAR models (+4.2 scores on average) and even achieves better results than pre-trained autoregressive baselines in n-gram-based metrics, along with 17 times speedup in throughput. Further analysis shows that PreDAT benefits from the unbiased prediction order that alleviates the error accumulation problem in autoregressive generation, which provides new insights into the advantages of NAR generation.

Code Implementations: 1 repo
