CL · Nov 14, 2023

UT5: Pretraining Non autoregressive T5 with unrolled denoising

arXiv:2311.08552v1 · h-index: 7
Originality: Incremental advance
AI Analysis

This work addresses the inefficiency of sequential decoding in large language models, though it is an incremental advance because it builds on existing non-autoregressive and T5 methods.

The paper tackles the performance bottleneck of autoregressive decoding in large language models by pretraining a non-autoregressive T5 model with unrolled denoising, and reports state-of-the-art results on downstream generation tasks such as SQuAD question generation and XSum summarization.

Recent advances in Transformer-based large language models have made great strides in natural language generation. However, to decode K tokens, an autoregressive model needs K sequential forward passes, which can become a performance bottleneck for large language models. Much non-autoregressive (NAR) research aims to address this sequentiality bottleneck, although most of it has focused on dedicated architectures evaluated on supervised benchmarks. In this work, we study unsupervised pretraining for non-autoregressive T5 models via unrolled denoising and show that it achieves SoTA results on downstream generation tasks such as SQuAD question generation and XSum.
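The cost argument in the abstract (K tokens need K sequential passes) can be made concrete by counting forward passes in a toy setting. This is a minimal sketch under stated assumptions, not UT5's implementation: `ToyParallelDenoiser`, `decode_autoregressive`, `decode_non_autoregressive`, `unrolled_denoising_pairs`, `token_errors` and `MASK` are all hypothetical names, the "model" is a random stand-in for a Transformer, and the unrolled step assumes a step-unrolled denoising scheme in the spirit of SUNDAE (the model's own predictions are fed back as the next input), which may differ from the paper's exact procedure.

```python
import random
from typing import List, Tuple

MASK = -1  # hypothetical id for a corrupted / not-yet-decoded position


class ToyParallelDenoiser:
    """Hypothetical stand-in for a Transformer (NOT the UT5 model).

    Each call to `forward` counts as one forward pass and predicts every
    position in parallel: a position whose input already matches the target
    is kept, any other position is corrected with probability `accuracy`
    and otherwise replaced by a random token.
    """

    def __init__(self, target: List[int], vocab: int, accuracy: float, seed: int = 0):
        self.target = target
        self.vocab = vocab
        self.accuracy = accuracy
        self.rng = random.Random(seed)
        self.calls = 0

    def forward(self, tokens: List[int]) -> List[int]:
        self.calls += 1
        out = []
        for tok, gold in zip(tokens, self.target):
            if tok == gold or self.rng.random() < self.accuracy:
                out.append(gold)
            else:
                out.append(self.rng.randrange(self.vocab))
        return out


def token_errors(pred: List[int], target: List[int]) -> int:
    """Number of positions where the prediction differs from the target."""
    return sum(p != t for p, t in zip(pred, target))


def decode_autoregressive(model: ToyParallelDenoiser, length: int) -> List[int]:
    """K tokens cost K sequential passes: each pass commits exactly one token."""
    prefix: List[int] = []
    for k in range(length):
        preds = model.forward(prefix + [MASK] * (length - k))
        prefix.append(preds[k])
    return prefix


def decode_non_autoregressive(model: ToyParallelDenoiser, length: int, steps: int) -> List[int]:
    """A fixed number of passes, each refining all K positions at once."""
    tokens = [MASK] * length
    for _ in range(steps):
        tokens = model.forward(tokens)
    return tokens


def unrolled_denoising_pairs(
    model: ToyParallelDenoiser, corrupted: List[int], unroll_steps: int
) -> List[Tuple[List[int], List[int]]]:
    """Unroll the denoiser on its own outputs (assumed SUNDAE-style step).

    Every input after the first is the model's previous prediction, so a
    training loss comparing each prediction with the clean target would
    teach the model to repair its own mistakes, as it must at inference.
    """
    pairs = []
    inputs = corrupted
    for _ in range(unroll_steps):
        preds = model.forward(inputs)
        pairs.append((inputs, preds))
        inputs = preds
    return pairs


if __name__ == "__main__":
    rng = random.Random(42)
    target = [rng.randrange(100) for _ in range(32)]
    ar = ToyParallelDenoiser(target, vocab=100, accuracy=0.9)
    decode_autoregressive(ar, len(target))
    nar = ToyParallelDenoiser(target, vocab=100, accuracy=0.9)
    out = decode_non_autoregressive(nar, len(target), steps=4)
    print("AR passes:", ar.calls)    # 32
    print("NAR passes:", nar.calls)  # 4
    print("NAR token errors:", token_errors(out, target))
```

The design point is that the autoregressive pass count grows with the output length K, while the non-autoregressive count is the chosen number of refinement steps, independent of K. Whether a small number of steps reaches comparable output quality is the part that has to be measured on real models and benchmarks.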
