CL · LG · Jun 25, 2024

Discrete Diffusion Language Model for Efficient Text Summarization

arXiv:2407.10998v2 · 16 citations
Originality: Highly original
AI Analysis

This work addresses the challenge of efficient text summarization for NLP applications, representing an incremental improvement over existing discrete diffusion models.

The paper tackles the problem of applying discrete diffusion models to conditional long-text generation tasks like abstractive summarization, where prior methods failed due to architectural incompatibilities. It introduces a semantic-aware noising process and CrossMamba adaptation, achieving state-of-the-art performance on three benchmark datasets (Gigaword, CNN/DailyMail, Arxiv) with faster inference speeds than autoregressive models.

While diffusion models excel at conditionally generating high-quality images, prior work on discrete diffusion models was not evaluated on conditional long-text generation. In this work, we address the limitations of prior discrete diffusion models for conditional long-text generation, particularly in long sequence-to-sequence tasks such as abstractive summarization. Despite decoding faster than autoregressive methods, previous diffusion models failed on the abstractive summarization task because their backbone architectures were incompatible with the random noising process. To overcome these challenges, we introduce a novel semantic-aware noising process that enables Transformer backbones to handle long sequences effectively. Additionally, we propose CrossMamba, an adaptation of the Mamba model to the encoder-decoder paradigm, which integrates seamlessly with the random absorbing noising process. Our approaches achieve state-of-the-art performance on three benchmark summarization datasets (Gigaword, CNN/DailyMail, and Arxiv), outperforming existing discrete diffusion models on ROUGE metrics while running much faster at inference than autoregressive models.
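For readers unfamiliar with the "random absorbing noising process" mentioned in the abstract, the following is a minimal, generic sketch of absorbing-state noising and is not the paper's implementation. The `MASK` token string, the function name `absorb_noise`, and the linear schedule `t / num_steps` are all assumptions chosen for illustration. The paper's semantic-aware variant is not reproduced here because the abstract does not specify it.

```python
import random

# Hypothetical absorbing-state token; real models use a reserved vocabulary id.
MASK = "[MASK]"


def absorb_noise(tokens, t, num_steps, rng=None):
    """Corrupt a token sequence at diffusion step t.

    Each token is independently replaced by MASK with probability
    t / num_steps (an assumed linear schedule). At t == 0 the sequence
    is unchanged; at t == num_steps every token is masked.
    """
    if not 0 <= t <= num_steps:
        raise ValueError("t must lie in [0, num_steps]")
    rng = rng or random.Random()
    mask_prob = t / num_steps
    # rng.random() returns a float in [0.0, 1.0), so mask_prob == 1.0
    # masks every token and mask_prob == 0.0 masks none.
    return [MASK if rng.random() < mask_prob else tok for tok in tokens]


if __name__ == "__main__":
    summary = "diffusion models can summarize long documents".split()
    for step in (0, 5, 10):
        print(step, absorb_noise(summary, step, 10, random.Random(0)))
```

Because masking here is uniform over positions, which tokens survive at a given step is arbitrary. Per the abstract, that randomness is what Transformer backbones handled poorly on long sequences, which motivates the semantic-aware alternative.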
