CL · Feb 29, 2024

TEncDM: Understanding the Properties of the Diffusion Model in the Space of Language Model Encodings

Alexander Shabalin, Viacheslav Meshchaninov, Egor Chimbulatov, Vladislav Lapikov, Roman Kim, Grigory Bartosh, Dmitry Molchanov, Sergey Markov, Dmitry Vetrov

arXiv:2402.19097v49.616 citations · h-index: 10 · Has Code · AAAI

Originality: Incremental advance

AI Analysis

This work addresses text generation for NLP applications, but it is an incremental advance: it builds on existing diffusion models, changing the space in which diffusion operates (language model encodings) and the decoder.

The paper improves diffusion models for text generation by running diffusion in the space of pre-trained language model encodings, and reports better performance than existing non-autoregressive diffusion models on QQP, XSum, and Wiki-Auto.
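As a rough illustration of what "diffusion in the space of encodings" means, the sketch below applies the standard Gaussian forward process q(z_t | z_0) to a matrix of per-token encoding vectors instead of to token ids or static embeddings. It is a minimal sketch under stated assumptions: the toy numbers stand in for the outputs of a frozen pre-trained encoder, the helper names `cosine_alpha_bar` and `noise_encodings` are hypothetical, and the cosine schedule is only a common placeholder, not the scheduler TEncDM uses (the paper treats the noise scheduler as one of the factors it studies).

```python
import math
import random


def cosine_alpha_bar(t: float, s: float = 0.008) -> float:
    """Cumulative signal level alpha_bar(t) for t in [0, 1] (cosine schedule, a placeholder)."""
    def f(u: float) -> float:
        return math.cos((u + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0.0)


def noise_encodings(z0, t, rng):
    """Sample z_t ~ q(z_t | z_0) = N(sqrt(a) * z_0, (1 - a) * I) for every token encoding."""
    a = cosine_alpha_bar(t)
    signal, sigma = math.sqrt(a), math.sqrt(1.0 - a)
    # Independent standard Gaussian noise for every coordinate of every token.
    eps = [[rng.gauss(0.0, 1.0) for _ in row] for row in z0]
    zt = [[signal * x + sigma * e for x, e in zip(row, e_row)] for row, e_row in zip(z0, eps)]
    return zt, eps


# Toy "encodings": 3 tokens x 4 dims of made-up numbers standing in for encoder outputs.
z0 = [[0.2, -0.1, 0.5, 0.0], [1.0, 0.3, -0.7, 0.4], [-0.2, 0.8, 0.1, -0.5]]
zt, eps = noise_encodings(z0, t=0.5, rng=random.Random(0))
```

Because each row would come from a contextual encoder, the same token can have different clean targets in different sentences; that is the property the abstract contrasts with context-free embeddings. At t = 0 the schedule gives alpha_bar = 1, so `noise_encodings` returns the clean encodings unchanged.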

This paper presents the Text Encoding Diffusion Model (TEncDM), a novel approach to diffusion modeling that operates in the space of pre-trained language model encodings. In contrast to traditionally used embeddings, encodings integrate contextual information. In our approach, we also employ a transformer-based decoder, specifically designed to incorporate context in the token prediction process. We conduct a comprehensive examination of the influence of the encoder, decoder, noise scheduler, and self-conditioning on zero-shot generation. Furthermore, we compare TEncDM with previous approaches on three conditional text generation tasks: QQP, XSum, and Wiki-Auto. The results show that TEncDM exhibits superior performance compared to existing non-autoregressive diffusion models. Our code is available at https://github.com/M0RJIQUE/tencdm.
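The abstract lists self-conditioning among the components examined. The sketch below shows the usual form of that idea, where the denoiser's previous estimate of the clean encodings is fed back as an extra input at the next step, inside a deterministic DDIM-style sampler. Assumptions: `sample_with_self_conditioning`, `toy_denoiser`, and `TARGET` are hypothetical names; the toy denoiser just moves halfway from its previous estimate toward a fixed target, standing in for a trained transformer; the DDIM update and cosine schedule are chosen for brevity and may differ from the paper's sampler. The resulting encodings would then be passed to the decoder, which is not modeled here.

```python
import math
import random


def alpha_bar(t: float, s: float = 0.008) -> float:
    """Cumulative signal level for t in [0, 1]; cosine schedule used as a placeholder."""
    def f(u: float) -> float:
        return math.cos((u + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0.0)


def sample_with_self_conditioning(denoiser, shape, num_steps, rng):
    """Deterministic DDIM-style sampling over per-token encodings with self-conditioning.

    `denoiser(z_t, t, z0_prev)` returns an estimate of the clean encodings z_0;
    z0_prev is its own estimate from the previous step (all zeros at the start).
    """
    n_tokens, dim = shape
    z = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_tokens)]  # pure noise at t = 1
    z0_prev = [[0.0] * dim for _ in range(n_tokens)]
    times = [1.0 - i / num_steps for i in range(num_steps + 1)]  # 1.0 down to 0.0
    for t, t_next in zip(times[:-1], times[1:]):
        z0_hat = denoiser(z, t, z0_prev)
        a, a_next = alpha_bar(t), alpha_bar(t_next)
        z = [
            [
                # eps_hat = (z_t - sqrt(a) * z0_hat) / sqrt(1 - a); then re-noise to level a_next.
                math.sqrt(a_next) * x0 + math.sqrt(1.0 - a_next) * (zt - math.sqrt(a) * x0) / math.sqrt(1.0 - a)
                for zt, x0 in zip(z_row, x0_row)
            ]
            for z_row, x0_row in zip(z, z0_hat)
        ]
        z0_prev = z0_hat  # self-conditioning: the next call sees this estimate
    return z


# Hypothetical stand-in for a trained denoiser: it ignores z_t and t and simply
# moves halfway from its previous estimate toward a fixed target.
TARGET = [[1.0, -1.0], [0.5, 0.0], [-0.5, 2.0]]


def toy_denoiser(z_t, t, z0_prev):
    return [[0.5 * p + 0.5 * g for p, g in zip(p_row, g_row)] for p_row, g_row in zip(z0_prev, TARGET)]


out = sample_with_self_conditioning(toy_denoiser, (3, 2), num_steps=10, rng=random.Random(0))
```

With this toy denoiser the k-th estimate equals TARGET scaled by (1 - 0.5**k), so ten steps end within about 0.1% of the target, and the last step lands exactly on the final estimate because alpha_bar(0) = 1. A real denoiser would also use z_t and t; the toy one only makes the self-conditioning data flow visible.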

