CL, LG | Dec 29, 2023

Principled Gradient-based Markov Chain Monte Carlo for Text Generation

MIT
arXiv:2312.17710v1 | 2 citations | h-index: 9
Originality: Incremental advance
AI Analysis

The paper addresses a critical limitation of energy-based text generation that matters to both researchers and practitioners. The advance is incremental, since it builds on existing gradient-based MCMC methods.

The paper tackles incorrect sampling from target language model distributions in gradient-based MCMC for text generation. It proposes faithful samplers that generate more fluent text and adhere more closely to control objectives.

Recent papers have demonstrated the possibility of energy-based text generation by adapting gradient-based sampling algorithms, a paradigm of MCMC algorithms that promises fast convergence. However, as we show in this paper, previous attempts at this approach to text generation all fail to sample correctly from the target language model distributions. To address this limitation, we consider the problem of designing text samplers that are faithful, meaning that they have the target text distribution as their limiting distribution. We propose several faithful gradient-based sampling algorithms to sample from the target energy-based text distribution correctly, and study their theoretical properties. Through experiments on various forms of text generation, we demonstrate that faithful samplers are able to generate more fluent text while adhering to the control objectives better.
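To make the notion of a faithful sampler concrete, here is a minimal toy sketch in Python. It is not one of the paper's algorithms. It assumes a made-up energy function over length-3 sequences from a 3-token vocabulary, and it uses exact energy differences where a gradient-based sampler would use a first-order approximation. The names `energy`, `proposal`, `step`, and `tv_distance` are all hypothetical. The same locally informed proposal is run with and without a Metropolis-Hastings correction; only the corrected chain has the target distribution proportional to exp(-E(x)) as its limiting distribution.

```python
import itertools
import math
import random

VOCAB = 3   # toy vocabulary size (assumption, not from the paper)
LENGTH = 3  # toy sequence length (assumption, not from the paper)


def energy(x):
    # Hypothetical energy: larger token ids cost more, adjacent repeats are penalised.
    return sum(0.8 * t for t in x) + sum(1.5 for a, b in zip(x, x[1:]) if a == b)


def neighbours(x):
    # All sequences that differ from x in exactly one position.
    return [x[:i] + (t,) + x[i + 1:]
            for i in range(len(x)) for t in range(VOCAB) if t != x[i]]


def proposal(x):
    # Locally informed proposal: q(y | x) proportional to exp(-(E(y) - E(x)) / 2).
    # Exact energy differences stand in for a gradient-based estimate here.
    nbrs = neighbours(x)
    weights = [math.exp(-(energy(y) - energy(x)) / 2) for y in nbrs]
    total = sum(weights)
    return nbrs, [w / total for w in weights]


def step(x, rng, correct):
    nbrs, probs = proposal(x)
    y = rng.choices(nbrs, weights=probs)[0]
    if not correct:
        return y  # always accept: the chain is biased in general
    q_xy = probs[nbrs.index(y)]
    back_nbrs, back_probs = proposal(y)
    q_yx = back_probs[back_nbrs.index(x)]
    # Metropolis-Hastings ratio: pi(y) q(x | y) / (pi(x) q(y | x)).
    ratio = math.exp(energy(x) - energy(y)) * q_yx / q_xy
    return y if rng.random() < min(1.0, ratio) else x


def tv_distance(n_steps, correct, seed=0):
    # Total variation distance between the chain's empirical distribution
    # and the exact target, which is computable because the space is tiny.
    rng = random.Random(seed)
    states = list(itertools.product(range(VOCAB), repeat=LENGTH))
    z = sum(math.exp(-energy(s)) for s in states)
    target = {s: math.exp(-energy(s)) / z for s in states}
    counts = dict.fromkeys(states, 0)
    x = states[0]
    for _ in range(n_steps):
        x = step(x, rng, correct)
        counts[x] += 1
    return 0.5 * sum(abs(counts[s] / n_steps - target[s]) for s in states)


if __name__ == "__main__":
    print("with MH correction:   ", tv_distance(20000, correct=True))
    print("without MH correction:", tv_distance(20000, correct=False))
```

In this toy setting the uncorrected chain settles on a distribution that reweights each state by its proposal normaliser, so its distance to the target does not vanish as the chain runs longer, while the corrected chain's distance shrinks with more steps.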

