Leveraging Pretrained Language Models as Energy Functions for Glauber Dynamics Text Diffusion
This work introduces a new paradigm for discrete diffusion models that leverages pretrained LMs as energy functions, improving text generation quality and enabling competitive performance on reasoning tasks.
The authors propose a discrete diffusion language model using Glauber dynamics with an energy function derived from pretrained language models, achieving competitive performance with autoregressive models on text generation and zero-shot reasoning tasks.
We present a discrete diffusion-based language model using Glauber dynamics from statistical physics. Our main insight is that instead of trying to train a discrete state space diffusion model using Glauber dynamics with a uniform transition kernel as the forward process, one can set up an "energy function" based on pretrained causal/masked language models. Using this energy function to define the stationary distribution allows us to significantly improve the quality of the generated text. Incorporating UL2 as the pretrained model into our diffusion pipeline, we outperform prior diffusion-based LMs and perform competitively with autoregressive models of comparable size. Furthermore, our models are competitive with or outperform prior diffusion models and GPT-2 style autoregressive models on zero-shot common sense reasoning tasks as well as planning and search tasks like Sudoku and Zebra puzzles.
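To make the role of the energy function concrete, the following is a minimal sketch of textbook Glauber (heat-bath) dynamics on token sequences, not the authors' pipeline. It assumes a stationary distribution of the form p(x) ∝ exp(-E(x) / T). The names `toy_energy`, `glauber_step`, and `run_glauber`, the temperature parameter, and the toy energy itself are all hypothetical. In the setting described above, the energy would presumably be derived from a pretrained language model (for example, a negative log-likelihood), but the abstract does not specify how.

```python
import math
import random


def toy_energy(seq):
    """Hypothetical stand-in energy: number of adjacent positions whose tokens differ.

    A real system would score the sequence with a pretrained LM instead (assumption).
    """
    return sum(1.0 for a, b in zip(seq, seq[1:]) if a != b)


def glauber_step(seq, vocab, energy, rng, temperature=1.0):
    """Resample one uniformly chosen position from its conditional distribution.

    The conditional is proportional to exp(-E(x) / T) with all other positions fixed,
    which is the standard heat-bath update.
    """
    i = rng.randrange(len(seq))
    # Negative scaled energy of every candidate sequence that differs only at position i.
    logits = [-energy(seq[:i] + [tok] + seq[i + 1:]) / temperature for tok in vocab]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(logits)
    weights = [math.exp(value - top) for value in logits]
    new_seq = list(seq)
    new_seq[i] = rng.choices(vocab, weights=weights, k=1)[0]
    return new_seq


def run_glauber(length, vocab, energy, steps, seed=0, temperature=1.0):
    """Start from a uniformly random sequence and apply `steps` single-site updates."""
    rng = random.Random(seed)
    seq = [rng.choice(vocab) for _ in range(length)]
    for _ in range(steps):
        seq = glauber_step(seq, vocab, energy, rng, temperature)
    return seq


if __name__ == "__main__":
    vocab = ["a", "b", "c"]
    sample = run_glauber(length=8, vocab=vocab, energy=toy_energy, steps=500, seed=1)
    print(sample, toy_energy(sample))
```

Each update touches a single position and only needs energy differences between candidates, so the normalizing constant of p never has to be computed. Scoring every vocabulary item at each step is affordable only for this toy vocabulary; with a real vocabulary the conditional would have to come from a model's output distribution or an approximation of it.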