Dream 7B: Diffusion Large Language Models
This work targets the efficiency and flexibility limitations of autoregressive language models and is aimed at researchers and practitioners in natural language processing.
To address the sequential token generation of autoregressive language models, the authors developed Dream 7B, a diffusion-based large language model that refines sequences in parallel through iterative denoising. The model consistently outperforms existing diffusion language models on general, mathematical, and coding tasks, and it demonstrates superior planning abilities and inference flexibility.
We introduce Dream 7B, the most powerful open diffusion large language model to date. Unlike autoregressive (AR) models that generate tokens sequentially, Dream 7B employs discrete diffusion modeling to refine sequences in parallel through iterative denoising. Our model consistently outperforms existing diffusion language models on general, mathematical, and coding tasks. Dream 7B demonstrates superior planning abilities and inference flexibility, including arbitrary-order generation, infilling capabilities, and tunable quality-speed trade-offs. These results are achieved through simple yet effective training techniques, including AR-based LLM initialization and context-adaptive token-level noise rescheduling. We release both Dream-Base and Dream-Instruct to facilitate further research in diffusion-based language modeling.
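To make the idea of parallel refinement through iterative denoising more concrete, the following is a minimal toy sketch of a masked-token decoding loop. It is not the Dream 7B implementation. The names `toy_denoiser`, `diffusion_decode`, and `MASK` are hypothetical, and the "model" is a stand-in that reads the answer from a known target and attaches a random confidence. The sketch only illustrates the control flow: start from a partially masked sequence (which covers infilling), score all masked positions at once, commit the most confident ones in whatever order they come, and repeat for a chosen number of steps.

```python
import random

MASK = "<mask>"  # hypothetical mask symbol, used only in this sketch


def toy_denoiser(seq, target, rng):
    """Stand-in for a trained denoising model (an assumption of this sketch).

    For every masked position it returns a (token, confidence) pair. The token
    is read from `target`; the confidence is random, so positions end up being
    filled in an arbitrary order rather than left to right.
    """
    return {i: (target[i], rng.random()) for i, tok in enumerate(seq) if tok == MASK}


def diffusion_decode(prompt, target, num_steps, seed=0):
    """Iteratively unmask a sequence in at most `num_steps` refinement steps.

    `prompt` holds known tokens, with None marking positions to generate.
    Returns the final sequence and the list of intermediate states.
    """
    rng = random.Random(seed)
    seq = [tok if tok is not None else MASK for tok in prompt]
    history = [list(seq)]

    for step in range(num_steps):
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        if not masked:
            break

        # Predict every masked position in one parallel pass.
        preds = toy_denoiser(seq, target, rng)

        # Spread the remaining masked positions over the remaining steps.
        remaining_steps = num_steps - step
        k = -(-len(masked) // remaining_steps)  # ceiling division

        # Commit the k most confident predictions; the rest stay masked.
        chosen = sorted(masked, key=lambda i: preds[i][1], reverse=True)[:k]
        for i in chosen:
            seq[i] = preds[i][0]
        history.append(list(seq))

    return seq, history


target = "the cat sat on the mat".split()
prompt = ["the", None, None, "on", None, "mat"]  # infilling: gaps in the middle

slow_seq, slow_history = diffusion_decode(prompt, target, num_steps=3)
fast_seq, fast_history = diffusion_decode(prompt, target, num_steps=1)
```

In this toy setting, `num_steps` plays the role of the quality-speed knob: with `num_steps=1` all gaps are committed in a single pass, while larger values commit fewer tokens per pass. With a real model, fewer steps would mean each commitment is made with less refined context, which is where a quality cost would come from. That part is not modeled here.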