Training Optimal Large Diffusion Language Models
We introduce Quokka, the first systematic scaling law for diffusion language models (DLMs). It covers both compute-constrained and data-constrained regimes and studies the key modeling and optimization design choices. Quokka complements the Chinchilla scaling law and extends its scope. We hope these results provide short-term practical guidance for DLM training and long-term inspiration for the broader AI community.
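Quokka's fitted functional form is not given here. As a reference point for what a compute-constrained scaling law prescribes, the sketch below assumes the Chinchilla-style parametric loss L(N, D) = E + A/N^α + B/D^β with training compute C ≈ 6ND. It solves for the loss-minimizing model size N and token count D at a fixed budget, using the closed form N_opt = G·(C/6)^{β/(α+β)} with G = (αA/(βB))^{1/(α+β)}. The class name `ParametricLaw` and the coefficients in the usage example (values reported for Chinchilla's autoregressive fit) are illustrative assumptions, not Quokka's DLM coefficients. The data-constrained regime, where training tokens are repeated, is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParametricLaw:
    """Hypothetical Chinchilla-style law: L(N, D) = E + A / N**alpha + B / D**beta."""

    E: float      # irreducible loss
    A: float      # model-size coefficient
    B: float      # data-size coefficient
    alpha: float  # model-size exponent
    beta: float   # data-size exponent

    def loss(self, n_params: float, n_tokens: float) -> float:
        """Predicted loss for a model with n_params parameters trained on n_tokens tokens."""
        return self.E + self.A / n_params**self.alpha + self.B / n_tokens**self.beta

    def compute_optimal(self, flops: float) -> tuple[float, float]:
        """Return (N_opt, D_opt) minimizing loss under the assumption C = 6 * N * D."""
        a, b = self.alpha, self.beta
        # Setting dL/dN = 0 with D = C / (6N) gives N**(a+b) = (a*A / (b*B)) * (C/6)**b.
        g = (a * self.A / (b * self.B)) ** (1.0 / (a + b))
        n_opt = g * (flops / 6.0) ** (b / (a + b))
        d_opt = (flops / 6.0) / n_opt  # spend the rest of the budget on tokens
        return n_opt, d_opt


if __name__ == "__main__":
    # Illustrative coefficients from Chinchilla's autoregressive fit, not a DLM fit.
    law = ParametricLaw(E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28)
    n, d = law.compute_optimal(1e21)
    print(f"N_opt = {n:.3e} params, D_opt = {d:.3e} tokens, loss = {law.loss(n, d):.4f}")
```

Because both loss terms are exponentials in log N once D is tied to N through the budget, the objective is convex in log N. Any other split of the same compute therefore predicts a higher loss. A DLM-specific law would keep this procedure but replace the functional form and fitted coefficients.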