LG · AI · CL · Sep 28, 2025

Training Optimal Large Diffusion Language Models

arXiv:2510.03280v2 · 20 citations · h-index: 15
AI Analysis

This work addresses how to train diffusion language models efficiently, aiming to offer short-term practical guidance and long-term inspiration for the AI community.

The authors introduce Quokka, which they describe as the first systematic scaling law for diffusion language models. It covers both compute-constrained and data-constrained regimes and is intended as practical guidance for training these models.

We introduce Quokka, the first systematic scaling law for diffusion language models (DLMs), encompassing both compute-constrained and data-constrained regimes, and studying the key modeling and optimization designs. Quokka is a good friend of Chinchilla and offers a wider scope. We hope the results will bring short-term practical guidance for DLM training and long-term inspiration for the whole AI community.

