LG AI CL Sep 28, 2025

Training Optimal Large Diffusion Language Models

Jinjie Ni, Qian Liu, Chao Du, Longxu Dou, Hang Yan, Zili Wang, Tianyu Pang, Michael Qizhe Shieh

arXiv:2510.03280v225.020 citations h-index: 15

Originality: Highly original

AI Analysis

This work addresses how to train diffusion language models efficiently, offering both short-term practical guidance for training such models and long-term inspiration for the AI community.

The authors introduce Quokka, the first systematic scaling law for diffusion language models. It covers both compute-constrained and data-constrained regimes and provides practical guidance for training these models.

We introduce Quokka, the first systematic scaling law for diffusion language models (DLMs), encompassing both compute-constrained and data-constrained regimes and studying the key modeling and optimization designs. Quokka is a good friend of Chinchilla and covers a wider scope. We hope the results will bring short-term practical guidance for DLM training and long-term inspiration for the whole AI community.
