CL | LG | Aug 4, 2025

Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference

arXiv:2508.02193v1 | 135 citations | h-index: 18
Originality: Incremental advance
AI Analysis

The paper addresses inference latency in language models for applications that require fast generation, though the advance appears incremental because it builds on existing discrete diffusion methods.

The paper tackles the problem of slow inference in large language models by introducing Seed Diffusion Preview, a discrete-state diffusion model that achieves 2,146 tokens/second inference speed on H20 GPUs while maintaining competitive performance on code benchmarks.

We present Seed Diffusion Preview, a large-scale language model based on discrete-state diffusion, offering remarkably fast inference speed. Thanks to non-sequential, parallel generation, discrete diffusion models provide a notable speedup to mitigate the inherent latency of token-by-token decoding, as demonstrated recently (e.g., Mercury Coder, Gemini Diffusion). Seed Diffusion Preview achieves an inference speed of 2,146 token/s over H20 GPUs while maintaining competitive performance across a sweep of standard code evaluation benchmarks, significantly faster than contemporary Mercury and Gemini Diffusion, establishing new state of the art on the speed-quality Pareto frontier for code models.
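As a rough illustration of why parallel generation can reduce latency, the sketch below counts the sequential model calls needed for token-by-token decoding versus a decoder that commits several tokens per call, and converts the reported 2,146 tokens/s into wall-clock time. The function names and the tokens-per-step value are assumptions made for illustration; this is not the paper's sampling procedure.

```python
import math

# Throughput reported in the abstract (on H20 GPUs).
REPORTED_TOKENS_PER_SECOND = 2146


def sequential_steps(num_tokens: int, tokens_per_step: int = 1) -> int:
    """Count the sequential model calls needed to emit num_tokens.

    tokens_per_step=1 corresponds to token-by-token decoding; larger values
    model a (hypothetical) decoder that commits several tokens per call.
    """
    if num_tokens < 0 or tokens_per_step < 1:
        raise ValueError("num_tokens must be >= 0 and tokens_per_step >= 1")
    return math.ceil(num_tokens / tokens_per_step)


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock time to emit num_tokens at a fixed throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    n = 1024  # illustrative output length, not taken from the paper
    print("token-by-token calls:", sequential_steps(n))
    print("calls at 8 tokens/step (assumed):", sequential_steps(n, tokens_per_step=8))
    print("seconds at reported throughput:",
          round(generation_seconds(n, REPORTED_TOKENS_PER_SECOND), 3))
```

Fewer sequential calls only translate into lower latency when each call is not proportionally more expensive, which is why the abstract reports end-to-end tokens per second rather than step counts.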
