CL · Jun 3

SAID: Accelerating Diffusion-Based Language Models via Scaffold-Aware Iterative Decoding

arXiv:2606.04974 · 98.0 · Has Code
Predicted impact: top 3% in CL · last 90 days
Originality: Incremental advance
AI Analysis

For practitioners using diffusion LLMs, SAID reduces inference cost while keeping benchmark performance competitive, addressing a key bottleneck in non-autoregressive generation.

SAID accelerates diffusion-based language models by focusing denoising on scaffold tokens first, achieving up to 9.1x speedup on LLaDA models while maintaining competitive performance across math, coding, and knowledge benchmarks.

Diffusion large language models (DLLMs) enable non-autoregressive generation by iteratively denoising corrupted token sequences with bidirectional context. Despite their ability to update multiple positions in parallel, inference remains costly due to the many denoising steps required for high-quality generation. We propose SAID, a Scaffold-Aware Iterative Decoding framework that accelerates DLLMs by reallocating computation across tokens. SAID first spends denoising computation on scaffold tokens to establish the coarse semantic structure, and then completes predictable detail tokens with fewer steps. We further adapt SAID to block-wise diffusion decoding and introduce Confidence-Hierarchical Layered Generation (CHLG), which assigns additional steps only to low-confidence tokens. Experiments on LLaDA-8B and LLaDA 1.5 across math, coding, and knowledge benchmarks show that SAID significantly accelerates DLLM inference with a maximum speedup of 9.1x while maintaining competitive performance. Our code is publicly available: https://github.com/TH-AI-Lab-PKU/SAID.
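The two-phase idea in the abstract (more denoising steps for scaffold tokens, fewer for detail tokens, extra steps only where confidence is low) can be illustrated with a minimal toy sketch. This is not the authors' implementation: the `Denoiser` interface, the `is_scaffold` predicate, the step counts, the confidence threshold, and the choice to re-mask low-confidence tokens are all assumptions made for illustration, and the toy denoiser simply returns a known target sequence.

```python
from typing import Callable, Dict, List, Optional, Tuple

MASK = None  # marks a position that has not been decoded yet

# Hypothetical interface (an assumption, not from the paper): map a partially
# decoded sequence to one (token, confidence) guess per position.
Denoiser = Callable[[List[Optional[str]]], List[Tuple[str, float]]]


def unmask_in_steps(seq, positions, denoise, steps, conf_at_commit):
    """Fill `positions` using at most `steps` denoiser calls, most confident first."""
    remaining = [p for p in positions if seq[p] is MASK]
    calls = 0
    for step in range(steps):
        if not remaining:
            break
        preds = denoise(seq)
        calls += 1
        # Spread the remaining positions evenly over the steps that are left.
        quota = -(-len(remaining) // (steps - step))  # ceiling division
        remaining.sort(key=lambda p: preds[p][1], reverse=True)
        for p in remaining[:quota]:
            seq[p], conf_at_commit[p] = preds[p]
        remaining = remaining[quota:]
    return calls


def scaffold_first_decode(length, is_scaffold, denoise,
                          scaffold_steps=4, detail_steps=1, threshold=0.6):
    """Decode scaffold positions with many steps, then details with few."""
    seq: List[Optional[str]] = [MASK] * length
    conf: Dict[int, float] = {}
    scaffold = [i for i in range(length) if is_scaffold(i)]
    detail = [i for i in range(length) if not is_scaffold(i)]
    calls = unmask_in_steps(seq, scaffold, denoise, scaffold_steps, conf)
    calls += unmask_in_steps(seq, detail, denoise, detail_steps, conf)
    # Assumed refinement rule: re-mask tokens committed below the threshold
    # and give only those positions one additional denoising step.
    shaky = [p for p in range(length) if conf.get(p, 0.0) < threshold]
    for p in shaky:
        seq[p] = MASK
    calls += unmask_in_steps(seq, shaky, denoise, 1, conf)
    return seq, calls


def make_toy_denoiser(target):
    """Toy stand-in for a DLLM: always right, more confident with more context."""
    def denoise(seq):
        known = sum(tok is not MASK for tok in seq) / len(seq)
        return [(tok, 0.5 + 0.5 * known) for tok in target]
    return denoise


target = [f"tok{i}" for i in range(12)]
decoded, calls = scaffold_first_decode(
    length=len(target),
    is_scaffold=lambda i: i % 3 == 0,  # hypothetical scaffold rule
    denoise=make_toy_denoiser(target),
)
print(decoded == target, calls)
```

In this toy setup the schedule needs fewer denoiser calls than the 12 that a one-token-per-step schedule would use. The figure says nothing about real DLLM speedups; it only shows how reallocating steps across token groups reduces the number of forward passes.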

