CL · Jun 3

SemBlock: Semantic Boundary Dynamic Blocks for Diffusion LLMs

Xinrui Song, Zhuoran Wang, Mingju Gao, Hao Tang

arXiv:2606.04964 · 89.5 · Has Code

Predicted impact: top 33% in CL (last 90 days) · Originality: Incremental advance

AI Analysis

For practitioners of diffusion language models, this work addresses the mismatch between fixed block boundaries and semantic structure, offering a practical improvement in blockwise decoding.

SemBlock introduces a dynamic block decoding framework for diffusion LLMs that aligns block boundaries with semantic units, improving generation quality across math, code, and instruction-following tasks over fixed-block and AdaBlock baselines.

Diffusion language models (DLMs) generate text through iterative denoising, and blockwise decoding improves their practicality by committing tokens in local blocks. However, existing blockwise methods typically rely on fixed block sizes or delimiter-based runtime signals, which do not necessarily align with semantic boundaries. In this paper, we propose SemBlock, a semantic-boundary-driven dynamic block decoding framework for diffusion LLMs. SemBlock formulates dynamic block construction as semantic boundary prediction and trains lightweight predictors on frozen LLaDA hidden states. To provide supervision, we construct SemBound, a semantic-boundary dataset that derives boundary labels from discourse units, reasoning steps, and implementation spans across natural language, math, and code tasks. During inference, SemBlock uses predicted boundary probabilities to select the ending position of each dynamic block. Experiments on GSM8K, IFEval, MATH, and HumanEval show that SemBlock consistently improves over fixed-block decoding and AdaBlock. Our code is publicly available: https://github.com/TH-AI-Lab-PKU/SemBlock.
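The inference step described in the abstract, choosing each block's end from predicted boundary probabilities, can be illustrated with a minimal sketch. The abstract does not give the selection rule, so everything specific here is an assumption made for illustration: the function names `select_block_end` and `partition_into_blocks`, the `min_len`/`max_len` window, the `threshold`, and the rule of taking the most probable boundary in the window and falling back to a fixed-size block. The sketch also treats the probabilities as a fixed list, whereas a real decoder would presumably recompute them from the model's hidden states as blocks are committed.

```python
from typing import List, Sequence, Tuple


def select_block_end(
    boundary_probs: Sequence[float],
    start: int,
    min_len: int = 2,
    max_len: int = 8,
    threshold: float = 0.5,
) -> int:
    """Return the inclusive end index of the block beginning at `start`.

    Hypothetical rule (not taken from the paper): look at positions that
    give a block length between min_len and max_len, pick the one with the
    highest boundary probability, and accept it if it clears `threshold`.
    Otherwise fall back to a fixed block of max_len tokens.
    """
    n = len(boundary_probs)
    if not 0 <= start < n:
        raise ValueError("start must index into boundary_probs")
    if not 1 <= min_len <= max_len:
        raise ValueError("need 1 <= min_len <= max_len")

    # Clamp the candidate window to the end of the sequence.
    lo = min(start + min_len - 1, n - 1)
    hi = min(start + max_len - 1, n - 1)

    # max() returns the earliest index among ties.
    best = max(range(lo, hi + 1), key=lambda i: boundary_probs[i])
    if boundary_probs[best] >= threshold:
        return best
    return hi  # no confident boundary: behave like fixed-block decoding


def partition_into_blocks(
    boundary_probs: Sequence[float],
    min_len: int = 2,
    max_len: int = 8,
    threshold: float = 0.5,
) -> List[Tuple[int, int]]:
    """Split positions 0..n-1 into consecutive (start, end) blocks."""
    blocks: List[Tuple[int, int]] = []
    start = 0
    while start < len(boundary_probs):
        end = select_block_end(boundary_probs, start, min_len, max_len, threshold)
        blocks.append((start, end))
        start = end + 1
    return blocks


if __name__ == "__main__":
    # Made-up probabilities for a 10-token response.
    probs = [0.1, 0.2, 0.9, 0.1, 0.1, 0.3, 0.8, 0.1, 0.2, 0.1]
    print(partition_into_blocks(probs, min_len=2, max_len=4, threshold=0.5))
```

With the made-up probabilities in the usage example, the first two blocks end at the high-probability positions 2 and 6, and the last block falls back to the window limit because no candidate clears the threshold. Setting `threshold` above 1.0 makes the sketch reduce to plain fixed-block decoding with block size `max_len`.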

