CL · LG · Mar 24

LogicDiff: Logic-Guided Denoising Improves Reasoning in Masked Diffusion Language Models

arXiv:2603.26771 · 4.4
Predicted impact: top 77% in CL · last 90 days
Originality: Incremental advance
AI Analysis

For researchers and practitioners using masked diffusion language models for reasoning tasks, LogicDiff provides a lightweight inference-time method to significantly improve reasoning performance without modifying the base model.

LogicDiff replaces confidence-based unmasking in masked diffusion language models with logic-role-guided unmasking, improving LLaDA-8B-Instruct accuracy from 22.0% to 60.7% on GSM8K and from 23.6% to 29.2% on MATH-500 with less than 6% speed overhead.

Masked diffusion language models (MDLMs) generate text by iteratively unmasking tokens from a fully masked sequence, offering parallel generation and bidirectional context. However, their standard confidence-based unmasking strategy systematically defers high-entropy logical connective tokens, the critical branching points in reasoning chains, leading to severely degraded reasoning performance. We introduce LogicDiff, an inference-time method that replaces confidence-based unmasking with logic-role-guided unmasking. A lightweight classification head (4.2M parameters, 0.05% of the base model) predicts the logical role of each masked position (premise, connective, derived step, conclusion, or filler) from the base model's hidden states with 98.4% accuracy. A dependency-ordered scheduler then unmasks tokens in logical dependency order: premises first, then connectives, then derived steps, then conclusions. Without modifying a single parameter of the base model and without any reinforcement learning or task-specific training, LogicDiff improves LLaDA-8B-Instruct accuracy from 22.0% to 60.7% on GSM8K (+38.7 percentage points) and from 23.6% to 29.2% on MATH-500 (+5.6 pp), with less than 6% speed overhead. Our results demonstrate that a substantial portion of the reasoning deficit in MDLMs is attributable to suboptimal token unmasking order, not to limitations of the model's learned representations.
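The dependency-ordered scheduling described above can be illustrated with a small sketch. This is not the authors' implementation: the function names, the role label strings, the fixed per-step budget `k`, and the use of confidence as a tie-breaker within a role are all assumptions made for illustration. The abstract also does not say where filler tokens fall in the order, so placing them last is an assumption too. In an actual MDLM loop the roles and confidences would presumably be recomputed from the model after every step; here they are held fixed to keep the example self-contained.

```python
from typing import Dict, List, Sequence

# Lower value is unmasked earlier. The first four entries follow the order
# stated in the abstract; putting "filler" last is an assumption of this sketch.
ROLE_PRIORITY: Dict[str, int] = {
    "premise": 0,
    "connective": 1,
    "derived": 2,
    "conclusion": 3,
    "filler": 4,
}


def select_positions(
    roles: Sequence[str],
    confidences: Sequence[float],
    masked: Sequence[bool],
    k: int,
) -> List[int]:
    """Pick up to k still-masked positions to unmask in the current step."""
    candidates = [i for i, is_masked in enumerate(masked) if is_masked]
    # Order by role priority first, then by descending confidence within a role
    # (the confidence tie-break is a hypothetical choice, not from the source).
    candidates.sort(key=lambda i: (ROLE_PRIORITY[roles[i]], -confidences[i]))
    return candidates[:k]


def unmask_schedule(
    roles: Sequence[str],
    confidences: Sequence[float],
    k: int,
) -> List[List[int]]:
    """Return the positions unmasked at each step until nothing is masked."""
    if k < 1:
        raise ValueError("k must be at least 1")
    masked = [True] * len(roles)
    schedule: List[List[int]] = []
    while any(masked):
        step = select_positions(roles, confidences, masked, k)
        for i in step:
            masked[i] = False
        schedule.append(step)
    return schedule


if __name__ == "__main__":
    roles = ["conclusion", "premise", "connective", "derived", "premise", "filler"]
    confidences = [0.9, 0.6, 0.2, 0.7, 0.8, 0.95]
    for step, positions in enumerate(unmask_schedule(roles, confidences, k=2)):
        print(step, positions)
```

In this toy input the connective at position 2 has the lowest confidence (0.2), so a purely confidence-based order would leave it until last; the role-based order unmasks it right after the two premises.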

