CL LGMar 24

LogicDiff: Logic-Guided Denoising Improves Reasoning in Masked Diffusion Language Models

arXiv:2603.267714.4

Predicted impact top 77% in CL · last 90 daysOriginality Incremental advance

AI Analysis

For researchers and practitioners using masked diffusion language models for reasoning tasks, LogicDiff provides a lightweight inference-time method to significantly improve reasoning performance without modifying the base model.

LogicDiff replaces confidence-based unmasking in masked diffusion language models with logic-role-guided unmasking, improving LLaDA-8B-Instruct accuracy from 22.0% to 60.7% on GSM8K and from 23.6% to 29.2% on MATH-500 with less than 6% speed overhead.

Masked diffusion language models (MDLMs) generate text by iteratively unmasking tokens from a fully masked sequence, offering parallel generation and bidirectional context. However, their standard confidence-based unmasking strategy systematically defers high-entropy logical connective tokens, the critical branching points in reasoning chains, leading to severely degraded reasoning performance. We introduce LogicDiff, an inference-time method that replaces confidence-based unmasking with logic-role-guided unmasking. A lightweight classification head (4.2M parameters, 0.05% of the base model) predicts the logical role of each masked position (premise, connective, derived step, conclusion, or filler) from the base model's hidden states with 98.4% accuracy. A dependency-ordered scheduler then unmasks tokens in logical dependency order: premises first, then connectives, then derived steps, then conclusions. Without modifying a single parameter of the base model and without any reinforcement learning or task-specific training, LogicDiff improves LLaDA-8B-Instruct accuracy from 22.0% to 60.7% on GSM8K (+38.7 percentage points) and from 23.6% to 29.2% on MATH-500 (+5.6 pp), with less than 6% speed overhead. Our results demonstrate that a substantial portion of the reasoning deficit in MDLMs is attributable to suboptimal token unmasking order, not to limitations of the model's learned representations.

View on arXiv PDF

Similar