LG · AI · CL · Jan 30

TABES: Trajectory-Aware Backward-on-Entropy Steering for Masked Diffusion Models

arXiv:2602.00250v2 · 1 citations · h-index: 116
Originality: Highly original
AI Analysis

The paper addresses the computational cost of search-based sampling for non-autoregressive generation, which requires $O(K)$ forward passes per step, by approximating lookahead with a single backward pass.

The paper tackles trajectory lock-in in Masked Diffusion Models, where early hallucinations cascade into global incoherence. It proposes Backward-on-Entropy (BoE) Steering and reports a superior Pareto frontier for inference-time scaling compared to existing unmasking methods.

Masked Diffusion Models (MDMs) have emerged as a promising non-autoregressive paradigm for generative tasks, offering parallel decoding and bidirectional context utilization. However, current sampling methods rely on simple confidence-based heuristics that ignore the long-term impact of local decisions, leading to trajectory lock-in where early hallucinations cascade into global incoherence. While search-based methods mitigate this, they incur prohibitive computational costs ($O(K)$ forward passes per step). In this work, we propose Backward-on-Entropy (BoE) Steering, a gradient-guided inference framework that approximates infinite-horizon lookahead via a single backward pass. We formally derive the Token Influence Score (TIS) from a first-order expansion of the trajectory cost functional, proving that the gradient of future entropy with respect to input embeddings serves as an optimal control signal for minimizing uncertainty. To ensure scalability, we introduce ActiveQueryAttention, a sparse adjoint primitive that exploits the structure of the masking objective to reduce backward pass complexity. BoE achieves a superior Pareto frontier for inference-time scaling compared to existing unmasking methods, demonstrating that gradient-guided steering offers a mathematically principled and efficient path to robust non-autoregressive generation. We will release the code.
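The abstract describes the Token Influence Score only at a high level, so the following is a minimal sketch of the general idea, not the authors' implementation. It assumes a toy "model" in which a single future prediction is a softmax over a linear readout of a weighted sum of input embeddings; the names `future_entropy`, `entropy_grad_wrt_embeddings`, `influence_scores`, `mix`, and `W` are all hypothetical. The score for committing token `v` at masked position `i` is taken to be the first-order estimate of the entropy change, i.e. the gradient at position `i` dotted with the embedding change from the mask embedding to the token embedding.

```python
import math

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]

def entropy(p):
    return -sum(q * math.log(q) for q in p if q > 0.0)

def future_entropy(embs, mix, W):
    """Toy stand-in for the model (an assumption): entropy of
    softmax(W @ sum_i mix[i] * embs[i]). Returns (entropy, logits)."""
    d = len(embs[0])
    ctx = [sum(mix[i] * embs[i][k] for i in range(len(embs))) for k in range(d)]
    logits = [sum(row[k] * ctx[k] for k in range(d)) for row in W]
    return entropy(softmax(logits)), logits

def entropy_grad_wrt_embeddings(embs, mix, W):
    """Analytic backward pass: gradient of the toy future entropy
    with respect to every input embedding."""
    H, logits = future_entropy(embs, mix, W)
    p = softmax(logits)
    # For H = -sum_k p_k log p_k with p = softmax(z): dH/dz_k = -p_k (log p_k + H)
    dz = [-pk * (math.log(pk) + H) if pk > 0.0 else 0.0 for pk in p]
    d = len(embs[0])
    dctx = [sum(W[v][k] * dz[v] for v in range(len(W))) for k in range(d)]
    return [[mix[i] * dctx[k] for k in range(d)] for i in range(len(embs))]

def influence_scores(embs, mix, W, mask_emb, token_embs, masked_positions):
    """First-order estimate of the entropy change from replacing the mask
    embedding at position i with token v (hypothetical scoring rule)."""
    grads = entropy_grad_wrt_embeddings(embs, mix, W)
    scores = {}
    for i in masked_positions:
        for v, e_v in enumerate(token_embs):
            delta = [e_v[k] - mask_emb[k] for k in range(len(mask_emb))]
            scores[(i, v)] = sum(g * dl for g, dl in zip(grads[i], delta))
    return scores

# Illustrative values only: position 0 is already decoded, 1 and 2 are masked.
mask_emb = [0.0, 0.0]
token_embs = [[1.0, 0.0], [0.0, 1.0]]
embs = [[0.1, -0.2], mask_emb[:], mask_emb[:]]
mix = [0.5, 0.3, 0.2]
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]]

scores = influence_scores(embs, mix, W, mask_emb, token_embs, [1, 2])
# The most negative score is the largest predicted entropy reduction.
best_position, best_token = min(scores, key=scores.get)
```

In this sketch one gradient computation scores every (position, token) candidate at once, whereas a search-based sampler would need a separate forward pass per candidate. The estimate is only first-order, so it can be inaccurate when the embedding change is large.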

