LG · CL · May 13

Beyond Mode-Seeking RL: Trajectory-Balance Post-Training for Diffusion Language Models

arXiv:2605.13935 · 76.6
Predicted impact: top 18% in LG · last 90 days
Originality: Incremental advance
AI Analysis

For practitioners working with diffusion language models, TraFL provides a post-training method that avoids the trajectory-locking failure mode of reward-maximizing approaches and improves over the base model across mathematical reasoning and code generation benchmarks.

TraFL introduces a trajectory-balance objective for diffusion language models that prevents over-concentration on narrow denoising paths, enabling consistent improvements over base models across mathematical reasoning and code generation benchmarks, with gains persisting under increased sampling budgets.

Diffusion language models are a promising alternative to autoregressive models, yet post-training methods for them largely adapt reward-maximizing objectives. We identify a central failure mode in this setting, which we call trajectory locking: sampled reward-driven updates over-concentrate probability mass onto a narrow set of denoising paths, reducing coverage of alternative correct solutions under repeated sampling. To address this, we propose TraFL (Trajectory Flow baLancing), a trajectory-balance objective that trains the policy toward a reward-tilted target distribution anchored to a frozen reference model. We make this practical for diffusion language models with a diffusion-compatible sequence-level surrogate and a learned prompt-dependent normalization. Across mathematical reasoning and code generation benchmarks, TraFL is the only evaluated post-training method that improves over the base model in every benchmark-length setting, with gains that persist as the sampling budget increases. The improvements transfer to held-out evaluations: TraFL stays above the base model on Minerva Math and is the strongest method on every LiveCodeBench difficulty split.
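To make the idea of a reward-tilted target concrete, here is a minimal sketch of the generic trajectory-balance residual, assuming the target has the common form p*(y|x) ∝ pi_ref(y|x) · exp(r(x, y) / beta). The abstract does not state TraFL's exact loss, so the function names `tb_loss` and `tilted_target`, the temperature `beta`, and the use of exact sequence log-probabilities (in place of the paper's diffusion-compatible surrogate) are all illustrative assumptions, not the authors' implementation.

```python
import math


def tilted_target(logp_ref, rewards, beta=1.0):
    """Normalize pi_ref(y|x) * exp(r/beta) over a finite candidate set.

    Returns the target probabilities and the log-normalizer log Z(x).
    Hypothetical helper: a real setup would learn log Z(x) per prompt
    rather than enumerate candidates.
    """
    scores = [lp + r / beta for lp, r in zip(logp_ref, rewards)]
    m = max(scores)  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [math.exp(s - log_z) for s in scores], log_z


def tb_loss(log_z, logp_policy, logp_ref, reward, beta=1.0):
    """Squared trajectory-balance residual for one (prompt, sample) pair.

    The residual is zero exactly when
    Z(x) * pi_theta(y|x) == pi_ref(y|x) * exp(reward / beta).
    """
    residual = log_z + logp_policy - logp_ref - reward / beta
    return residual ** 2


# Toy example: three candidate answers for one prompt (made-up numbers).
logp_ref = [math.log(0.5), math.log(0.3), math.log(0.2)]
rewards = [1.0, 0.0, 1.0]
target, log_z = tilted_target(logp_ref, rewards)

# A policy that matches the tilted target has zero loss on every sample.
losses_at_target = [
    tb_loss(log_z, math.log(p), lp, r)
    for p, lp, r in zip(target, logp_ref, rewards)
]

# A policy that collapses onto the first answer is penalized.
loss_collapsed = tb_loss(log_z, math.log(0.9), logp_ref[0], rewards[0])
```

In this sketch both rewarded answers keep nonzero target mass, in proportion to their reference probabilities, so the minimizer keeps probability on every correct answer. A pure reward-maximizing objective would be equally satisfied by putting all mass on a single rewarded answer, which is the over-concentration the abstract calls trajectory locking.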

