LG ML · Apr 20

Discrete Tilt Matching

Yuyuan Chen, Shiyi Wang, Peter Potaptchik, Jaeyeon Kim, Michael S. Albergo

arXiv:2604.18739 · 92.3 · h-index: 22

Predicted impact: top 5% in LG · last 90 days

Originality: Incremental advance

AI Analysis

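This work provides a practical RL fine-tuning method for masked diffusion LLMs. It addresses a known bottleneck in training these models: existing RL objectives depend on sequence-level marginal likelihoods, which are intractable for masked diffusion.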

The authors propose Discrete Tilt Matching (DTM), a likelihood-free reinforcement learning method for fine-tuning masked diffusion LLMs that never computes the intractable sequence-level marginal likelihoods. DTM achieves strong gains on Sudoku and Countdown while remaining competitive on MATH500 and GSM8K.

Masked diffusion large language models (dLLMs) are a promising alternative to autoregressive generation. While reinforcement learning (RL) methods have recently been adapted to dLLM fine-tuning, their objectives typically depend on sequence-level marginal likelihoods, which are intractable for masked diffusion models. To address this, we derive Discrete Tilt Matching (DTM), a likelihood-free method that recasts dLLM fine-tuning as state-level matching of local unmasking posteriors under reward tilting. DTM takes the form of a weighted cross-entropy objective with an explicit minimizer, and it admits control variates that improve training stability. On a synthetic maze-planning task, we analyze how DTM's annealing schedule and control variates affect training stability and help prevent mode collapse. At scale, fine-tuning LLaDA-8B-Instruct with DTM yields strong gains on Sudoku and Countdown while remaining competitive on MATH500 and GSM8K.
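To make "weighted cross-entropy objective with an explicit minimizer" concrete, here is a toy sketch at a single masked position. It assumes an exponential reward tilt exp(r / temperature) with weights self-normalized over a handful of rollouts; the function names, the temperature parameter, and the data are hypothetical illustrations, not the authors' DTM objective, and the sketch omits DTM's annealing schedule and control variates entirely.

```python
import math

def tilt_weights(rewards, temperature):
    """Hypothetical self-normalized tilt weights w_i proportional to exp(r_i / temperature)."""
    # Subtracting the max reward avoids overflow; the shift cancels after normalizing.
    m = max(rewards)
    unnorm = [math.exp((r - m) / temperature) for r in rewards]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def weighted_cross_entropy(probs, tokens, weights):
    """Loss -sum_i w_i * log q(token_i) at a single masked position."""
    return -sum(w * math.log(probs[t]) for t, w in zip(tokens, weights))

def explicit_minimizer(tokens, weights, vocab_size):
    """Weighted empirical distribution: the minimizer of the loss above
    over all categorical distributions q, when the weights sum to 1."""
    q = [0.0] * vocab_size
    for t, w in zip(tokens, weights):
        q[t] += w
    return q

# Hypothetical toy data: the token each of four rollouts placed at one
# masked position, and the scalar reward each rollout received.
tokens = [0, 1, 1, 2]
rewards = [0.0, 1.0, 1.0, 0.0]
w = tilt_weights(rewards, temperature=0.5)
q_star = explicit_minimizer(tokens, w, vocab_size=3)
loss_star = weighted_cross_entropy(q_star, tokens, w)
loss_uniform = weighted_cross_entropy([1 / 3] * 3, tokens, w)
print([round(p, 3) for p in q_star])  # high-reward token 1 gets most of the mass
print(loss_star <= loss_uniform)
```

In this toy setting the weighted loss regroups as the entropy of q_star plus KL(q_star || q), so the weighted empirical distribution is its unique minimizer, and lowering the temperature concentrates that distribution on high-reward tokens.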

