LG, AI, CL | May 8

Trajectory as the Teacher: Few-Step Discrete Flow Matching via Energy-Navigated Distillation

arXiv:2605.07924 | 92.7
AI Analysis

For researchers in text generation, TS-DFM offers a way to drastically accelerate discrete flow matching without sacrificing quality, outperforming baselines that use larger models or more training data.

Discrete flow matching for text generation requires many steps, and distillation from multi-step trajectories suffers because those trajectories are built from blind stochastic jumps. TS-DFM replaces the blind jumps with guided navigation, using an energy compass during training only. At 8 steps, the student achieves 32% lower perplexity than the 1,024-step teacher (128x faster) and the best perplexity among the discrete-generation baselines compared.

Discrete flow matching generates text by iteratively transforming noise tokens into coherent language, but may require hundreds of forward passes. Distillation uses the multi-step trajectory to train a student to reproduce the process in a few steps. When the student underperforms, the usual explanation is insufficient capacity. We argue the opposite: the trajectory is the bottleneck, not the student. Each training trajectory is built through a chain of blind stochastic jumps with no evaluation of sequence quality; a single bad decision at an early midpoint propagates through subsequent steps, yet the student must imitate the result. Trajectory-Shaped Discrete Flow Matching (TS-DFM) replaces these blind jumps with guided navigation: a lightweight energy compass evaluates candidate continuations at each midpoint and selects the most coherent one. All shaping is training-only; inference cost is unchanged. On 170M-parameter language modeling, the shaped student at 8 steps achieves 32% lower perplexity than the 1,024-step teacher while being 128x faster, with gains consistent across source distributions and three evaluators of increasing scale. TS-DFM achieves the best perplexity of any discrete-generation baseline we compare against, including methods trained on 6x more data or using 5x larger models.
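
The contrast between blind jumps and energy-guided midpoint selection can be illustrated with a toy sketch. This is not the paper's implementation: the abstract does not specify the energy model, the jump distribution, or the number of candidates, so `blind_jump`, `toy_energy`, `shaped_trajectory`, `flip_prob`, and `num_candidates` are all hypothetical names and choices made for illustration, and "most coherent" is assumed here to mean "lowest energy".

```python
import random

def blind_jump(tokens, vocab, flip_prob, rng):
    """Toy stand-in for one stochastic jump: resample each token with
    probability flip_prob. No sequence-quality check is performed."""
    return [rng.choice(vocab) if rng.random() < flip_prob else t for t in tokens]

def toy_energy(tokens):
    """Hypothetical energy: number of adjacent repeated tokens.
    Lower is treated as more coherent (an assumption of this sketch)."""
    return sum(1 for a, b in zip(tokens, tokens[1:]) if a == b)

def shaped_trajectory(start, vocab, num_midpoints, num_candidates,
                      energy, rng, flip_prob=0.3):
    """Build a training trajectory. At each midpoint, draw several candidate
    jumps and keep the lowest-energy one instead of accepting a single
    blind jump. With num_candidates=1 this reduces to the blind chain."""
    trajectory = [start]
    current = start
    for _ in range(num_midpoints):
        candidates = [blind_jump(current, vocab, flip_prob, rng)
                      for _ in range(num_candidates)]
        current = min(candidates, key=energy)  # the "energy compass" step
        trajectory.append(current)
    return trajectory

if __name__ == "__main__":
    rng = random.Random(0)
    vocab = ["a", "b", "c", "d"]
    start = ["a"] * 6  # maximally repetitive start, so its energy is 5
    for state in shaped_trajectory(start, vocab, 4, 8, toy_energy, rng):
        print(state, toy_energy(state))
    # Step-count ratio quoted in the abstract: 1,024 teacher steps vs 8.
    print(1024 // 8)
```

Because the candidate scoring happens only while trajectories are being constructed, a student trained on them would run with the same per-step cost as before, which matches the abstract's statement that shaping is training-only.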
