CL · AI · Oct 30, 2025

Don't Let It Fade: Preserving Edits in Diffusion Language Models via Token Timestep Allocation

arXiv:2510.26200v1 · 1 citations · h-index: 1

Originality: Incremental advance

AI Analysis

This work addresses a key failure mode for users who need stable, controllable text generation from diffusion models, though it is an incremental improvement focused on inference-time ordering.

The paper tackles the problem of update forgetting in diffusion language models, where uniform updates erase earlier edits, and proposes Token Timestep Allocation (TTA) to improve controllability and fluency. Results show over 20% higher accuracy and nearly halved perplexity in sentiment control, and reduced toxicity and perplexity in detoxification.

While diffusion language models (DLMs) enable fine-grained refinement, their practical controllability remains fragile. We identify and formally characterize a central failure mode called update forgetting, in which uniform and context-agnostic updates induce token-level fluctuations across timesteps, erasing earlier semantic edits and disrupting the cumulative refinement process, thereby degrading fluency and coherence. As this failure originates in uniform and context-agnostic updates, effective control demands explicit token ordering. We propose Token Timestep Allocation (TTA), which realizes soft and semantic token ordering via per-token timestep schedules: critical tokens are frozen early, while uncertain tokens receive continued refinement. This timestep-based ordering can be instantiated as either a fixed policy or an adaptive policy driven by task signals, thereby supporting a broad spectrum of refinement strategies. Because it operates purely at inference time, it applies uniformly across various DLMs and naturally extends to diverse supervision sources. Empirically, TTA improves controllability and fluency: on sentiment control, it yields more than 20 percent higher accuracy and nearly halves perplexity while using fewer than one-fifth of the steps; in detoxification, it lowers maximum toxicity (12.2 versus 14.5) and perplexity (26.0 versus 32.0). Together, these results demonstrate that softened ordering via timestep allocation is the critical lever for mitigating update forgetting and achieving stable and controllable diffusion text generation.
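The idea of per-token timestep schedules can be made concrete with a small sketch. The abstract does not give the allocation formula, so everything below is an illustrative assumption rather than the paper's method: the function names `allocate_token_timesteps` and `frozen_mask`, the `strength` parameter, the linear rule `t_i = round(t_global * (1 - strength * c_i))`, and the convention that a lower timestep means less noise and therefore a more stable token. The confidence scores stand in for whatever task signal an adaptive policy might use.

```python
from typing import List, Sequence


def allocate_token_timesteps(
    confidences: Sequence[float],
    t_global: int,
    strength: float = 1.0,
) -> List[int]:
    """Assign each token its own timestep from a confidence score in [0, 1].

    Hypothetical rule (an assumption, not the paper's formula):
        t_i = round(t_global * (1 - strength * c_i))
    Confident tokens get a small timestep (little further change), while
    uncertain tokens stay near the global timestep and keep being refined.
    """
    if t_global < 0:
        raise ValueError("t_global must be non-negative")
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie in [0, 1]")

    timesteps = []
    for c in confidences:
        c = min(max(c, 0.0), 1.0)  # clamp out-of-range scores
        timesteps.append(round(t_global * (1.0 - strength * c)))
    return timesteps


def frozen_mask(timesteps: Sequence[int], freeze_below: int = 1) -> List[bool]:
    """Mark tokens whose allocated timestep is below the freeze threshold."""
    return [t < freeze_below for t in timesteps]


if __name__ == "__main__":
    # Hypothetical confidence scores for a three-token sequence.
    scores = [1.0, 0.5, 0.0]
    steps = allocate_token_timesteps(scores, t_global=100)
    print(steps)
    print(frozen_mask(steps))
```

With `strength=0.0` every token receives the global timestep, which corresponds to the uniform update that the abstract identifies as the source of update forgetting. Increasing `strength` moves the schedule toward the softened ordering described above.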
