CL · Mar 5

Diffusion LLMs can think EoS-by-EoS

arXiv:2603.05197v1
Originality: Incremental advance
AI Analysis

This research sheds light on the internal workings of diffusion LLMs, specifically on how they may use EoS tokens for complex reasoning. The finding matters to researchers and developers working to improve the interpretability and performance of these models.

This paper investigates why diffusion LLMs perform better on complex reasoning tasks when they are allowed to generate more tokens than necessary and pad the answer with end-of-sequence (EoS) tokens. The authors hypothesize that EoS tokens serve as a hidden scratchpad for computation. They support this with two kinds of evidence: controlled prompting experiments, in which adding EoS tokens improves reasoning, and causal interventions, in which patching the hidden states of EoS tokens with those from a counterfactual generation frequently changes the model's output to the counterfactual answer.
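
A minimal sketch of the setup in which this effect appears, assuming a LLaDA-style masked diffusion model: the prompt is followed by a fixed number of mask tokens (the generation length), the model denoises all of them, and any positions the answer does not need end up as EoS. The token ids `MASK_ID` and `EOS_ID` and the helper names are hypothetical, chosen for illustration. `pad_answer` only shows the final layout under the assumption that the answer comes first; it does not model how a real model decides where to place EoS.

```python
from typing import List

MASK_ID = -1  # hypothetical mask-token id (illustration only)
EOS_ID = 0    # hypothetical EoS token id (illustration only)


def init_canvas(prompt_ids: List[int], gen_length: int) -> List[int]:
    """Append gen_length mask tokens after the prompt; the model denoises all of them."""
    if gen_length <= 0:
        raise ValueError("gen_length must be positive")
    return list(prompt_ids) + [MASK_ID] * gen_length


def pad_answer(answer_ids: List[int], gen_length: int) -> List[int]:
    """Final output region: the answer followed by EoS padding up to gen_length."""
    if len(answer_ids) > gen_length:
        raise ValueError("answer does not fit in the generation window")
    return list(answer_ids) + [EOS_ID] * (gen_length - len(answer_ids))


def num_eos_slots(output_ids: List[int]) -> int:
    """Count EoS positions, i.e. the slots hypothesized to act as a hidden scratchpad."""
    return sum(1 for t in output_ids if t == EOS_ID)


if __name__ == "__main__":
    canvas = init_canvas([7, 8, 9], gen_length=16)
    output = pad_answer([42, 43, 44, 45], gen_length=16)
    print(len(canvas), num_eos_slots(output))  # prints: 19 12
```

With a 16-token window and a 4-token answer, 12 positions are left as EoS. Under the paper's hypothesis, raising the generation length therefore enlarges the space available for hidden computation.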

Diffusion LLMs have been proposed as an alternative to autoregressive LLMs, excelling especially at complex reasoning tasks with interdependent sub-goals. Curiously, this is particularly true if the generation length, i.e., the number of tokens the model has to output, is set to a much higher value than is required for providing the correct answer to the task, and the model pads its answer with end-of-sequence (EoS) tokens. We hypothesize that diffusion models think EoS-by-EoS, that is, they use the representations of EoS tokens as a hidden scratchpad, which allows them to solve harder reasoning problems. We experiment with the diffusion models LLaDA1.5, LLaDA2.0-mini, and Dream-v0 on the tasks Addition, Entity Tracking, and Sudoku. In a controlled prompting experiment, we confirm that adding EoS tokens improves the LLMs' reasoning capabilities. To further verify whether they serve as space for hidden computations, we patch the hidden states of the EoS tokens with those of a counterfactual generation, which frequently changes the generated output to the counterfactual. The success of the causal intervention underscores that the EoS tokens, which one may expect to be devoid of meaning, carry information on the problem to solve. The behavioral experiments and the causal interventions indicate that diffusion LLMs can indeed think EoS-by-EoS.
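
The causal intervention can be illustrated with a toy version of activation patching: run the model on the original input and on a counterfactual one, cache the counterfactual hidden states, overwrite the states at the EoS positions of the original run, and check whether the output moves to the counterfactual answer. The sketch assumes a one-layer toy model whose output head reads only from EoS positions; the names `layer_1`, `readout`, `patch_states` and the token id `EOS_ID` are hypothetical. It is not the authors' implementation, which operates on the hidden states of real diffusion LLMs across layers and denoising steps. Patching only the non-EoS positions serves as a control.

```python
from typing import Dict, List, Optional, Tuple

Vector = List[float]
Hidden = List[Vector]

EOS_ID = 0  # hypothetical EoS token id for this toy vocabulary


def layer_1(tokens: List[int]) -> Hidden:
    """Toy layer: each position stores its own token plus a summary of the whole input,
    mimicking bidirectional attention, so EoS positions can also 'see' the problem."""
    context = float(sum(t for t in tokens if t != EOS_ID))
    return [[float(t), context] for t in tokens]


def readout(hidden: Hidden, tokens: List[int]) -> int:
    """Toy output head that reads the answer only from EoS positions (the 'scratchpad')."""
    eos_states = [h for h, t in zip(hidden, tokens) if t == EOS_ID]
    if not eos_states:
        return -1  # no scratchpad available in this toy
    return round(sum(h[1] for h in eos_states) / len(eos_states))


def run(tokens: List[int], patch: Optional[Dict[int, Vector]] = None) -> Tuple[Hidden, int]:
    """Forward pass; `patch` maps position -> replacement hidden state after layer_1."""
    hidden = layer_1(tokens)
    if patch:
        hidden = [patch.get(i, h) for i, h in enumerate(hidden)]
    return hidden, readout(hidden, tokens)


def eos_positions(tokens: List[int]) -> List[int]:
    return [i for i, t in enumerate(tokens) if t == EOS_ID]


def patch_states(clean: List[int], counterfactual: List[int], positions: List[int]) -> int:
    """Splice the counterfactual run's hidden states into the clean run at `positions`."""
    if len(clean) != len(counterfactual):
        raise ValueError("clean and counterfactual inputs must be aligned")
    cf_hidden, _ = run(counterfactual)
    patch = {i: cf_hidden[i] for i in positions}
    _, answer = run(clean, patch)
    return answer


if __name__ == "__main__":
    clean = [3, 4, 0, 0, 0, 0]           # toy "3 + 4" followed by four EoS slots
    counterfactual = [5, 4, 0, 0, 0, 0]  # toy "5 + 4" with the same EoS layout
    print(run(clean)[1], run(counterfactual)[1])                      # 7 9
    print(patch_states(clean, counterfactual, eos_positions(clean)))  # 9: output flips
    print(patch_states(clean, counterfactual, [0, 1]))                # 7: control, unchanged
```

In a real model the same splice would typically be done with forward hooks on the chosen transformer layers (in PyTorch, `torch.nn.Module.register_forward_hook`), and the counterfactual generation must place EoS at the same positions so that the cached states line up.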
