Grokking of Diffusion Models: Case Study on Modular Addition
This work offers the first mechanistic decomposition of algorithmic learning in diffusion models, revealing how they bridge continuous generation and discrete reasoning.
The paper demonstrates that diffusion models exhibit grokking (delayed generalization after overfitting) on modular addition, and provides a mechanistic analysis of how they implement this task through periodic representations and a two-phase sampling process.
Despite their empirical success, how diffusion models generalize remains poorly understood from a mechanistic perspective. We demonstrate that diffusion models trained with flow-matching objectives exhibit grokking (delayed generalization after overfitting) on modular addition, enabling controlled analysis of their internal computations. We study this phenomenon in two data regimes. In a single-image regime, mechanistic dissection reveals that the model implements modular addition by composing periodic representations of the individual operands. In a diverse-image regime with high intraclass variability, we find that the model leverages its iterative sampling process to partition the task into an arithmetic computation phase followed by a visual denoising phase, separated by a critical timestep threshold. Our work provides the first mechanistic decomposition of algorithmic learning in diffusion models, revealing how these models bridge continuous pixel-space generation and discrete symbolic reasoning.
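To make "composing periodic representations of individual operands" concrete, the following is a minimal sketch of one way modular addition can be computed from periodic features. It is an illustration under stated assumptions, not the circuit recovered from the trained model: the function names, the choice of frequencies, and the modulus are all hypothetical. Each operand is mapped to cosine and sine features, the two operands are combined with the angle-addition identities, and the result is decoded by picking the residue whose features align best with the combined ones.

```python
import numpy as np

def periodic_embed(x, p, freqs):
    """Map integer(s) x to cos/sin features at the given frequencies (hypothetical helper)."""
    angles = 2 * np.pi * np.outer(np.atleast_1d(x), freqs) / p
    return np.cos(angles), np.sin(angles)

def add_mod_p(a, b, p, freqs=(1, 2, 3)):
    """Compute (a + b) mod p using only periodic features of a and b.

    Assumption: freqs includes 1, so the decoding score has a unique maximum.
    """
    freqs = np.asarray(freqs)
    cos_a, sin_a = periodic_embed(a, p, freqs)
    cos_b, sin_b = periodic_embed(b, p, freqs)

    # Angle-addition identities give the features of a + b
    # without ever forming the integer sum.
    cos_sum = cos_a * cos_b - sin_a * sin_b
    sin_sum = sin_a * cos_b + cos_a * sin_b

    # Score every candidate residue c by sum_k cos(2*pi*k*(a + b - c) / p),
    # which peaks exactly when c is congruent to a + b modulo p.
    cos_c, sin_c = periodic_embed(np.arange(p), p, freqs)
    scores = cos_c @ cos_sum[0] + sin_c @ sin_sum[0]
    return int(np.argmax(scores))
```

For example, with the assumed modulus `p = 113`, `add_mod_p(45, 100, 113)` returns `32`, matching `(45 + 100) % 113`. The operands interact only through products of their periodic features, which is the sense in which the representations are composed.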