CV, LG | May 25, 2025

Exploring Magnitude Preservation and Rotation Modulation in Diffusion Transformers

Eric Tillman Bill, Cristian Perez Jensen, Sotiris Anagnostidis, Dimitri von Rütte

arXiv:2505.19122v1 | 3.6 | h-index: 2

Originality: Incremental advance

AI Analysis

This work addresses training instability in diffusion models, but it is an incremental advance because it builds on prior magnitude-preserving techniques developed for U-nets.
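The prior magnitude-preserving techniques referred to here can be illustrated with a minimal sketch. It assumes EDM2-style primitives (a magnitude-preserving sum and a linear layer with unit-norm weight rows); the names `mp_sum`, `mp_linear`, and `rms` are hypothetical, and this is not the paper's DiT implementation.

```python
import math
import random


def rms(v):
    """Root-mean-square magnitude of a vector."""
    return math.sqrt(sum(x * x for x in v) / len(v))


def mp_sum(a, b, t=0.5):
    """Magnitude-preserving interpolation of two vectors (EDM2-style, assumed).

    A plain lerp (1-t)*a + t*b of two independent unit-variance signals has
    variance (1-t)^2 + t^2, which is below 1 for 0 < t < 1; dividing by its
    square root restores unit scale.
    """
    scale = math.sqrt((1 - t) ** 2 + t ** 2)
    return [((1 - t) * x + t * y) / scale for x, y in zip(a, b)]


def mp_linear(weight, x, eps=1e-8):
    """Linear layer whose weight rows are normalized to unit L2 norm.

    Each output is the dot product of x with a unit vector, so for inputs with
    i.i.d. zero-mean, unit-variance entries the output variance stays at 1
    without any normalization layer in the forward pass.
    """
    out = []
    for row in weight:
        norm = math.sqrt(sum(w * w for w in row)) + eps
        out.append(sum(w * xi for w, xi in zip(row, x)) / norm)
    return out


# Demo: activations keep roughly unit RMS through a linear layer and a residual-style blend.
random.seed(0)
dim = 256
x = [random.gauss(0.0, 1.0) for _ in range(dim)]
W = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(dim)]
y = mp_linear(W, x)
z = mp_sum(x, y, t=0.3)
print(f"rms(x)={rms(x):.3f}  rms(Wx)={rms(y):.3f}  rms(mp_sum)={rms(z):.3f}")
```

Normalizing the rows on every forward pass, rather than only at initialization, is what keeps the output scale fixed as the weights change during training. EDM2 additionally rewrites the stored weights in normalized form, which this sketch omits.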

The paper tackles the challenge of stabilizing training in Diffusion Transformers (DiT) by exploring magnitude preservation and introducing rotation modulation. Magnitude-preserving strategies reduce FID scores by ~12.8%, and rotation modulation combined with scaling is competitive with AdaLN while requiring ~5.4% fewer parameters.

Denoising diffusion models exhibit remarkable generative capabilities, but remain challenging to train due to their inherent stochasticity, where high-variance gradient estimates lead to slow convergence. Previous works have shown that magnitude preservation helps with stabilizing training in the U-net architecture. This work explores whether this effect extends to the Diffusion Transformer (DiT) architecture. As such, we propose a magnitude-preserving design that stabilizes training without normalization layers. Motivated by the goal of maintaining activation magnitudes, we additionally introduce rotation modulation, which is a novel conditioning method using learned rotations instead of traditional scaling or shifting. Through empirical evaluations and ablation studies on small-scale models, we show that magnitude-preserving strategies significantly improve performance, notably reducing FID scores by $\sim$12.8%. Further, we show that rotation modulation combined with scaling is competitive with AdaLN, while requiring $\sim$5.4% fewer parameters. This work provides insights into conditioning strategies and magnitude control. We will publicly release the implementation of our method.
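Rotation modulation can be illustrated with a minimal sketch. The abstract does not say how the learned rotations are parameterized, so this assumes the simplest norm-preserving choice: channels are grouped into consecutive pairs, and each pair is rotated in 2D by an angle predicted from the conditioning embedding through a linear projection, optionally followed by per-channel scaling. The names `rotate_pairs`, `angles_from_condition`, `rotation_modulate`, and `l2` are hypothetical, not the authors' code.

```python
import math


def rotate_pairs(x, angles):
    """Rotate consecutive channel pairs (x[2k], x[2k+1]) by angles[k].

    Each 2x2 rotation is orthogonal, so the L2 norm of x is unchanged.
    """
    assert len(x) == 2 * len(angles)
    out = []
    for k, theta in enumerate(angles):
        a, b = x[2 * k], x[2 * k + 1]
        c, s = math.cos(theta), math.sin(theta)
        out.extend([c * a - s * b, s * a + c * b])
    return out


def angles_from_condition(cond, proj):
    """Hypothetical conditioning head: one angle per channel pair,
    computed as a linear projection of the conditioning embedding."""
    return [sum(p * ci for p, ci in zip(row, cond)) for row in proj]


def rotation_modulate(x, cond, proj, scale=None):
    """Rotate by condition-dependent angles, then optionally scale per channel."""
    y = rotate_pairs(x, angles_from_condition(cond, proj))
    if scale is not None:
        y = [si * yi for si, yi in zip(scale, y)]
    return y


def l2(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(t * t for t in v))


x = [1.0, 2.0, -0.5, 3.0]
cond = [0.2, -1.0, 0.7]  # stand-in for a timestep/class embedding
proj = [[0.5, 0.1, -0.3], [1.0, 0.0, 0.4]]  # 2 channel pairs x 3 embedding dims
y = rotation_modulate(x, cond, proj)
print(f"|x|={l2(x):.4f}  |R(c)x|={l2(y):.4f}")
```

Unlike scaling or shifting, the rotation alone cannot change the activation magnitude, which is the property the abstract motivates it by. In this sketch the conditioning head emits one value per channel pair (d/2 outputs for d channels), whereas a plain AdaLN scale-and-shift head emits 2d; the abstract does not state whether this is the source of the reported parameter savings.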

