CV · LG · May 25, 2025

Exploring Magnitude Preservation and Rotation Modulation in Diffusion Transformers

arXiv:2505.19122v1 · h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses training instability in diffusion models, but it is an incremental advance: it extends magnitude-preserving techniques previously shown to stabilize training in U-Net architectures.

The paper tackles training stability in Diffusion Transformers (DiT) by exploring magnitude preservation and introducing rotation modulation. On small-scale models, magnitude-preserving strategies reduce FID scores by ~12.8%, and rotation modulation combined with scaling is competitive with AdaLN while requiring ~5.4% fewer parameters.

Denoising diffusion models exhibit remarkable generative capabilities, but remain challenging to train due to their inherent stochasticity, where high-variance gradient estimates lead to slow convergence. Previous works have shown that magnitude preservation helps with stabilizing training in the U-net architecture. This work explores whether this effect extends to the Diffusion Transformer (DiT) architecture. As such, we propose a magnitude-preserving design that stabilizes training without normalization layers. Motivated by the goal of maintaining activation magnitudes, we additionally introduce rotation modulation, which is a novel conditioning method using learned rotations instead of traditional scaling or shifting. Through empirical evaluations and ablation studies on small-scale models, we show that magnitude-preserving strategies significantly improve performance, notably reducing FID scores by $\sim$12.8%. Further, we show that rotation modulation combined with scaling is competitive with AdaLN, while requiring $\sim$5.4% fewer parameters. This work provides insights into conditioning strategies and magnitude control. We will publicly release the implementation of our method.
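
To make the idea of conditioning by rotation concrete, here is a minimal sketch of why a rotation keeps activation magnitudes fixed while a scale-and-shift does not. It is not the paper's implementation: the abstract does not specify how the rotations are parameterized, so rotating adjacent feature pairs is an assumption, and the names `rotate_pairs`, `scale_shift`, and the example angles are hypothetical.

```python
import math


def rotate_pairs(x, angles):
    """Rotate consecutive feature pairs (x[2i], x[2i+1]) by angles[i] radians.

    Assumption: pairing adjacent channels is one simple way to build a
    rotation; the paper may parameterize its rotations differently.
    """
    if len(x) != 2 * len(angles):
        raise ValueError("need exactly one angle per pair of features")
    out = []
    for i, theta in enumerate(angles):
        a, b = x[2 * i], x[2 * i + 1]
        c, s = math.cos(theta), math.sin(theta)
        # Standard 2D rotation, which leaves a^2 + b^2 unchanged.
        out.extend([c * a - s * b, s * a + c * b])
    return out


def scale_shift(x, scale, shift):
    """Scale-and-shift modulation for comparison: x * (1 + scale) + shift."""
    return [xi * (1.0 + g) + b for xi, g, b in zip(x, scale, shift)]


def norm(x):
    """Euclidean norm of a feature vector."""
    return math.sqrt(sum(v * v for v in x))


x = [3.0, 4.0, 1.0, 2.0]
# Hypothetical angles; in a model they would be predicted from the
# conditioning signal (for example the timestep or class embedding).
angles = [0.7, -1.2]

rotated = rotate_pairs(x, angles)
modulated = scale_shift(x, [0.5] * 4, [0.1] * 4)

print("input norm:      ", norm(x))
print("rotated norm:    ", norm(rotated))
print("scale-shift norm:", norm(modulated))
```

The rotated vector has the same norm as the input up to floating-point error, whereas the scale-and-shift output changes it, which is why a rotation fits naturally into a design that aims to preserve activation magnitudes.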

