Optimization Benchmark for Diffusion Models on Dynamical Systems
This work provides a domain-specific benchmark for optimizing diffusion models. The contribution is incremental, since it applies existing techniques to a new context.
The authors benchmarked recent optimization algorithms for training diffusion models on dynamical systems and found that Muon and SOAP are highly efficient alternatives to AdamW, reaching an 18% lower final loss.
The training of diffusion models is often absent from the evaluation of new optimization techniques. In this work, we benchmark recent optimization algorithms for training a diffusion model for denoising flow trajectories. We observe that Muon and SOAP are highly efficient alternatives to AdamW (18% lower final loss). We also revisit, in the context of diffusion model training, several recent phenomena first reported for models trained on text or image applications. These include the impact of the learning-rate schedule on the training dynamics and the performance gap between Adam and SGD.
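As a rough illustration of what an optimizer comparison of this kind involves, the following minimal Python sketch trains a two-parameter linear noise predictor on a toy denoising task with hand-written SGD and AdamW updates, then reports the relative reduction in final loss. Everything here is an assumption made for illustration: the toy data, the model, the hyperparameters, and the helper names (`make_batch`, `run`, `relative_reduction`) are hypothetical and are not the benchmark's actual setup. Muon and SOAP are omitted for brevity.

```python
import math
import random

def make_batch(rng, n, sigma):
    """Toy data (assumption): clean x0 ~ N(0,1), noisy x_t = x0 + sigma * eps."""
    batch = []
    for _ in range(n):
        x0 = rng.gauss(0.0, 1.0)
        eps = rng.gauss(0.0, 1.0)
        batch.append((x0 + sigma * eps, eps))
    return batch

def loss_and_grad(params, batch):
    """Mean squared error between predicted noise w*x_t + b and true noise."""
    w, b = params
    loss = gw = gb = 0.0
    for x_t, eps in batch:
        err = w * x_t + b - eps
        loss += err * err
        gw += 2.0 * err * x_t
        gb += 2.0 * err
    n = len(batch)
    return loss / n, [gw / n, gb / n]

def sgd_step(params, grad, state, lr):
    """Plain SGD update; the state argument is unused."""
    return [p - lr * g for p, g in zip(params, grad)]

def adamw_step(params, grad, state, lr, beta1=0.9, beta2=0.999,
               tol=1e-8, weight_decay=0.01):
    """AdamW update: Adam moments with bias correction, decoupled weight decay."""
    state["t"] += 1
    t = state["t"]
    new_params = []
    for i, (p, g) in enumerate(zip(params, grad)):
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        m_hat = state["m"][i] / (1 - beta1 ** t)
        v_hat = state["v"][i] / (1 - beta2 ** t)
        p = p - lr * weight_decay * p  # decay applied directly to the weight
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + tol))
    return new_params

def run(step_fn, lr, steps=200, seed=0):
    """Train from zero initialization and return the loss on a fixed eval batch."""
    rng = random.Random(seed)
    eval_batch = make_batch(random.Random(123), 512, sigma=1.0)
    params = [0.0, 0.0]
    state = {"t": 0, "m": [0.0, 0.0], "v": [0.0, 0.0]}
    for _ in range(steps):
        _, grad = loss_and_grad(params, make_batch(rng, 32, sigma=1.0))
        params = step_fn(params, grad, state, lr)
    return loss_and_grad(params, eval_batch)[0]

def relative_reduction(baseline, candidate):
    """Fractional reduction of candidate loss relative to baseline loss."""
    return (baseline - candidate) / baseline

if __name__ == "__main__":
    sgd_loss = run(sgd_step, lr=0.05)
    adamw_loss = run(adamw_step, lr=0.05)
    print("SGD final loss:", sgd_loss)
    print("AdamW final loss:", adamw_loss)
    print("Relative reduction vs. SGD:", relative_reduction(sgd_loss, adamw_loss))
```

Under this convention, an "18% lower final loss" corresponds to `relative_reduction(baseline, candidate)` being about 0.18. On a problem this small and well conditioned the two optimizers are not expected to differ meaningfully, so the sketch shows only the shape of the comparison, not the reported result.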