LG · Sep 26, 2025

Understanding SOAP from the Perspective of Gradient Whitening

arXiv:2509.22938v1 · 2 citations
Originality: Synthesis-oriented
AI Analysis

This work provides theoretical and empirical insights into optimization algorithms for neural network training, but it is incremental: it confirms a known equivalence and reports no new performance gains.

The paper analyzes the Adam, Shampoo, and SOAP optimization algorithms from a gradient whitening perspective. It shows that SOAP's convergence and final loss are similar to Shampoo's on language modeling and image colorization tasks, with no significant advantage over either Adam or Shampoo.

Shampoo with Adam in the Preconditioner's eigenbasis (SOAP) has recently emerged as a promising optimization algorithm for neural network training, achieving superior training efficiency over both Adam and Shampoo in language modeling tasks. In this work, we analyze Adam, Shampoo, and SOAP from the perspective of gradient whitening, interpreting their preconditioners as approximations to the whitening matrix, which captures second-order curvature information. We further establish a theoretical equivalence between idealized versions of SOAP and Shampoo under the Kronecker product assumption. To empirically evaluate these insights, we reproduce the language modeling experiments using nanoGPT and also evaluate on grayscale image colorization. Our results show that SOAP exhibits a convergence rate similar to Shampoo's and no significant advantage over either Adam or Shampoo in the final loss achieved, which aligns with their equivalence in theory.
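As a rough illustration of the whitening view described above, the following NumPy sketch checks one standard linear-algebra identity: if the covariance of the flattened gradient factors as a Kronecker product, then whitening the flattened gradient with the full inverse square root gives the same result as multiplying the gradient matrix by the inverse square roots of the two small factors. The names `L`, `R`, `G`, `inv_sqrt_spd`, and `random_spd` are hypothetical and chosen for this example only. This is not the paper's derivation or any optimizer's actual update rule; practical implementations estimate their factors from running statistics and may use different exponents and damping, none of which is modeled here.

```python
import numpy as np


def inv_sqrt_spd(mat):
    """Inverse square root of a symmetric positive-definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    # V diag(lambda^{-1/2}) V^T: scale each eigenvector column, then project back.
    return (eigvecs * eigvals ** -0.5) @ eigvecs.T


def random_spd(dim, rng):
    """Random well-conditioned symmetric positive-definite matrix."""
    a = rng.standard_normal((dim, dim))
    return a @ a.T + dim * np.eye(dim)


rng = np.random.default_rng(0)
m, n = 4, 3
L = random_spd(m, rng)            # assumed left Kronecker factor (m x m)
R = random_spd(n, rng)            # assumed right Kronecker factor (n x n)
G = rng.standard_normal((m, n))   # stand-in for a gradient matrix

# Assumed covariance of the column-major flattened gradient: R kron L.
cov = np.kron(R, L)
W = inv_sqrt_spd(cov)             # full whitening matrix, size (m*n) x (m*n)

# Sanity check: W really whitens, i.e. W cov W equals the identity.
whitening_error = np.max(np.abs(W @ cov @ W - np.eye(m * n)))

# Whitening in the flattened space, then reshaping back to m x n.
full = (W @ G.reshape(-1, order="F")).reshape((m, n), order="F")

# The same operation using only the two small factors.
factored = inv_sqrt_spd(L) @ G @ inv_sqrt_spd(R)

max_abs_diff = np.max(np.abs(full - factored))
print(whitening_error, max_abs_diff)
```

Both printed values should be at the level of floating-point round-off. The factored form only needs eigendecompositions of an m x m and an n x n matrix rather than an (m·n) x (m·n) one, which is the practical appeal of the Kronecker product assumption.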
