ML LG Jan 30

Spectral Gradient Descent Mitigates Anisotropy-Driven Misalignment: A Case Study in Phase Retrieval

Guillaume Braun, Han Bao, Wei Huang, Masaaki Imaizumi

arXiv:2601.22652v13.23 citations h-index: 1

Originality: Incremental advance

AI Analysis

This paper addresses a specific issue in optimization for machine learning, with relevance to phase retrieval and neural network training. The advance is incremental because it builds on existing spectral gradient methods.

The paper tackles gradient descent misalignment under anisotropic inputs. It shows that spectral gradient descent prevents variance-induced misalignment and accelerates noise contraction, and it reports numerical experiments that confirm the theory.

Spectral gradient methods, such as the Muon optimizer, modify gradient updates by preserving directional information while discarding scale, and have shown strong empirical performance in deep learning. We investigate the mechanisms underlying these gains through a dynamical analysis of a nonlinear phase retrieval model with anisotropic Gaussian inputs, equivalent to training a two-layer neural network with the quadratic activation and fixed second-layer weights. Focusing on a spiked covariance setting where the dominant variance direction is orthogonal to the signal, we show that gradient descent (GD) suffers from a variance-induced misalignment: during the early escaping stage, the high-variance but uninformative spike direction is multiplicatively amplified, degrading alignment with the true signal under strong anisotropy. In contrast, spectral gradient descent (SpecGD) removes this spike amplification effect, leading to stable alignment and accelerated noise contraction. Numerical experiments confirm the theory and show that these phenomena persist under broader anisotropic covariances.
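
The setting in the abstract can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' code. It assumes the SpecGD step replaces the gradient matrix G = U diag(s) V^T with U V^T (direction kept, scale discarded) and uses a squared loss on the quadratic-activation outputs. The function names, the choice of signal and spike axes, the step size, and the dimensions are all hypothetical.

```python
import numpy as np

def make_spiked_data(n, d, lam, rng):
    """Draw x ~ N(0, I + lam * v v^T) with the spike v orthogonal to w_star."""
    w_star = np.zeros(d); w_star[0] = 1.0   # signal direction (illustrative choice)
    v = np.zeros(d); v[1] = 1.0             # spike direction, orthogonal to w_star
    G = rng.standard_normal((n, d))
    # Stretch the component along v so its variance becomes 1 + lam.
    X = G + (np.sqrt(1.0 + lam) - 1.0) * np.outer(G @ v, v)
    y = (X @ w_star) ** 2                   # phaseless (sign-free) measurements
    return X, y, w_star

def loss(W, X, y):
    """Assumed objective: (1 / 4n) * sum_i (||W^T x_i||^2 - y_i)^2, W of shape (d, m)."""
    resid = ((X @ W) ** 2).sum(axis=1) - y
    return 0.25 * np.mean(resid ** 2)

def loss_grad(W, X, y):
    """Gradient of `loss` with respect to W."""
    Z = X @ W                               # (n, m) pre-activations
    resid = (Z ** 2).sum(axis=1) - y        # (n,) residuals
    return X.T @ (resid[:, None] * Z) / X.shape[0]

def spectral_direction(G):
    """Return U @ Vt from the thin SVD of G, i.e. all singular values set to 1."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

def run(step, X, y, w_star, m=4, lr=0.01, steps=200, seed=0):
    """Run GD or SpecGD from a small random init; return ||W^T w_star|| / ||W||_F."""
    rng = np.random.default_rng(seed)
    W = 1e-2 * rng.standard_normal((X.shape[1], m))
    for _ in range(steps):
        G = loss_grad(W, X, y)
        W = W - lr * (spectral_direction(G) if step == "specgd" else G)
    return np.linalg.norm(W.T @ w_star) / np.linalg.norm(W)

if __name__ == "__main__":
    X, y, w_star = make_spiked_data(n=2000, d=20, lam=10.0,
                                    rng=np.random.default_rng(0))
    for step in ("gd", "specgd"):
        print(step, run(step, X, y, w_star))
```

Near a small initialization the residual is roughly -y, so a plain GD step multiplies each direction of W by a factor that grows with that direction's input variance, which is the spike amplification the abstract describes. The orthogonalized step moves every singular direction of the gradient by the same amount. Varying `lam` in the usage lines is one way to probe how the alignment ratio responds to the strength of the anisotropy.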
