ML LG

Feb 10, 2025

Spectral-factorized Positive-definite Curvature Learning for NN Training

Wu Lin, Felix Dangel, Runa Eschenhagen, Juhan Bae, Richard E. Turner, Roger B. Grosse

U of Toronto

arXiv:2502.06268v3

4.5

h-index: 15

Originality: Incremental advance

AI Analysis

This paper addresses computational bottlenecks in neural network training, which is relevant to both researchers and practitioners. The advance appears incremental because it builds on existing curvature-based methods.

The paper tackles the computational inefficiency and limited applicability of non-diagonal training methods such as Shampoo. It proposes a Riemannian optimization approach that dynamically adapts spectral-factorized positive-definite curvature estimates, which enables efficient matrix root computation and generic curvature learning. The authors demonstrate its efficacy in positive-definite matrix optimization, covariance adaptation, and neural net training.

Many training methods, such as Adam(W) and Shampoo, learn a positive-definite curvature matrix and apply an inverse root before preconditioning. Recently, non-diagonal training methods, such as Shampoo, have gained significant attention; however, they remain computationally inefficient and are limited to specific types of curvature information due to the costly matrix root computation via matrix decomposition. To address this, we propose a Riemannian optimization approach that dynamically adapts spectral-factorized positive-definite curvature estimates, enabling the efficient application of arbitrary matrix roots and generic curvature learning. We demonstrate the efficacy and versatility of our approach in positive-definite matrix optimization and covariance adaptation for gradient-free optimization, as well as its efficiency in curvature learning for neural net training.
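To illustrate why a spectral factorization makes matrix roots cheap, here is a minimal NumPy sketch. It assumes the curvature estimate is already available as S = B diag(d) Bᵀ with orthogonal B and positive d; under that assumption any inverse root only needs elementwise powers of d. The function names `inverse_root_from_spectral` and `precondition` are hypothetical, and `np.linalg.eigh` is used here only to produce example factors. The sketch does not implement the paper's Riemannian update, which adapts the factors during training.

```python
import numpy as np


def inverse_root_from_spectral(B, d, p):
    """Return (B diag(d) B^T)^(-1/p) for orthogonal B and positive d.

    Hypothetical helper: with the factors in hand, the root is an
    elementwise power of d followed by matrix products.
    """
    # B * v scales column j of B by v[j], i.e. B @ diag(v).
    return (B * d ** (-1.0 / p)) @ B.T


def precondition(grad, B, d, p=2):
    """Apply the inverse p-th root of the curvature to a gradient vector."""
    return inverse_root_from_spectral(B, d, p) @ grad


# Build a symmetric positive-definite example matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
S = A @ A.T + 4.0 * np.eye(4)

# Assumption for this demo only: obtain the factors by a one-off
# eigendecomposition instead of learning them.
d, B = np.linalg.eigh(S)

R2 = inverse_root_from_spectral(B, d, p=2)  # S^(-1/2)
R4 = inverse_root_from_spectral(B, d, p=4)  # S^(-1/4)

grad = rng.standard_normal(4)
step = precondition(grad, B, d, p=2)
```

Changing the root (for example from p=2 to p=4) reuses the same factors and changes only the exponent applied to d, which is the sense in which arbitrary roots become inexpensive once the factorization is maintained.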
