MLLG | Feb 10, 2025

Spectral-factorized Positive-definite Curvature Learning for NN Training

University of Toronto
arXiv:2502.06268v3
h-index: 15
Originality: Incremental advance
AI Analysis

This work addresses computational bottlenecks in neural network training that matter to both researchers and practitioners, though it appears incremental because it builds on existing curvature-based methods.

The paper tackles two limitations of non-diagonal training methods such as Shampoo: computational inefficiency and restricted applicability. It proposes a Riemannian optimization approach that dynamically adapts spectral-factorized positive-definite curvature estimates, which enables efficient matrix root computation and generic curvature learning. The approach is demonstrated on positive-definite matrix optimization, covariance adaptation, and neural net training.
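To see why a spectral factorization helps, suppose the curvature estimate is stored as S = B diag(d) Bᵀ with B orthogonal and d positive. Then any inverse root S^(-1/p) can be applied to a gradient with two matrix-vector products and an elementwise power, with no eigendecomposition. The minimal NumPy sketch below assumes the factors are already available; the function name `apply_inverse_root` and the random test matrix are hypothetical, and the paper's Riemannian update for learning B and d is not reproduced.

```python
import numpy as np

def apply_inverse_root(B, d, g, p):
    """Return S^{-1/p} g for S = B diag(d) B^T, with B orthogonal and d > 0.

    Changing p only changes the elementwise power on d, so any root costs
    the same two matrix-vector products; no decomposition is recomputed.
    """
    return B @ (d ** (-1.0 / p) * (B.T @ g))

# Hypothetical factors of a random 5x5 symmetric positive-definite matrix.
rng = np.random.default_rng(0)
n = 5
B, _ = np.linalg.qr(rng.standard_normal((n, n)))  # orthogonal eigenbasis
d = rng.uniform(0.5, 2.0, size=n)                 # positive eigenvalues
S = B @ np.diag(d) @ B.T
g = rng.standard_normal(n)

# Sanity check: applying S^{-1/p} p times must equal S^{-1} g.
for p in (1, 2, 4):
    y = g
    for _ in range(p):
        y = apply_inverse_root(B, d, y, p)
    print(p, np.allclose(y, np.linalg.solve(S, g)))  # expected: True for each p
```

In a real optimizer the factors change every step, so the hard part is updating B and d cheaply while keeping them a valid spectral factorization; that is the problem the paper's Riemannian approach targets, and this sketch deliberately leaves it out.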

Many training methods, such as Adam(W) and Shampoo, learn a positive-definite curvature matrix and apply an inverse root before preconditioning. Recently, non-diagonal training methods, such as Shampoo, have gained significant attention; however, they remain computationally inefficient and are limited to specific types of curvature information due to the costly matrix root computation via matrix decomposition. To address this, we propose a Riemannian optimization approach that dynamically adapts spectral-factorized positive-definite curvature estimates, enabling the efficient application of arbitrary matrix roots and generic curvature learning. We demonstrate the efficacy and versatility of our approach in positive-definite matrix optimization and covariance adaptation for gradient-free optimization, as well as its efficiency in curvature learning for neural net training.
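For contrast, here is a minimal NumPy sketch of the two baselines the abstract names, under simplifying assumptions: an Adam-style diagonal preconditioner, where the inverse square root is elementwise, and a Shampoo-style preconditioner in its original form L^(-1/4) G R^(-1/4), where each root is obtained from a full eigendecomposition. That per-step O(n³) decomposition is the cost the abstract refers to. All function names are hypothetical; momentum, bias correction, and the running updates of the statistics are omitted.

```python
import numpy as np

def inverse_root_eigh(S, p, floor=1e-12):
    """S^{-1/p} for symmetric positive-definite S via eigendecomposition (O(n^3))."""
    w, Q = np.linalg.eigh(S)
    w = np.maximum(w, floor)            # guard against round-off pushing eigenvalues <= 0
    return (Q * w ** (-1.0 / p)) @ Q.T  # Q diag(w^{-1/p}) Q^T

def adam_like_direction(v, g, eps=1e-8):
    """Diagonal curvature v (e.g. averaged squared gradients): elementwise inverse square root."""
    return g / (np.sqrt(v) + eps)

def shampoo_like_direction(L, R, G):
    """Original Shampoo form L^{-1/4} G R^{-1/4}; both roots need a fresh decomposition."""
    return inverse_root_eigh(L, 4) @ G @ inverse_root_eigh(R, 4)

# Usage on a single hypothetical 4x3 weight gradient.
rng = np.random.default_rng(0)
G = rng.standard_normal((4, 3))
L = 1e-2 * np.eye(4) + G @ G.T  # left statistic with epsilon * I damping
R = 1e-2 * np.eye(3) + G.T @ G  # right statistic
D = shampoo_like_direction(L, R, G)
print(D.shape)  # (4, 3)
```

Changing the root (say from 1/4 to 1/2) is trivial in both functions, but in the full-matrix case every call still pays for `eigh`; avoiding that repeated decomposition is what motivates maintaining the factors directly.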
