Position: Curvature Matrices Should Be Democratized via Linear Operators
This position paper addresses the complexity of curvature computations faced by machine learning researchers and practitioners, though its contribution is incremental because it builds on existing linear operator concepts.
The paper tackles the challenge of computing curvature matrices like the Hessian in machine learning by proposing linear operators as a general, scalable, and user-friendly abstraction, and demonstrates this with the curvlinops library, which simplifies applications and scales to large neural networks.
Structured large matrices are prevalent in machine learning. A particularly important class is curvature matrices like the Hessian, which are central to understanding the loss landscape of neural nets (NNs), and enable second-order optimization, uncertainty quantification, model pruning, data attribution, and more. However, curvature computations can be challenging due to the complexity of automatic differentiation, and the variety and structural assumptions of curvature proxies, like sparsity and Kronecker factorization. In this position paper, we argue that linear operators -- an interface for performing matrix-vector products -- provide a general, scalable, and user-friendly abstraction to handle curvature matrices. To support this position, we developed $\textit{curvlinops}$, a library that provides curvature matrices through a unified linear operator interface. We demonstrate with $\textit{curvlinops}$ how this interface can hide complexity, simplify applications, be extensible and interoperable with other libraries, and scale to large NNs.
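To make the central idea concrete, the following is a minimal pure-Python sketch of a curvature matrix exposed only through matrix-vector products. It does not use or reproduce the curvlinops API. The `LinearOperator` class and the `least_squares_hessian` and `top_eigenvalue` helpers are hypothetical names introduced for illustration, and the least-squares loss is an assumed toy problem whose Hessian is $X^\top X$.

```python
from typing import Callable, List

Vector = List[float]


class LinearOperator:
    """Hypothetical minimal interface: a shape plus a matrix-vector product."""

    def __init__(self, dim: int, matvec: Callable[[Vector], Vector]):
        self.shape = (dim, dim)
        self._matvec = matvec

    def __matmul__(self, v: Vector) -> Vector:
        if len(v) != self.shape[1]:
            raise ValueError("dimension mismatch")
        return self._matvec(v)


def least_squares_hessian(X: List[Vector]) -> LinearOperator:
    """Hessian of 0.5 * sum_i (x_i . w - y_i)^2, which equals X^T X.

    The product is evaluated as X^T (X v), so X^T X is never formed.
    """
    dim = len(X[0])

    def matvec(v: Vector) -> Vector:
        Xv = [sum(x_j * v_j for x_j, v_j in zip(row, v)) for row in X]
        return [sum(row[j] * r for row, r in zip(X, Xv)) for j in range(dim)]

    return LinearOperator(dim, matvec)


def top_eigenvalue(op: LinearOperator, num_iters: int = 200) -> float:
    """Power iteration; touches the operator only through `op @ v`."""
    v = [1.0] * op.shape[1]
    eigval = 0.0
    for _ in range(num_iters):
        Hv = op @ v
        norm = sum(h * h for h in Hv) ** 0.5
        if norm == 0.0:
            return 0.0
        v = [h / norm for h in Hv]
        # Rayleigh quotient of the normalized iterate.
        eigval = sum(a * b for a, b in zip(v, op @ v))
    return eigval


# Toy data (an assumption for this sketch): here X^T X = diag(1, 4).
X = [[1.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
H = least_squares_hessian(X)
print(H @ [1.0, 1.0])
print(top_eigenvalue(H))
```

The downstream routine `top_eigenvalue` depends only on `op @ v`, so the same code would work unchanged with any other curvature proxy that implements the same product. This is the sense in which such an interface can hide how the product is computed.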