AdaSub: Stochastic Optimization Using Second-Order Information in Low-Dimensional Subspaces
AdaSub addresses the efficiency problem faced by machine learning practitioners by offering a more practical second-order optimization method, though the contribution appears incremental because it builds on existing second-order approaches.
The paper tackles the computational expense of second-order optimization methods by introducing AdaSub, which uses second-order information in adaptively chosen low-dimensional subspaces, and reports preliminary numerical results in which it surpasses popular stochastic optimizers in both the time and the number of iterations needed to reach a given accuracy.
We introduce AdaSub, a stochastic optimization algorithm that computes a search direction from second-order information in a low-dimensional subspace, which is defined adaptively from current and past information. Second-order methods exhibit better convergence characteristics than first-order methods, but the need to compute the Hessian matrix at each iteration makes them computationally expensive and often impractical. To address this issue, our approach lets the user trade off computational expense against algorithmic efficiency by choosing the dimension of the search subspace. Our code is freely available on GitHub, and our preliminary numerical results demonstrate that AdaSub surpasses popular stochastic optimizers in terms of the time and the number of iterations required to reach a given accuracy.
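To make the idea of a second-order step restricted to a low-dimensional subspace concrete, the following is a minimal sketch and not the AdaSub implementation. The abstract does not specify how the subspace is constructed, so the sketch assumes, purely for illustration, that it is spanned by the current gradient and the previous step, and it uses a deterministic quadratic test problem rather than a stochastic objective. The names `subspace_newton_step`, `hvp`, `basis_vectors`, and `damping` are hypothetical. Given an orthonormal basis Q of the subspace, the step is d = -Q (Q^T H Q)^{-1} Q^T g, which requires only as many Hessian-vector products as the subspace has dimensions.

```python
import numpy as np

def subspace_newton_step(grad, hvp, basis_vectors, damping=1e-8):
    """Newton step restricted to span(basis_vectors) (illustrative sketch).

    grad: gradient at the current point, shape (n,)
    hvp: callable v -> H @ v, a Hessian-vector product
    basis_vectors: list of m vectors of shape (n,), with m much smaller than n
    damping: small diagonal shift (an assumption) to keep the solve well posed
    """
    # Orthonormalize the spanning vectors; Q has shape (n, m).
    Q, _ = np.linalg.qr(np.stack(basis_vectors, axis=1))
    # Project the Hessian onto the subspace using m Hessian-vector products.
    HQ = np.stack([hvp(Q[:, j]) for j in range(Q.shape[1])], axis=1)
    H_sub = Q.T @ HQ          # m x m projected Hessian
    g_sub = Q.T @ grad        # m-dimensional projected gradient
    # Solve the small m x m Newton system and map the result back to R^n.
    y = np.linalg.solve(H_sub + damping * np.eye(H_sub.shape[0]), -g_sub)
    return Q @ y

# Usage on a convex quadratic f(x) = 0.5 x^T A x - b^T x (test problem only).
rng = np.random.default_rng(0)
n = 50
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)   # symmetric positive definite Hessian
b = rng.standard_normal(n)

def f(x):
    return 0.5 * x @ A @ x - b @ x

def grad_f(x):
    return A @ x - b

x = np.zeros(n)
prev_step = None
for _ in range(30):
    g = grad_f(x)
    # Assumed subspace: current gradient plus the previous step (dimension <= 2).
    basis = [g] if prev_step is None else [g, prev_step]
    step = subspace_newton_step(g, lambda v: A @ v, basis)
    x, prev_step = x + step, step
```

The subspace dimension is the knob the abstract refers to: a larger basis costs more Hessian-vector products and a larger linear solve per iteration but captures more curvature, and with the full space the step reduces to an ordinary (damped) Newton step.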