Oracle Complexity of Second-Order Methods for Finite-Sum Problems
This addresses a fundamental problem in machine learning optimization for researchers and practitioners, showing that second-order methods may not offer efficiency gains as previously hoped, which is incremental but clarifies theoretical limits.
The paper tackles the question of whether second-order methods can solve finite-sum optimization problems more efficiently than first-order methods, finding that, in terms of worst-case guarantees, the answer is negative, though it suggests potential workarounds with additional assumptions.
Finite-sum optimization problems are ubiquitous in machine learning, and are commonly solved using first-order methods which rely on gradient computations. Recently, there has been growing interest in \emph{second-order} methods, which rely on both gradients and Hessians. In principle, second-order methods can require much fewer iterations than first-order methods, and hold the promise for more efficient algorithms. Although computing and manipulating Hessians is prohibitive for high-dimensional problems in general, the Hessians of individual functions in finite-sum problems can often be efficiently computed, e.g. because they possess a low-rank structure. Can second-order information indeed be used to solve such problems more efficiently? In this paper, we provide evidence that the answer -- perhaps surprisingly -- is negative, at least in terms of worst-case guarantees. However, we also discuss what additional assumptions and algorithmic approaches might potentially circumvent this negative result.