OC · LG · Nov 11, 2024

Effectively Leveraging Momentum Terms in Stochastic Line Search Frameworks for Fast Optimization of Finite-Sum Problems

arXiv:2411.07102v2
h-index: 5
Originality: Incremental advance
AI Analysis

This work addresses optimization efficiency for deep learning practitioners, offering incremental improvements by combining existing techniques with novel algorithmic tweaks.

The paper tackles the challenge of integrating momentum terms with stochastic line search methods for finite-sum optimization in deep learning. It proposes a framework that combines mini-batch persistency with conjugate-gradient-type rules for the momentum parameter, and it reports state-of-the-art results on large-scale convex and nonconvex training problems.

In this work, we address unconstrained finite-sum optimization problems, with particular focus on instances originating in large-scale deep learning scenarios. Our main interest lies in exploring the relationship between recent line search approaches for stochastic optimization in the overparametrized regime and momentum directions. First, we point out that combining these two elements with computational benefits is not straightforward. To address this, we propose a solution based on mini-batch persistency. We then introduce an algorithmic framework that exploits a mix of data persistency, conjugate-gradient-type rules for the definition of the momentum parameter, and stochastic line searches. The resulting algorithm provably possesses convergence properties under suitable assumptions and is empirically shown to outperform other popular methods from the literature, obtaining state-of-the-art results in both convex and nonconvex large-scale training problems.
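To make the three ingredients named in the abstract concrete, here is a minimal sketch of how they might fit together on a toy least-squares problem. It is not the authors' algorithm. The Fletcher-Reeves-style ratio stands in for the paper's conjugate-gradient-type rule, and the function names (`train`, `loss_and_grad`), the overlap scheme, the clipping value `beta_max`, and the Armijo constants are all assumptions chosen for illustration.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def loss_and_grad(w, X, y, idx):
    """Mean of 0.5 * (x_i . w - y_i)^2 over the samples in idx, and its gradient."""
    grad = [0.0] * len(w)
    loss = 0.0
    for i in idx:
        r = dot(w, X[i]) - y[i]
        loss += 0.5 * r * r
        for j, xij in enumerate(X[i]):
            grad[j] += r * xij
    m = len(idx)
    return loss / m, [gj / m for gj in grad]

def train(X, y, batch_size=8, overlap=4, steps=200, alpha0=1.0,
          c=0.1, shrink=0.5, beta_max=0.9, seed=0):
    rng = random.Random(seed)
    n, dim = len(X), len(X[0])
    w = [0.0] * dim
    direction, prev_gg = [0.0] * dim, None
    batch = rng.sample(range(n), batch_size)
    for _step in range(steps):
        # Mini-batch persistency (assumed scheme): reuse the newest
        # `overlap` samples of the previous batch, then draw fresh ones.
        kept = batch[-overlap:] if overlap > 0 else []
        pool = [i for i in range(n) if i not in kept]
        batch = kept + rng.sample(pool, batch_size - overlap)

        f0, g = loss_and_grad(w, X, y, batch)
        gg = dot(g, g)
        if gg == 0.0:
            continue  # this batch is already fit exactly

        # Fletcher-Reeves-style momentum parameter, clipped (illustrative stand-in).
        beta = 0.0 if not prev_gg else min(beta_max, gg / prev_gg)
        direction = [-gj + beta * dj for gj, dj in zip(g, direction)]
        slope = dot(g, direction)
        if slope >= 0.0:
            # Safeguard: fall back to steepest descent if not a descent direction.
            direction = [-gj for gj in g]
            slope = -gg

        # Armijo backtracking on the current mini-batch loss.
        alpha = alpha0
        for _try in range(30):
            trial = [wj + alpha * dj for wj, dj in zip(w, direction)]
            if loss_and_grad(trial, X, y, batch)[0] <= f0 + c * alpha * slope:
                w = trial
                break
            alpha *= shrink
        prev_gg = gg
    return w

# Toy interpolating problem: labels are generated exactly by w_true.
data_rng = random.Random(1)
w_true = [1.0, -2.0, 0.5]
X = [[data_rng.gauss(0, 1) for _ in w_true] for _ in range(40)]
y = [dot(w_true, x) for x in X]

w = train(X, y)
initial = loss_and_grad([0.0] * 3, X, y, range(40))[0]
final = loss_and_grad(w, X, y, range(40))[0]
```

The descent-direction safeguard is what lets the backtracking loop terminate: an Armijo test along a direction with nonnegative slope may never be satisfied. Sharing samples between consecutive batches is one plausible way to make the previous direction informative about the current mini-batch objective, which is the difficulty the abstract alludes to.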

Code Implementations: 1 repo