OC LGNov 3, 2022

Adaptive Stochastic Variance Reduction for Non-convex Finite-Sum Minimization

Ali Kavis, Stratis Skoulakis, Kimon Antonakopoulos, Leello Tadesse Dadi, Volkan Cevher

arXiv:2211.01851v118.219 citationsh-index: 43

Originality Highly original

AI Analysis

This provides a parameter-free method for non-convex optimization, addressing a bottleneck in machine learning by eliminating the need for tuning constants like smoothness or gradient bounds.

The paper tackles the problem of minimizing non-convex finite-sum functions without requiring prior knowledge of problem-dependent parameters, achieving an oracle complexity of $ ilde{O}(n + \sqrt{n}/\epsilon^2)$ to compute an $\epsilon$-stationary point, matching the lower bound up to logarithmic factors.

We propose an adaptive variance-reduction method, called AdaSpider, for minimization of $L$-smooth, non-convex functions with a finite-sum structure. In essence, AdaSpider combines an AdaGrad-inspired [Duchi et al., 2011, McMahan & Streeter, 2010], but a fairly distinct, adaptive step-size schedule with the recursive stochastic path integrated estimator proposed in [Fang et al., 2018]. To our knowledge, Adaspider is the first parameter-free non-convex variance-reduction method in the sense that it does not require the knowledge of problem-dependent parameters, such as smoothness constant $L$, target accuracy $ε$ or any bound on gradient norms. In doing so, we are able to compute an $ε$-stationary point with $\tilde{O}\left(n + \sqrt{n}/ε^2\right)$ oracle-calls, which matches the respective lower bound up to logarithmic factors.

View on arXiv PDF

Similar