OCLGJan 16, 2023

Faster Gradient-Free Algorithms for Nonsmooth Nonconvex Stochastic Optimization

arXiv:2301.06428v328 citationsh-index: 16
Originality Incremental advance
AI Analysis

This work addresses the challenge of efficient optimization in machine learning and AI for scenarios where gradients are unavailable or functions are nonsmooth and nonconvex, representing an incremental improvement over existing methods.

The paper tackles the problem of optimizing nonsmooth nonconvex stochastic functions by proposing a new gradient-free algorithm that reduces the stochastic zeroth-order oracle complexity from O(L^4 d^{3/2} ε^{-4} + ΔL^3 d^{3/2} δ^{-1} ε^{-4}) to O(L^3 d^{3/2} ε^{-3} + ΔL^2 d^{3/2} δ^{-1} ε^{-3}) for finding a (δ,ε)-Goldstein stationary point.

We consider the optimization problem of the form $\min_{x \in \mathbb{R}^d} f(x) \triangleq \mathbb{E}_ξ [F(x; ξ)]$, where the component $F(x;ξ)$ is $L$-mean-squared Lipschitz but possibly nonconvex and nonsmooth. The recently proposed gradient-free method requires at most $\mathcal{O}( L^4 d^{3/2} ε^{-4} + ΔL^3 d^{3/2} δ^{-1} ε^{-4})$ stochastic zeroth-order oracle complexity to find a $(δ,ε)$-Goldstein stationary point of objective function, where $Δ= f(x_0) - \inf_{x \in \mathbb{R}^d} f(x)$ and $x_0$ is the initial point of the algorithm. This paper proposes a more efficient algorithm using stochastic recursive gradient estimators, which improves the complexity to $\mathcal{O}(L^3 d^{3/2} ε^{-3}+ ΔL^2 d^{3/2} δ^{-1} ε^{-3})$.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes