Faster Gradient-Free Algorithms for Nonsmooth Nonconvex Stochastic Optimization
This work addresses the challenge of efficient optimization in machine learning and AI for scenarios where gradients are unavailable or functions are nonsmooth and nonconvex, representing an incremental improvement over existing methods.
The paper tackles the problem of optimizing nonsmooth nonconvex stochastic functions by proposing a new gradient-free algorithm that reduces the stochastic zeroth-order oracle complexity from O(L^4 d^{3/2} ε^{-4} + ΔL^3 d^{3/2} δ^{-1} ε^{-4}) to O(L^3 d^{3/2} ε^{-3} + ΔL^2 d^{3/2} δ^{-1} ε^{-3}) for finding a (δ,ε)-Goldstein stationary point.
We consider the optimization problem of the form $\min_{x \in \mathbb{R}^d} f(x) \triangleq \mathbb{E}_ξ [F(x; ξ)]$, where the component $F(x;ξ)$ is $L$-mean-squared Lipschitz but possibly nonconvex and nonsmooth. The recently proposed gradient-free method requires at most $\mathcal{O}( L^4 d^{3/2} ε^{-4} + ΔL^3 d^{3/2} δ^{-1} ε^{-4})$ stochastic zeroth-order oracle complexity to find a $(δ,ε)$-Goldstein stationary point of objective function, where $Δ= f(x_0) - \inf_{x \in \mathbb{R}^d} f(x)$ and $x_0$ is the initial point of the algorithm. This paper proposes a more efficient algorithm using stochastic recursive gradient estimators, which improves the complexity to $\mathcal{O}(L^3 d^{3/2} ε^{-3}+ ΔL^2 d^{3/2} δ^{-1} ε^{-3})$.