Zeroth-Order Nonconvex Nonsmooth Optimization with Heavy-Tailed Noise
For machine learning practitioners dealing with heavy-tailed noise in black-box optimization, this work provides a theoretically grounded zeroth-order method whose complexity matches the best-known bounds, though it is an incremental extension of the existing online-to-nonconvex conversion framework.
This paper tackles nonconvex nonsmooth optimization with heavy-tailed noise in function evaluations. It proposes a zeroth-order algorithm that finds a Goldstein stationary point with an oracle complexity whose dependence on the accuracy parameters matches the best-known stochastic first-order methods for nonconvex nonsmooth problems, and whose dependence on the dimension matches the best-known stochastic zeroth-order results for convex nonsmooth problems.
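For reference, the usual definition of Goldstein stationarity from the nonsmooth optimization literature is given below (the paper's own notation may differ). Here $\partial f$ denotes the Clarke subdifferential of the Lipschitz function $f$ and $\mathbb{B}_\delta(x)$ the closed Euclidean ball of radius $\delta$ around $x$. The second line simply instantiates the zeroth-order complexity bound stated in the abstract at $p = 2$, the finite-variance case.

$$
\partial_\delta f(x) = \operatorname{conv}\Big( \bigcup_{y \in \mathbb{B}_\delta(x)} \partial f(y) \Big), \qquad x \text{ is } (\delta,\epsilon)\text{-Goldstein stationary if } \min_{g \in \partial_\delta f(x)} \|g\| \le \epsilon,
$$

$$
\left. d^{\frac{p}{2(p-1)}}\,\delta^{-1}\epsilon^{-\frac{2p-1}{p-1}} \right|_{p=2} = d\,\delta^{-1}\epsilon^{-3}.
$$

As $p \to 1^{+}$, both exponents $\frac{p}{2(p-1)}$ and $\frac{2p-1}{p-1}$ grow without bound, reflecting the weaker moment assumption.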
This paper considers nonconvex nonsmooth problems whose objective function is Lipschitz continuous. We focus on the stochastic setting in which the algorithm can access only stochastic function values corrupted by heavy-tailed noise, a setting that is prevalent in many popular machine learning applications. We propose a stochastic zeroth-order algorithm that refines the online-to-nonconvex conversion framework by clipping the two-point gradient estimator. Our theoretical analysis shows that the algorithm finds a $(\delta, \epsilon)$-Goldstein stationary point with a zeroth-order oracle complexity of $\mathcal{O}(d^{\frac{p}{2(p-1)}}\delta^{-1}\epsilon^{-\frac{2p-1}{p-1}})$, where $d$ is the problem dimension and $p\in(1,2]$ is the order of the bounded noise moment. Our dependence on the dimension $d$ matches the best-known results of stochastic zeroth-order optimization for finding an approximately optimal solution of a stochastic convex nonsmooth problem. In addition, our dependence on the accuracy parameters $\delta$ and $\epsilon$ is consistent with that of the best-known stochastic first-order algorithms for stochastic nonconvex nonsmooth problems. Finally, we conduct numerical experiments to demonstrate the effectiveness of the proposed method.
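The abstract does not spell out the estimator or the conversion loop, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes a two-point estimator $\frac{d}{2\rho}\big(F(x+\rho w;\xi) - F(x-\rho w;\xi)\big)w$ with $w$ uniform on the unit sphere and one shared noise sample $\xi$ for both evaluations, clipped to norm $\tau$. It plugs that estimator into a generic online-to-nonconvex conversion loop: evaluate at a random point on the segment $[x, x+\Delta]$, update the increment $\Delta$ by projected online gradient descent, and average evaluation points over blocks of $K$ steps. All names (`clipped_two_point_estimator`, `o2nc_zeroth_order`, `eta`, `D`, `rho`, `tau`, `K`) and the toy oracle with symmetric Pareto noise are hypothetical; the paper's parameter choices are not reproduced.

```python
# Hypothetical sketch: clipped two-point estimator inside a generic
# online-to-nonconvex conversion loop. Not the paper's exact algorithm
# or parameter schedule.
import math
import random


def l2_norm(v):
    return math.sqrt(sum(vi * vi for vi in v))


def clip(v, radius):
    """Rescale v onto the Euclidean ball of the given radius if it lies outside."""
    norm = l2_norm(v)
    if norm <= radius:
        return list(v)
    return [vi * radius / norm for vi in v]


def sample_unit_sphere(d, rng):
    """Uniform direction on the unit sphere in R^d via a normalized Gaussian."""
    while True:
        w = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = l2_norm(w)
        if norm > 0.0:
            return [wi / norm for wi in w]


def clipped_two_point_estimator(F, sample_xi, x, rho, tau, rng):
    """(d / 2rho) * (F(x + rho w; xi) - F(x - rho w; xi)) * w, clipped to norm tau.

    Assumption: both function evaluations share the same noise sample xi.
    """
    d = len(x)
    w = sample_unit_sphere(d, rng)
    xi = sample_xi(rng)
    x_plus = [xj + rho * wj for xj, wj in zip(x, w)]
    x_minus = [xj - rho * wj for xj, wj in zip(x, w)]
    scale = d * (F(x_plus, xi) - F(x_minus, xi)) / (2.0 * rho)
    return clip([scale * wj for wj in w], tau)


def o2nc_zeroth_order(F, sample_xi, x0, steps, K, eta, D, rho, tau, seed=0):
    """Online-to-nonconvex loop driven by clipped zeroth-order estimates.

    Returns block averages of the evaluation points w_n (one per K steps);
    the conversion outputs one of these candidates.
    """
    rng = random.Random(seed)
    x = list(x0)
    delta = [0.0] * len(x0)
    window, candidates = [], []
    for _ in range(steps):
        s = rng.random()
        # Evaluate at a uniformly random point on the segment from x to x + delta.
        w = [xj + s * dj for xj, dj in zip(x, delta)]
        x = [xj + dj for xj, dj in zip(x, delta)]
        g = clipped_two_point_estimator(F, sample_xi, w, rho, tau, rng)
        # Online learner: gradient descent on the linear loss <g, delta>,
        # projected onto the ball of radius D.
        delta = clip([dj - eta * gj for dj, gj in zip(delta, g)], D)
        window.append(w)
        if len(window) == K:
            candidates.append([sum(col) / K for col in zip(*window)])
            window = []
    return candidates


def F(x, xi):
    # Hypothetical toy oracle: ||x||_1 plus a mean-zero heavy-tailed term along x[0].
    return sum(abs(xj) for xj in x) + xi * x[0]


def sample_xi(rng, alpha=1.5):
    # Symmetric Pareto noise: finite p-th moment only for p < alpha (infinite variance here).
    return rng.choice((-1.0, 1.0)) * (rng.paretovariate(alpha) - 1.0)


cands = o2nc_zeroth_order(F, sample_xi, [3.0, -2.0, 1.0], steps=2000, K=50,
                          eta=0.01, D=0.05, rho=0.01, tau=10.0)
```

The toy objective is convex only to keep the example easy to check; the setting in the paper covers general Lipschitz nonconvex objectives. Clipping bounds every estimate by `tau`, which is what keeps single heavy-tailed draws from dominating the increment updates.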