Zeroth-Order Regularized Optimization (ZORO): Approximately Sparse Gradients and Adaptive Sampling
This addresses optimization problems where gradient information is unavailable, such as in black-box settings, offering a more efficient solution for high-dimensional tasks.
The paper tackles high-dimensional derivative-free optimization by proposing ZORO, a method that leverages approximately sparse gradients to reduce the number of function evaluations needed for convergence, achieving a convergence rate that depends only logarithmically on dimension and outperforming existing methods in experiments.
We consider the problem of minimizing a high-dimensional objective function, which may include a regularization term, using (possibly noisy) evaluations of the function. Such optimization is also called derivative-free, zeroth-order, or black-box optimization. We propose a new $\textbf{Z}$eroth-$\textbf{O}$rder $\textbf{R}$egularized $\textbf{O}$ptimization method, dubbed ZORO. When the underlying gradient is approximately sparse at an iterate, ZORO needs very few objective function evaluations to obtain a new iterate that decreases the objective function. We achieve this with an adaptive, randomized gradient estimator, followed by an inexact proximal-gradient scheme. Under a novel approximately sparse gradient assumption and various different convex settings, we show the (theoretical and empirical) convergence rate of ZORO is only logarithmically dependent on the problem dimension. Numerical experiments show that ZORO outperforms the existing methods with similar assumptions, on both synthetic and real datasets.