Zeroth-order Asynchronous Doubly Stochastic Algorithm with Variance Reduction
This work addresses the challenge of computationally expensive gradient calculations in ML, offering a faster method for derivative-free optimization, though it appears incremental as it builds on existing asynchronous and variance reduction techniques.
The paper tackles zeroth-order optimization for large-scale machine learning by proposing an asynchronous doubly stochastic algorithm with variance reduction, improving the convergence rate from O(1/√T) to O(1/T) for finite sums of smooth, not necessarily convex, functions.
Zeroth-order (derivative-free) optimization has attracted considerable attention in machine learning because explicit gradient calculations may be computationally expensive or infeasible. To handle problems that are large both in volume and in dimension, asynchronous doubly stochastic zeroth-order algorithms were recently proposed. The convergence rate of existing asynchronous doubly stochastic zeroth-order algorithms is $O(\frac{1}{\sqrt{T}})$, which is also the rate of sequential stochastic zeroth-order optimization algorithms. In this paper, we focus on finite sums of smooth but not necessarily convex functions, and propose an asynchronous doubly stochastic zeroth-order optimization algorithm that uses the acceleration technique of variance reduction (AsyDSZOVR). Rigorous theoretical analysis shows that the convergence rate can be improved from $O(\frac{1}{\sqrt{T}})$, the best result of existing algorithms, to $O(\frac{1}{T})$. Our theoretical results also improve on those of sequential stochastic zeroth-order optimization algorithms.
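To make the three ingredients concrete, the following is a minimal sequential sketch, not the authors' AsyDSZOVR. It combines a coordinate-wise central finite-difference gradient estimate (zeroth order), sampling of both a component function $f_i$ and a coordinate $j$ at every step (doubly stochastic), and an SVRG-style control variate built from a periodic snapshot (variance reduction). The function names, the smoothing parameter `mu`, the step size `eta`, and the epoch and inner-loop lengths are illustrative assumptions. The asynchronous part, in which several workers update shared coordinates using possibly stale reads, is deliberately omitted.

```python
import numpy as np


def coord_fd(f, x, j, mu):
    """Central finite-difference estimate of the j-th partial derivative of f at x."""
    e = np.zeros_like(x)
    e[j] = mu
    return (f(x + e) - f(x - e)) / (2.0 * mu)


def full_zo_grad(fs, x, mu):
    """Zeroth-order estimate of the gradient of (1/n) * sum_i f_i at x (2*n*d evaluations)."""
    n, d = len(fs), x.size
    g = np.zeros(d)
    for f in fs:
        for j in range(d):
            g[j] += coord_fd(f, x, j, mu)
    return g / n


def zo_svrg_doubly_stochastic(fs, x0, eta=0.1, mu=1e-4, epochs=20, inner=None, seed=0):
    """Hypothetical sequential zeroth-order SVRG with random component and random coordinate."""
    rng = np.random.default_rng(seed)
    n, d = len(fs), x0.size
    inner = inner if inner is not None else n * d
    x = x0.astype(float).copy()
    for _ in range(epochs):
        # Snapshot point and its full zeroth-order gradient estimate.
        snapshot = x.copy()
        g_snap = full_zo_grad(fs, snapshot, mu)
        for _ in range(inner):
            i = rng.integers(n)  # random component function
            j = rng.integers(d)  # random coordinate
            # Control variate: unbiased (over i) for the j-th coordinate of the
            # finite-difference full-gradient estimate at x.
            v = coord_fd(fs[i], x, j, mu) - coord_fd(fs[i], snapshot, j, mu) + g_snap[j]
            x[j] -= eta * v
    return x
```

For example, with $f_i(x) = \frac{1}{2}\lVert x - c_i \rVert^2$ the iterate should approach the mean of the $c_i$. Each snapshot costs $2nd$ function evaluations while each inner step costs four, which is the usual trade-off of SVRG-style estimators: an occasional expensive full pass buys low-variance cheap steps.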