OC LGOct 31, 2022

Convergence Rates of Stochastic Zeroth-order Gradient Descent for Łojasiewicz Functions

arXiv:2210.16997v65.62 citationsh-index: 4

Originality Incremental advance

AI Analysis

This provides theoretical guarantees for optimization in settings where gradient information is unavailable, which is incremental but relevant for black-box optimization problems in machine learning and engineering.

The paper tackles the problem of analyzing convergence rates for Stochastic Zeroth-order Gradient Descent (SZGD) algorithms applied to Łojasiewicz functions, showing that the function value convergence can be faster than the iterate convergence, with results applicable to both smooth and nonsmooth objectives.

We prove convergence rates of Stochastic Zeroth-order Gradient Descent (SZGD) algorithms for Lojasiewicz functions. The SZGD algorithm iterates as \begin{align*} \mathbf{x}_{t+1} = \mathbf{x}_t - η_t \widehat{\nabla} f (\mathbf{x}_t), \qquad t = 0,1,2,3,\cdots , \end{align*} where $f$ is the objective function that satisfies the Łojasiewicz inequality with Łojasiewicz exponent $θ$, $η_t$ is the step size (learning rate), and $ \widehat{\nabla} f (\mathbf{x}_t) $ is the approximate gradient estimated using zeroth-order information only. Our results show that $ \{ f (\mathbf{x}_t) - f (\mathbf{x}_\infty) \}_{t \in \mathbb{N} } $ can converge faster than $ \{ \| \mathbf{x}_t - \mathbf{x}_\infty \| \}_{t \in \mathbb{N} }$, regardless of whether the objective $f$ is smooth or nonsmooth.

View on arXiv PDF

Similar