OCLGOct 31, 2022

Convergence Rates of Stochastic Zeroth-order Gradient Descent for Łojasiewicz Functions

arXiv:2210.16997v62 citationsh-index: 4
Originality Incremental advance
AI Analysis

This provides theoretical guarantees for optimization in settings where gradient information is unavailable, which is incremental but relevant for black-box optimization problems in machine learning and engineering.

The paper tackles the problem of analyzing convergence rates for Stochastic Zeroth-order Gradient Descent (SZGD) algorithms applied to Łojasiewicz functions, showing that the function value convergence can be faster than the iterate convergence, with results applicable to both smooth and nonsmooth objectives.

We prove convergence rates of Stochastic Zeroth-order Gradient Descent (SZGD) algorithms for Lojasiewicz functions. The SZGD algorithm iterates as \begin{align*} \mathbf{x}_{t+1} = \mathbf{x}_t - η_t \widehat{\nabla} f (\mathbf{x}_t), \qquad t = 0,1,2,3,\cdots , \end{align*} where $f$ is the objective function that satisfies the Łojasiewicz inequality with Łojasiewicz exponent $θ$, $η_t$ is the step size (learning rate), and $ \widehat{\nabla} f (\mathbf{x}_t) $ is the approximate gradient estimated using zeroth-order information only. Our results show that $ \{ f (\mathbf{x}_t) - f (\mathbf{x}_\infty) \}_{t \in \mathbb{N} } $ can converge faster than $ \{ \| \mathbf{x}_t - \mathbf{x}_\infty \| \}_{t \in \mathbb{N} }$, regardless of whether the objective $f$ is smooth or nonsmooth.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes