OCITMLFeb 28, 2018

On the Sublinear Convergence of Randomly Perturbed Alternating Gradient Descent to Second Order Stationary Solutions

arXiv:1802.10418v14 citations
Originality Highly original
AI Analysis

This provides a theoretical guarantee for alternating-type algorithms in nonconvex optimization, addressing a known bottleneck for researchers in optimization and machine learning.

The paper tackles the problem of avoiding saddle points and local maxima in nonconvex optimization by proposing a perturbed alternating gradient descent (PA-GD) algorithm, which converges to second-order stationary solutions with high probability in O(polylog(d)/ε^{7/3}) iterations.

The alternating gradient descent (AGD) is a simple but popular algorithm which has been applied to problems in optimization, machine learning, data ming, and signal processing, etc. The algorithm updates two blocks of variables in an alternating manner, in which a gradient step is taken on one block, while keeping the remaining block fixed. When the objective function is nonconvex, it is well-known the AGD converges to the first-order stationary solution with a global sublinear rate. In this paper, we show that a variant of AGD-type algorithms will not be trapped by "bad" stationary solutions such as saddle points and local maximum points. In particular, we consider a smooth unconstrained optimization problem, and propose a perturbed AGD (PA-GD) which converges (with high probability) to the set of second-order stationary solutions (SS2) with a global sublinear rate. To the best of our knowledge, this is the first alternating type algorithm which takes $\mathcal{O}(\text{polylog}(d)/ε^{7/3})$ iterations to achieve SS2 with high probability [where polylog$(d)$ is polynomial of the logarithm of dimension $d$ of the problem].

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes