On the convergence of mirror descent beyond stochastic convex programming
This paper addresses non-convex stochastic optimization, giving researchers and practitioners convergence conditions that are weaker than convexity; the contribution is incremental in that it extends existing mirror descent theory.
The paper studies the convergence of mirror descent in non-convex stochastic optimization. It introduces variational coherence, shows that the algorithm's last iterate converges with probability 1 under this condition, and shows that in problems with sharp minima the algorithm reaches a minimum in a finite number of steps even with persistent gradient noise.
In this paper, we examine the convergence of mirror descent in a class of stochastic optimization problems that are not necessarily convex (or even quasi-convex), and which we call variationally coherent. Since the standard technique of "ergodic averaging" offers no tangible benefits beyond convex programming, we focus directly on the algorithm's last generated sample (its "last iterate"), and we show that it converges with probability $1$ if the underlying problem is coherent. We further consider a localized version of variational coherence which ensures local convergence of stochastic mirror descent (SMD) with high probability. These results contribute to the landscape of non-convex stochastic optimization by showing that (quasi-)convexity is not essential for convergence to a global minimum: rather, variational coherence, a much weaker requirement, suffices. Finally, building on the above, we reveal an interesting insight regarding the convergence speed of SMD: in problems with sharp minima (such as generic linear programs or concave minimization problems), SMD reaches a minimum point in a finite number of steps (a.s.), even in the presence of persistent gradient noise. This result is to be contrasted with existing black-box convergence rate estimates that are only asymptotic.
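To make the finite-time claim concrete, the following is a minimal sketch of stochastic mirror descent on a small linear program over the probability simplex, one of the sharp-minimum examples named above. Several choices are assumptions made here for illustration and are not specified in the abstract: the "lazy" (dual averaging) form of the update, the Euclidean regularizer (so the mirror map is a projection onto the simplex), the step size $n^{-0.6}$, Gaussian gradient noise, and the hypothetical helper names `project_simplex` and `lazy_smd`.

```python
import numpy as np


def project_simplex(y):
    """Euclidean projection of y onto the probability simplex."""
    u = np.sort(y)[::-1]                      # coordinates in decreasing order
    css = np.cumsum(u) - 1.0
    k = np.arange(1, y.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]  # last index kept in the support
    tau = css[rho] / (rho + 1)
    return np.maximum(y - tau, 0.0)


def lazy_smd(c, steps=2000, noise_std=1.0, seed=0):
    """Lazy SMD for min <c, x> over the simplex with noisy gradients.

    Assumed setup (not taken from the abstract): Euclidean mirror map,
    step size n**-0.6 (non-summable, square-summable), Gaussian noise.
    Returns the array of all iterates, one row per step.
    """
    c = np.asarray(c, dtype=float)
    rng = np.random.default_rng(seed)
    y = np.zeros_like(c)                      # dual (score) variable
    iterates = []
    for n in range(1, steps + 1):
        x = project_simplex(y)                # mirror step: X_n = Q(Y_n)
        iterates.append(x)
        noisy_grad = c + noise_std * rng.standard_normal(c.size)
        y = y - noisy_grad / n ** 0.6         # dual update with noisy gradient
    return np.array(iterates)


if __name__ == "__main__":
    xs = lazy_smd([0.0, 1.0, 2.0])
    print("last iterate:", xs[-1])
```

In this toy problem the unique minimizer is the vertex with the smallest cost coefficient. Because the projection maps every sufficiently separated score vector exactly onto that vertex, the last iterate sits at the minimizer itself rather than merely approaching it, even though the gradient noise never vanishes. An entropic mirror map would instead keep every iterate in the interior of the simplex, so this particular illustration relies on the Euclidean choice.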