OC | LG | ML | Jan 5, 2022

Stochastic regularized majorization-minimization with weakly convex and multi-convex surrogates

arXiv:2201.01652v3 | 6 citations | Has Code
Originality: Highly original
AI Analysis

This work provides theoretical guarantees for optimization methods in general non-convex settings with dependent data. The contribution is incremental, but it addresses a known bottleneck in stochastic optimization.

The paper extends stochastic majorization-minimization algorithms to weakly convex or block multi-convex surrogates in non-convex constrained settings with non-i.i.d. data. The first-order optimality gap decays at the rate O((log n)^{1+ε}/n^{1/2}) for the empirical loss and O((log n)^{1+ε}/n^{1/4}) for the expected loss, where n is the number of data samples processed; under an additional assumption, the expected-loss rate improves to O((log n)^{1+ε}/n^{1/2}).

Stochastic majorization-minimization (SMM) is a class of stochastic optimization algorithms that proceed by sampling new data points and minimizing a recursive average of surrogate functions of an objective function. The surrogates are required to be strongly convex, and no convergence rate analysis was available for the general non-convex setting. In this paper, we propose an extension of SMM where surrogates are allowed to be only weakly convex or block multi-convex, and the averaged surrogates are approximately minimized with proximal regularization or block-minimized within diminishing radii, respectively. For the general nonconvex constrained setting with non-i.i.d. data samples, we show that the first-order optimality gap of the proposed algorithm decays at the rate $O((\log n)^{1+\varepsilon}/n^{1/2})$ for the empirical loss and $O((\log n)^{1+\varepsilon}/n^{1/4})$ for the expected loss, where $n$ denotes the number of data samples processed. Under an additional assumption, the latter convergence rate can be improved to $O((\log n)^{1+\varepsilon}/n^{1/2})$. As a corollary, we obtain the first convergence rate bounds for various optimization methods in the general nonconvex, dependent-data setting: double-averaging projected gradient descent and its generalizations, proximal point empirical risk minimization, and online matrix/tensor decomposition algorithms. We also provide experimental validation of our results.
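The recursion described in the abstract (sample a point, average a new surrogate into the running one, then minimize with a proximal term) can be illustrated with a one-dimensional toy. This is a minimal sketch under stated assumptions, not the paper's algorithm or its released code: the function names, the nonconvex Welsch-type loss, the quadratic majorizer with curvature `L`, the averaging weight `1/n`, the proximal parameter `rho`, and the interval constraint are all illustrative choices. The quadratic surrogate used here is strongly convex, so the sketch does not exercise the weakly convex or multi-convex cases, and the samples are i.i.d. rather than dependent.

```python
import math
import random


def welsch_grad(theta, x):
    """Derivative in theta of the nonconvex loss 1 - exp(-(theta - x)^2 / 2)."""
    d = theta - x
    return d * math.exp(-0.5 * d * d)


def regularized_smm(samples, theta0=0.0, L=1.0, rho=0.1, lo=-3.0, hi=3.0):
    """Toy stochastic regularized MM on the interval [lo, hi] (hypothetical helper).

    Each surrogate is the quadratic majorizer
        g_n(t) = loss(theta_prev) + grad * (t - theta_prev) + (L/2) * (t - theta_prev)^2,
    which upper-bounds the loss because its second derivative is at most 1 <= L.
    """
    theta = theta0
    # The averaged surrogate has the form (L/2) t^2 - b t + const; only b is tracked.
    b = 0.0
    for n, x in enumerate(samples, start=1):
        w = 1.0 / n  # assumed averaging weight
        b = (1.0 - w) * b + w * (L * theta - welsch_grad(theta, x))
        # Minimize (L/2) t^2 - b t + (rho/2) (t - theta)^2 over [lo, hi]:
        # the unconstrained minimizer is clipped to the interval.
        theta = min(hi, max(lo, (b + rho * theta) / (L + rho)))
    return theta


rng = random.Random(0)
samples = [1.0 + rng.gauss(0.0, 0.1) for _ in range(2000)]
theta_hat = regularized_smm(samples)
print(round(theta_hat, 2))
```

With quadratic surrogates of fixed curvature, the averaged surrogate collapses to a single running coefficient `b`, and the update is a clipped average of past gradient steps, which is close in spirit to the double-averaging projected gradient descent mentioned in the abstract.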

Code Implementations: 2 repos
