OC LG ML · Jan 5, 2022

Stochastic regularized majorization-minimization with weakly convex and multi-convex surrogates

arXiv:2201.01652v39.36 citations · Has Code

Originality: Highly original

AI Analysis

This work provides theoretical convergence guarantees for several optimization methods in the general non-convex setting with dependent data. The contribution is incremental, but it addresses a known bottleneck in stochastic optimization.

The paper tackles the problem of extending stochastic majorization-minimization algorithms to weakly convex or block multi-convex surrogates in non-convex constrained settings with non-i.i.d. data. It shows that the first-order optimality gap decays at the rate O((log n)^{1+ε}/n^{1/2}) for the empirical loss and O((log n)^{1+ε}/n^{1/4}) for the expected loss, and that the latter improves to O((log n)^{1+ε}/n^{1/2}) under an additional assumption.
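For a rough sense of scale, the snippet below evaluates the two bounds quoted above at a few sample sizes. It is only an arithmetic illustration: the constants hidden in O(·) are unknown, so they are dropped, and ε = 0.1 is an arbitrary choice. Only the ratio between the two bounds, n^{-1/4}, is meaningful.

```python
import math


def rate_bound(n: float, eps: float, power: float) -> float:
    """(log n)^(1 + eps) / n^power, with the unknown O(.) constant dropped."""
    return math.log(n) ** (1 + eps) / n ** power


EPS = 0.1  # arbitrary illustrative value; the stated rates hold for any eps > 0

for n in (10**3, 10**4, 10**5, 10**6, 10**8):
    fast = rate_bound(n, EPS, 0.5)   # empirical-loss rate (and the improved expected-loss rate)
    slow = rate_bound(n, EPS, 0.25)  # expected-loss rate without the additional assumption
    print(f"n={n:>11,}  n^(-1/2) bound={fast:.4f}  n^(-1/4) bound={slow:.4f}  ratio={fast / slow:.4f}")
```

Ignoring the logarithmic factor, matching a given value of the n^{-1/2} bound at n samples requires roughly n² samples under the n^{-1/4} bound, which is why the improved expected-loss rate matters in practice.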

Stochastic majorization-minimization (SMM) is a class of stochastic optimization algorithms that proceed by sampling new data points and minimizing a recursive average of surrogate functions of an objective function. Previously, the surrogates were required to be strongly convex, and no convergence rate analysis for the general nonconvex setting was available. In this paper, we propose an extension of SMM in which the surrogates need only be weakly convex or block multi-convex; the averaged surrogates are then approximately minimized with proximal regularization or block-minimized within diminishing radii, respectively. For the general nonconvex constrained setting with non-i.i.d. data samples, we show that the first-order optimality gap of the proposed algorithm decays at the rate $O((\log n)^{1+\varepsilon}/n^{1/2})$ for the empirical loss and $O((\log n)^{1+\varepsilon}/n^{1/4})$ for the expected loss, where $n$ denotes the number of data samples processed. Under an additional assumption, the latter rate can be improved to $O((\log n)^{1+\varepsilon}/n^{1/2})$. As a corollary, we obtain the first convergence rate bounds for various optimization methods in the general nonconvex, dependent-data setting: double-averaging projected gradient descent and its generalizations, proximal point empirical risk minimization, and online matrix/tensor decomposition algorithms. We also provide experimental validation of our results.
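To make the weakly convex branch of the algorithm concrete, here is a minimal NumPy sketch loosely modeled on the double-averaging projected gradient descent named above. Every specific choice is our assumption, not the paper's specification: each surrogate is the prox-linear quadratic $g_n(\theta) = f(\theta_{n-1}; x_n) + \langle \nabla f(\theta_{n-1}; x_n), \theta - \theta_{n-1}\rangle + \frac{\rho}{2}\|\theta - \theta_{n-1}\|^2$, the averaging weights are $w_n = n^{-0.6}$, the constraint set is a Euclidean ball, and the proximal term is $\frac{\lambda}{2}\|\theta - \theta_{n-1}\|^2$. Because every surrogate is an isotropic quadratic, the recursive average only needs two running averages (of the anchor points and of the gradients), and the regularized subproblem reduces to one projection. The helper names, parameter values, and the AR(1) least-squares data stream are hypothetical.

```python
import numpy as np


def project_ball(v: np.ndarray, radius: float) -> np.ndarray:
    """Euclidean projection onto the closed ball {theta : ||theta|| <= radius}."""
    norm = np.linalg.norm(v)
    return v if norm <= radius else v * (radius / norm)


def double_averaging_pgd(samples, grad, dim, rho=4.0, lam=1.0, radius=2.0, kappa=0.6):
    """Hypothetical SMM sketch with prox-linear surrogates and proximal regularization.

    Surrogate at step n (assumed form):
        g_n(theta) = f(theta_{n-1}; x_n) + <grad_n, theta - theta_{n-1}> + (rho/2)||theta - theta_{n-1}||^2
    Averaged surrogate: gbar_n = (1 - w_n) * gbar_{n-1} + w_n * g_n, with w_n = n**(-kappa).
    Update: theta_n = argmin over ||theta|| <= radius of gbar_n(theta) + (lam/2)||theta - theta_{n-1}||^2.
    """
    theta = np.zeros(dim)       # current iterate theta_{n-1}
    theta_bar = np.zeros(dim)   # recursive average of the anchor points theta_{k-1}
    grad_bar = np.zeros(dim)    # recursive average of the gradients at those anchors
    for n, sample in enumerate(samples, start=1):
        w = n ** (-kappa)       # w_1 = 1, so the zero initialization is discarded at step 1
        g = grad(theta, sample)
        theta_bar = (1 - w) * theta_bar + w * theta
        grad_bar = (1 - w) * grad_bar + w * g
        # gbar_n + prox term = ((rho + lam)/2) * ||theta - center||^2 + const, so its
        # minimizer over the ball is the projection of center (needs rho + lam > 0).
        center = (rho * theta_bar + lam * theta - grad_bar) / (rho + lam)
        theta = project_ball(center, radius)
    return theta


def markov_regression_stream(theta_star, n_samples, seed=0, phi=0.5, noise=0.1):
    """Non-i.i.d. least-squares data: AR(1) features x_t = phi * x_{t-1} + xi_t."""
    rng = np.random.default_rng(seed)
    x = np.zeros_like(theta_star)
    for _ in range(n_samples):
        x = phi * x + rng.standard_normal(theta_star.shape)
        y = x @ theta_star + noise * rng.standard_normal()
        yield x, y


def least_squares_grad(theta, sample):
    """Gradient of f(theta; (x, y)) = 0.5 * (x @ theta - y)**2."""
    x, y = sample
    return x * (x @ theta - y)


theta_star = np.array([1.0, -0.5])
theta_hat = double_averaging_pgd(markov_regression_stream(theta_star, 5000), least_squares_grad, dim=2)
print("estimate:", theta_hat, "error:", np.linalg.norm(theta_hat - theta_star))
```

Solving the subproblem exactly is a convenience of the quadratic surrogate; the abstract only requires approximate minimization. The regularized subproblem has curvature $\rho + \lambda$, so it stays strongly convex whenever $\rho + \lambda > 0$. This is the kind of slack the proximal term provides when a surrogate is merely weakly convex. The block multi-convex variant (block minimization within diminishing radii) is not covered by this sketch.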

