Block majorization-minimization with diminishing radius for constrained nonsmooth nonconvex optimization
This work provides improved theoretical convergence guarantees for a broad class of block coordinate optimization algorithms, which is important for researchers and practitioners working with large-scale nonconvex problems.
This paper analyzes Block Majorization-Minimization (BMM) for constrained nonsmooth nonconvex optimization, showing that with strongly convex and smooth surrogates, it achieves an Õ((1+L_g+ρ⁻¹)ε⁻²) iteration complexity. By combining BMM with trust-region methods using a diminishing radius, the complexity improves to Õ((1+L_g)ε⁻²), removing the dependency on the inverse strong convexity parameter ρ⁻¹.
Block majorization-minimization (BMM) is a simple iterative algorithm for constrained nonconvex optimization that sequentially minimizes majorizing surrogates of the objective function in each block while the others are held fixed. BMM entails a large class of optimization algorithms such as block coordinate descent and its proximal-point variant, expectation-minimization, and block projected gradient descent. We first establish that for general constrained nonsmooth nonconvex optimization, BMM with $ρ$-strongly convex and $L_g$-smooth surrogates can produce an $ε$-approximate first-order optimal point within $\widetilde{O}((1+L_g+ρ^{-1})ε^{-2})$ iterations and asymptotically converges to the set of first-order optimal points. Next, we show that BMM combined with trust-region methods with diminishing radius has an improved complexity of $\widetilde{O}((1+L_g) ε^{-2})$, independent of the inverse strong convexity parameter $ρ^{-1}$, allowing improved theoretical and practical performance with `flat' surrogates. Our results hold robustly even when the convex sub-problems are solved as long as the optimality gaps are summable. Central to our analysis is a novel continuous first-order optimality measure, by which we bound the worst-case sub-optimality in each iteration by the first-order improvement the algorithm makes. We apply our general framework to obtain new results on various algorithms such as the celebrated multiplicative update algorithm for nonnegative matrix factorization by Lee and Seung, regularized nonnegative tensor decomposition, and the classical block projected gradient descent algorithm. Lastly, we numerically demonstrate that the additional use of diminishing radius can improve the convergence rate of BMM in many instances.