Complexity of Block Coordinate Descent with Proximal Regularization and Applications to Wasserstein CP-dictionary Learning
This provides theoretical guarantees for a classical optimization method, with incremental improvements in complexity analysis and application to a specific domain of probability distribution approximation.
The paper establishes a worst-case complexity bound of O(1/epsilon) iterations for the block coordinate descent with proximal regularization algorithm to reach an epsilon-stationary point for general nonconvex smooth objectives, and applies it to develop a provable and efficient algorithm for Wasserstein CP-dictionary learning.
We consider the block coordinate descent methods of Gauss-Seidel type with proximal regularization (BCD-PR), which is a classical method of minimizing general nonconvex objectives under constraints that has a wide range of practical applications. We theoretically establish the worst-case complexity bound for this algorithm. Namely, we show that for general nonconvex smooth objectives with block-wise constraints, the classical BCD-PR algorithm converges to an epsilon-stationary point within O(1/epsilon) iterations. Under a mild condition, this result still holds even if the algorithm is executed inexactly in each step. As an application, we propose a provable and efficient algorithm for `Wasserstein CP-dictionary learning', which seeks a set of elementary probability distributions that can well-approximate a given set of d-dimensional joint probability distributions. Our algorithm is a version of BCD-PR that operates in the dual space, where the primal problem is regularized both entropically and proximally.