OC LG ML · Dec 18, 2020

On the Efficient Implementation of the Matrix Exponentiated Gradient Algorithm for Low-Rank Matrix Optimization

arXiv:2012.10469v2

Originality: Incremental advance

AI Analysis

This work significantly improves the scalability of the Matrix Exponentiated Gradient algorithm, making it practical for high-dimensional low-rank matrix optimization problems in machine learning, signal processing, and statistics.

This paper addresses the computational bottleneck of the Matrix Exponentiated Gradient (MEG) algorithm, which requires a full SVD at each iteration, by proposing efficient implementations for low-rank matrix optimization. The new methods require only a single low-rank SVD per iteration and, under a strict complementarity condition and from a warm-start initialization, achieve convergence rates similar to those of full-SVD MEG.
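To make the bottleneck concrete, here is a minimal NumPy sketch of one textbook MEG step on the spectrahedron, $X_{t+1} = \exp(\log X_t - \eta \nabla f(X_t)) / \operatorname{tr}\exp(\log X_t - \eta \nabla f(X_t))$. It is not the paper's low-rank implementation. The objective $f(X) = \tfrac12\|X - T\|_F^2$ with a rank-one target $T$, the step size $\eta = 0.5$, and the helper names `sym_fun` and `meg_step` are hypothetical choices for illustration. For symmetric matrices the "full SVD" amounts to a full eigendecomposition, which this sketch computes twice per step.

```python
import numpy as np

def sym_fun(M, fun):
    """Apply a scalar function to the eigenvalues of a symmetric matrix M."""
    w, V = np.linalg.eigh(M)  # full eigendecomposition: O(n^3) per call
    return (V * fun(w)) @ V.T  # V diag(fun(w)) V^T

def meg_step(X, grad, eta):
    """One textbook MEG step on the spectrahedron (hypothetical helper):
    X_next = exp(log X - eta * grad) / trace(exp(log X - eta * grad)).
    Assumes X is symmetric positive definite with unit trace and grad is symmetric.
    """
    Y = sym_fun(X, np.log) - eta * grad
    w, V = np.linalg.eigh(Y)  # second full eigendecomposition per step
    e = np.exp(w - w.max())   # shift by the top eigenvalue to avoid overflow;
    return (V * (e / e.sum())) @ V.T  # the shift cancels in the trace normalization

# Hypothetical test problem: f(X) = 0.5 * ||X - T||_F^2 with T = u u^T in the spectrahedron.
rng = np.random.default_rng(0)
n = 50
u = rng.standard_normal(n)
u /= np.linalg.norm(u)
T = np.outer(u, u)

def f(X):
    return 0.5 * np.linalg.norm(X - T) ** 2  # Frobenius norm by default

X = np.eye(n) / n  # full-rank start: log X must be finite
f0 = f(X)
for _ in range(200):
    X = meg_step(X, X - T, eta=0.5)  # gradient of f is X - T
print(f"f(X0) = {f0:.4f}, f(X200) = {f(X):.4f}")
```

Both `eigh` calls touch all $n$ eigenpairs even as the iterates approach a rank-one matrix, which is exactly the per-iteration cost that low-rank implementations try to avoid.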

Convex optimization over the spectrahedron, i.e., the set of all real $n\times n$ positive semidefinite matrices with unit trace, has important applications in machine learning, signal processing and statistics, mainly as a convex relaxation for optimization problems with low-rank matrices. It is also one of the most prominent examples in the theory of first-order methods for convex optimization in which non-Euclidean methods can be significantly preferable to their Euclidean counterparts. In particular, the desirable choice is the Matrix Exponentiated Gradient (MEG) method, which is based on the Bregman distance induced by the (negative) von Neumann entropy. Unfortunately, implementing MEG requires a full SVD computation on each iteration, which does not scale to high-dimensional problems. In this work we propose efficient implementations of MEG, with both deterministic and stochastic gradients, which are tailored for optimization with low-rank matrices and use only a single low-rank SVD computation on each iteration. We also provide efficiently computable certificates for the correct convergence of our methods. Mainly, we prove that under a strict complementarity condition, the suggested methods converge from a "warm-start" initialization with rates similar to those of their full-SVD-based counterparts. Finally, we present empirical experiments which both support our theoretical findings and demonstrate the practical appeal of our methods.
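The abstract does not say what form the paper's convergence certificates take. As a generic illustration (not the authors' certificate) of an optimality certificate that is cheap to evaluate on the spectrahedron, the standard Frank-Wolfe duality gap bounds the suboptimality of any feasible $X$ for convex $f$: $f(X) - f^* \le \langle X, \nabla f(X)\rangle - \lambda_{\min}(\nabla f(X))$, because a linear function is minimized over the spectrahedron at $uu^\top$, with $u$ a unit eigenvector for the smallest eigenvalue. The helper name `spectrahedron_gap` and the objective $f(X) = \tfrac12\|X - T\|_F^2$ are hypothetical. The sketch calls `numpy.linalg.eigvalsh` for brevity; at scale one would compute only $\lambda_{\min}$ with a Lanczos-type solver such as `scipy.sparse.linalg.eigsh` with `which="SA"`.

```python
import numpy as np

def spectrahedron_gap(X, grad):
    """Frank-Wolfe duality gap on the spectrahedron (hypothetical helper name).

    For convex f and any X in the spectrahedron S,
        f(X) - min_S f <= max_{Z in S} <X - Z, grad> = <X, grad> - lambda_min(grad).
    """
    lam_min = np.linalg.eigvalsh(grad)[0]  # eigvalsh returns eigenvalues in ascending order
    return float(np.sum(X * grad) - lam_min)  # <X, grad> = trace(X^T grad)

# Hypothetical objective: f(X) = 0.5 * ||X - T||_F^2 with T = u u^T, so f* = 0 at X = T.
rng = np.random.default_rng(0)
n = 50
u = rng.standard_normal(n)
u /= np.linalg.norm(u)
T = np.outer(u, u)

def f(X):
    return 0.5 * np.linalg.norm(X - T) ** 2

X0 = np.eye(n) / n
gap0 = spectrahedron_gap(X0, X0 - T)   # gradient of f at X0 is X0 - T
gap_opt = spectrahedron_gap(T, T - T)  # gradient vanishes at the minimizer
print(f"f(X0) - f* = {f(X0):.4f} <= gap = {gap0:.4f}; gap at X = T: {gap_opt:.1e}")
```

For this objective the gap at $X_0 = I/n$ works out to $1 - 1/n$, twice the true suboptimality $(1 - 1/n)/2$: the certificate is valid but not tight, and it shrinks to zero at the optimum.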

