OC | LG | May 20, 2016

Faster Projection-free Convex Optimization over the Spectrahedron

arXiv:1605.06203v1 | 35 citations
Originality: Highly original
AI Analysis

This work addresses a bottleneck in large-scale optimization for applications in machine learning and signal processing, offering a more efficient algorithm with provable improvements.

The paper tackles the problem of minimizing convex functions over the spectrahedron, where standard techniques are computationally expensive because they require matrix decompositions. It proposes a modified conditional gradient method whose expected approximation error after t iterations, for α-strongly convex and β-smooth functions, is O(min{β/t, (β√rank(X*)/(α^{1/4}t))^{4/3}, (β/(√α λ_min(X*) t))^2}), where X* is the optimal solution.

Minimizing a convex function over the spectrahedron, i.e., the set of all positive semidefinite matrices with unit trace, is an important optimization task with many applications in optimization, machine learning, and signal processing. It is also notoriously difficult to solve at large scale, since standard techniques require expensive matrix decompositions. An alternative is the conditional gradient method (aka the Frank-Wolfe algorithm), which has regained much interest in recent years, mostly due to its application to this specific setting. The key benefit of the CG method is that it avoids expensive matrix decompositions altogether and requires only a single eigenvector computation per iteration, which is much more efficient. On the downside, the CG method, in general, converges at an inferior rate. The error for minimizing a $β$-smooth function after $t$ iterations scales like $β/t$. This convergence rate does not improve even if the function is also strongly convex. In this work we present a modification of the CG method tailored for convex optimization over the spectrahedron. The per-iteration complexity of the method is essentially identical to that of the standard CG method: only a single eigenvector computation is required. For minimizing an $α$-strongly convex and $β$-smooth function, the expected approximation error of the method after $t$ iterations is: $$O\left(\min\left\{\frac{β}{t},\ \left(\frac{β\sqrt{\textrm{rank}(\textbf{X}^*)}}{α^{1/4}t}\right)^{4/3},\ \left(\frac{β}{\sqrt{α}\,λ_{\min}(\textbf{X}^*)\,t}\right)^{2}\right\}\right),$$ where $\textbf{X}^*$ is the optimal solution. To the best of our knowledge, this is the first result that attains provably faster convergence rates for a CG variant for optimization over the spectrahedron. We also present encouraging preliminary empirical results.
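For orientation, the standard CG (Frank-Wolfe) baseline described in the abstract can be sketched as follows. This is not the paper's modified method, only the textbook variant it builds on. The function name `frank_wolfe_spectrahedron`, the step size 2/(t+2), and the toy objective are assumptions made for illustration. For brevity the sketch calls `np.linalg.eigh`, which computes a full eigendecomposition; at large scale one would presumably replace it with an iterative solver that returns only the single extreme eigenvector.

```python
import numpy as np

def frank_wolfe_spectrahedron(grad, n, iters=500):
    """Standard conditional gradient over {X : X PSD, trace(X) = 1}.

    grad  : callable returning the gradient of f at X (an n x n matrix)
    n     : matrix dimension
    iters : number of iterations
    """
    X = np.eye(n) / n  # feasible start: PSD with unit trace
    for t in range(iters):
        G = grad(X)
        G = (G + G.T) / 2  # symmetrize against round-off
        # Linear minimization over the spectrahedron is attained at v v^T,
        # where v is a unit eigenvector for the smallest eigenvalue of G.
        # eigh returns eigenvalues in ascending order, so take column 0.
        _, vecs = np.linalg.eigh(G)
        v = vecs[:, 0]
        eta = 2.0 / (t + 2)  # assumed standard step size
        # Convex combination keeps X PSD with unit trace.
        X = (1 - eta) * X + eta * np.outer(v, v)
    return X

# Toy problem (assumption): f(X) = 0.5 * ||X - M||_F^2 with M in the
# spectrahedron, so the minimizer is X* = M and f(X*) = 0.
M = np.diag([0.7, 0.3, 0.0])
X = frank_wolfe_spectrahedron(lambda X: X - M, n=3, iters=500)
err = 0.5 * np.linalg.norm(X - M, "fro") ** 2
```

Each iterate is a convex combination of the starting point and rank-one matrices, so feasibility never requires a projection, which is the property the abstract highlights.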
