Zeroth and First Order Stochastic Frank-Wolfe Algorithms for Constrained Optimization
This work addresses optimization problems in semidefinite programming and NP-hard approximations, offering faster and more efficient algorithms for researchers and practitioners in machine learning and optimization.
The paper tackles constrained stochastic convex optimization with two constraint sets, where projection onto one set is difficult, by developing stochastic Frank-Wolfe algorithms with momentum-based gradient tracking to achieve fast convergence rates comparable to best-known rates for simpler problems, and introduces zeroth-order and trimmed variants that improve state-of-the-art results and reduce computational calls.
This paper considers stochastic convex optimization problems with two sets of constraints: (a) deterministic constraints on the domain of the optimization variable, which are difficult to project onto; and (b) deterministic or stochastic constraints that admit efficient projection. Problems of this form arise frequently in the context of semidefinite programming as well as when various NP-hard problems are solved approximately via semidefinite relaxation. Since projection onto the first set of constraints is difficult, it becomes necessary to explore projection-free algorithms, such as the stochastic Frank-Wolfe (FW) algorithm. On the other hand, the second set of constraints cannot be handled in the same way, and must be incorporated as an indicator function within the objective function, thereby complicating the application of FW methods. Similar problems have been studied before; however, they suffer from slow convergence rates. This work, equipped with momentum based gradient tracking technique, guarantees fast convergence rates on par with the best-known rates for problems without the second set of constraints. Zeroth-order variants of the proposed algorithms are also developed and again improve upon the state-of-the-art rate results. We further propose the novel trimmed FW variants that enjoy the same convergence rates as their classical counterparts, but are empirically shown to require significantly fewer calls to the linear minimization oracle speeding up the overall algorithm. The efficacy of the proposed algorithms is tested on relevant applications of sparse matrix estimation, clustering via semidefinite relaxation, and uniform sparsest cut problem.