OCLGMLAug 15, 2023

Projection-Free Methods for Stochastic Simple Bilevel Optimization with Convex Lower-level Problem

arXiv:2308.07536v117 citationsh-index: 40
Originality Highly original
AI Analysis

This work addresses optimization challenges in machine learning and operations research by providing more efficient algorithms for stochastic bilevel problems, though it is incremental as it builds on existing methods with specific improvements.

The paper tackles stochastic simple bilevel optimization problems by introducing novel projection-free methods that use stochastic cutting planes and variance reduction, achieving improved complexity bounds: for convex upper-level, it reduces stochastic oracle queries from O(max{1/ε_f^4, 1/ε_g^4}) to Õ(max{1/ε_f^2, 1/ε_g^2}), and for non-convex upper-level, it requires Õ(max{1/ε_f^3, 1/ε_g^3}) queries to find stationary points.

In this paper, we study a class of stochastic bilevel optimization problems, also known as stochastic simple bilevel optimization, where we minimize a smooth stochastic objective function over the optimal solution set of another stochastic convex optimization problem. We introduce novel stochastic bilevel optimization methods that locally approximate the solution set of the lower-level problem via a stochastic cutting plane, and then run a conditional gradient update with variance reduction techniques to control the error induced by using stochastic gradients. For the case that the upper-level function is convex, our method requires $\tilde{\mathcal{O}}(\max\{1/ε_f^{2},1/ε_g^{2}\}) $ stochastic oracle queries to obtain a solution that is $ε_f$-optimal for the upper-level and $ε_g$-optimal for the lower-level. This guarantee improves the previous best-known complexity of $\mathcal{O}(\max\{1/ε_f^{4},1/ε_g^{4}\})$. Moreover, for the case that the upper-level function is non-convex, our method requires at most $\tilde{\mathcal{O}}(\max\{1/ε_f^{3},1/ε_g^{3}\}) $ stochastic oracle queries to find an $(ε_f, ε_g)$-stationary point. In the finite-sum setting, we show that the number of stochastic oracle calls required by our method are $\tilde{\mathcal{O}}(\sqrt{n}/ε)$ and $\tilde{\mathcal{O}}(\sqrt{n}/ε^{2})$ for the convex and non-convex settings, respectively, where $ε=\min \{ε_f,ε_g\}$.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes