On the Complexity of Finding Stationary Points in Nonconvex Simple Bilevel Optimization
This work addresses the challenge of efficiently solving simple bilevel optimization problems without convexity assumptions. The contribution is incremental: it extends existing methods to a more general nonconvex setting.
The paper tackles the problem of finding stationary points in nonconvex simple bilevel optimization, where both upper- and lower-level objectives are smooth but nonconvex, by introducing a suitable notion of stationarity and designing a first-order algorithm. The result is a complexity of O(max(ε_f^{-(3+p)/(1+p)}, ε_g^{-(3+p)/2})) for reaching (ε_f, ε_g)-stationary points, where p ≥ 0 is a free constant that balances the two terms. According to the authors, this is the first such complexity guarantee for a discrete-time algorithm on general nonconvex simple bilevel problems.
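To make the role of p in the bound concrete, the short sketch below evaluates the two exponents, (3+p)/(1+p) for ε_f and (3+p)/2 for ε_g, at a few values of p. The helper names `exponents` and `bound` are hypothetical and used only for illustration; constants hidden by the O-notation are ignored.

```python
def exponents(p):
    """Return the exponents on 1/eps_f and 1/eps_g in the stated bound."""
    if p < 0:
        raise ValueError("the bound is stated for p >= 0")
    return (3 + p) / (1 + p), (3 + p) / 2


def bound(eps_f, eps_g, p):
    """Evaluate max(eps_f^-(3+p)/(1+p), eps_g^-(3+p)/2), constants dropped."""
    a, b = exponents(p)
    return max(eps_f ** (-a), eps_g ** (-b))


for p in (0, 1, 3):
    a, b = exponents(p)
    print(f"p={p}: eps_f^-{a}, eps_g^-{b}")
```

Increasing p lowers the exponent on ε_f and raises the one on ε_g. At p = 1 both exponents equal 2, so when ε_f = ε_g = ε the bound reads O(ε^{-2}).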
In this paper, we study the problem of solving a simple bilevel optimization problem, where the upper-level objective is minimized over the solution set of the lower-level problem. We focus on the general setting in which both the upper- and lower-level objectives are smooth but potentially nonconvex. Because no additional structural assumption, such as convexity or the Polyak-Łojasiewicz (PL) condition, is imposed on the lower-level objective, guaranteeing global optimality is generally intractable. Instead, we introduce a suitable notion of stationarity for this class of problems and aim to design a first-order algorithm that finds such stationary points in polynomial time. Intuitively, stationarity in this setting means the upper-level objective cannot be substantially improved locally without causing a larger deterioration in the lower-level objective. To this end, we show that a simple and implementable variant of the dynamic barrier gradient descent (DBGD) framework can effectively solve the considered nonconvex simple bilevel problems up to stationarity. Specifically, to reach an $(ε_f, ε_g)$-stationary point, where $ε_f$ and $ε_g$ denote the target stationarity accuracies for the upper- and lower-level objectives, respectively, the considered method achieves a complexity of $\mathcal{O}\left(\max\left(ε_f^{-\frac{3+p}{1+p}}, ε_g^{-\frac{3+p}{2}}\right)\right)$, where $p \geq 0$ is an arbitrary constant balancing the terms. To the best of our knowledge, this is the first complexity result for a discrete-time algorithm that guarantees joint stationarity for both levels in general nonconvex simple bilevel problems.
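As a rough illustration of the kind of update the DBGD framework builds on, the sketch below implements a generic dynamic-barrier step, not the specific variant analyzed in the paper. At each iterate it takes the vector v closest to ∇f(x) that satisfies ⟨∇g(x), v⟩ ≥ φ(x), which has the closed form v = ∇f(x) + λ∇g(x) with λ = max((φ(x) − ⟨∇f(x), ∇g(x)⟩)/‖∇g(x)‖², 0). The barrier choice φ(x) = α‖∇g(x)‖², the toy objectives, the step size, the iteration count, and all function names are assumptions made for this example only.

```python
import numpy as np


def dbgd_direction(grad_f, grad_g, alpha=1.0):
    """Closest vector to grad_f with <grad_g, v> >= alpha * ||grad_g||^2.

    The barrier phi = alpha * ||grad_g||^2 is an assumed choice for this sketch.
    """
    gg = float(grad_g @ grad_g)
    if gg == 0.0:
        # The constraint <grad_g, v> >= 0 holds trivially, so v = grad_f.
        return grad_f.copy()
    phi = alpha * gg
    lam = max((phi - float(grad_f @ grad_g)) / gg, 0.0)
    return grad_f + lam * grad_g


def dbgd(grad_f, grad_g, x0, step=0.05, alpha=1.0, iters=250):
    """Run x <- x - step * v for a fixed number of iterations."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - step * dbgd_direction(grad_f(x), grad_g(x), alpha)
    return x


# Toy problem (assumed for illustration):
#   upper level f(x) = 0.5 * ((x0 - 2)^2 + x1^2)
#   lower level g(x) = 0.25 * (x0^2 - 1)^2, nonconvex with minimizers x0 = +-1
def grad_f(x):
    return np.array([x[0] - 2.0, x[1]])


def grad_g(x):
    return np.array([(x[0] ** 2 - 1.0) * x[0], 0.0])


x_final = dbgd(grad_f, grad_g, x0=[0.5, 1.0])
print(x_final)
```

In this toy run the iterate should settle near (1, 0). There ∇g vanishes while ∇f does not, which matches the intuition stated above: the upper-level objective could only be improved by moving off the lower-level solution set.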