OC · LG · Jul 30, 2025

On the Complexity of Finding Stationary Points in Nonconvex Simple Bilevel Optimization

Jincheng Cao, Ruichen Jiang, Erfan Yazdandoost Hamedani, Aryan Mokhtari

arXiv:2507.23155v14.11 citations · h-index: 11

Originality: Highly original

AI Analysis

This paper addresses the challenge of efficiently solving bilevel optimization problems without convexity assumptions. The contribution is assessed as incremental, since it extends existing methods to a more general nonconvex setting.

The paper tackles the problem of finding stationary points in nonconvex simple bilevel optimization, where both the upper- and lower-level objectives are smooth but nonconvex, by introducing a suitable notion of stationarity and designing a first-order algorithm. The result is a complexity of $\mathcal{O}\left(\max\left(ε_f^{-\frac{3+p}{1+p}}, ε_g^{-\frac{3+p}{2}}\right)\right)$ for reaching $(ε_f, ε_g)$-stationary points, where $p \geq 0$ is an arbitrary balancing constant. According to the authors, this is the first complexity guarantee for a discrete-time algorithm that ensures joint stationarity at both levels in general nonconvex simple bilevel problems.
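To make the role of the balancing constant concrete, the following is plain arithmetic on the stated bound for a few values of $p \geq 0$. It is a worked illustration only, not a result taken from the paper.

$$
\begin{aligned}
p = 0 &: \quad \mathcal{O}\left(\max\left(ε_f^{-3},\ ε_g^{-3/2}\right)\right), \\
p = 1 &: \quad \mathcal{O}\left(\max\left(ε_f^{-2},\ ε_g^{-2}\right)\right), \\
p \to \infty &: \quad \tfrac{3+p}{1+p} \to 1 \quad \text{while} \quad \tfrac{3+p}{2} \to \infty .
\end{aligned}
$$

The upper-level exponent $\frac{3+p}{1+p}$ decreases in $p$ while the lower-level exponent $\frac{3+p}{2}$ increases, and the two coincide at $p = 1$. Under the assumption of a common target accuracy $ε_f = ε_g = ε < 1$, the bound is therefore smallest at $p = 1$, where it reads $\mathcal{O}(ε^{-2})$.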

In this paper, we study the problem of solving a simple bilevel optimization problem, where the upper-level objective is minimized over the solution set of the lower-level problem. We focus on the general setting in which both the upper- and lower-level objectives are smooth but potentially nonconvex. Due to the absence of additional structural assumptions for the lower-level objective, such as convexity or the Polyak-Łojasiewicz (PL) condition, guaranteeing global optimality is generally intractable. Instead, we introduce a suitable notion of stationarity for this class of problems and aim to design a first-order algorithm that finds such stationary points in polynomial time. Intuitively, stationarity in this setting means the upper-level objective cannot be substantially improved locally without causing a larger deterioration in the lower-level objective. To this end, we show that a simple and implementable variant of the dynamic barrier gradient descent (DBGD) framework can effectively solve the considered nonconvex simple bilevel problems up to stationarity. Specifically, to reach an $(ε_f, ε_g)$-stationary point, where $ε_f$ and $ε_g$ denote the target stationarity accuracies for the upper- and lower-level objectives, respectively, the considered method achieves a complexity of $\mathcal{O}\left(\max\left(ε_f^{-\frac{3+p}{1+p}}, ε_g^{-\frac{3+p}{2}}\right)\right)$, where $p \geq 0$ is an arbitrary constant balancing the terms. To the best of our knowledge, this is the first complexity result for a discrete-time algorithm that guarantees joint stationarity for both levels in general nonconvex simple bilevel problems.
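The abstract does not spell out the update rule, so the following is only a minimal sketch of a generic dynamic-barrier step as it is commonly written in earlier DBGD work, not the specific variant analyzed in the paper. It assumes the direction $d = \nabla f(x) + λ \nabla g(x)$ with $λ = \max\left((φ(x) - \nabla f(x)^\top \nabla g(x)) / \|\nabla g(x)\|^2,\ 0\right)$ and the barrier choice $φ(x) = α \|\nabla g(x)\|^2$. The names `dbgd`, `step`, `alpha`, `iters` and the toy objectives are all hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def dbgd(
    grad_f: Callable[[Vector], Vector],
    grad_g: Callable[[Vector], Vector],
    x0: Sequence[float],
    step: float = 0.01,   # assumed constant step size
    alpha: float = 1.0,   # assumed barrier weight in phi = alpha * ||grad g||^2
    iters: int = 5000,    # assumed fixed iteration budget
) -> Vector:
    """Generic dynamic-barrier gradient descent sketch (illustrative only)."""
    x = [float(v) for v in x0]
    for _ in range(iters):
        gf = grad_f(x)
        gg = grad_g(x)
        gg_sq = dot(gg, gg)
        phi = alpha * gg_sq
        if gg_sq > 0.0:
            # Smallest multiplier such that d = gf + lam * gg satisfies <gg, d> >= phi.
            lam = max((phi - dot(gf, gg)) / gg_sq, 0.0)
        else:
            # grad g vanishes: the constraint is vacuous, so follow grad f alone.
            lam = 0.0
        x = [xi - step * (a + lam * b) for xi, a, b in zip(x, gf, gg)]
    return x


# Toy instance (hypothetical): nonconvex lower level g(x) = (x0^2 - 1)^2,
# upper level f(x) = (x0 - 2)^2 + x1^2.
def grad_f_toy(x: Vector) -> Vector:
    return [2.0 * (x[0] - 2.0), 2.0 * x[1]]


def grad_g_toy(x: Vector) -> Vector:
    return [4.0 * x[0] * (x[0] ** 2 - 1.0), 0.0]


if __name__ == "__main__":
    print(dbgd(grad_f_toy, grad_g_toy, [0.5, 1.0]))
```

In this toy run, started from (0.5, 1.0), the iterate should settle near the lower-level minimizer with first coordinate close to 1 and second coordinate close to 0. Which lower-level stationary point is reached depends on the initialization, which is consistent with the abstract's point that only stationarity, not global optimality, can be expected in the nonconvex setting.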
