On Finding Small Hyper-Gradients in Bilevel Optimization: Hardness Results and Improved Analysis

arXiv:2301.0071298.540 citationsh-index: 20

AI Analysis

For researchers in bilevel optimization, this work clarifies the boundary between tractable and intractable problems and provides optimal algorithms for a practically relevant class of nonconvex-nonconvex problems.

This paper proves that finding stationary points of the hyper-objective in nonconvex-convex bilevel optimization is intractable for zero-respecting algorithms, but for nonconvex-nonconvex problems where the lower-level function satisfies the PL condition, a simple first-order algorithm achieves optimal or near-optimal complexity bounds: Õ(ε^{-2}) (deterministic), Õ(ε^{-4}) (partially stochastic), and Õ(ε^{-6}) (fully stochastic).

Bilevel optimization reveals the inner structure of otherwise oblique optimization problems, such as hyperparameter tuning, neural architecture search, and meta-learning. A common goal in bilevel optimization is to minimize a hyper-objective that implicitly depends on the solution set of the lower-level function. Although this hyper-objective approach is widely used, its theoretical properties have not been thoroughly investigated in cases where the lower-level functions lack strong convexity. In this work, we first provide hardness results to show that the goal of finding stationary points of the hyper-objective for nonconvex-convex bilevel optimization can be intractable for zero-respecting algorithms. Then we study a class of tractable nonconvex-nonconvex bilevel problems when the lower-level function satisfies the Polyak-Łojasiewicz (PL) condition. We show a simple first-order algorithm can achieve better complexity bounds of $\tilde{\mathcal{O}}(ε^{-2})$, $\tilde{\mathcal{O}}(ε^{-4})$ and $\tilde{\mathcal{O}}(ε^{-6})$ in the deterministic, partially stochastic, and fully stochastic setting respectively. The complexities in the first two cases are optimal up to logarithmic factors.

View on arXiv PDF

Similar