Near-Optimal Nonconvex-Strongly-Convex Bilevel Optimization with Fully First-Order Oracles
This work addresses a practical bottleneck in bilevel optimization for machine learning and optimization researchers by eliminating the need for expensive Hessian-vector product oracles, offering incremental improvements in efficiency.
The paper tackles bilevel optimization with a strongly convex lower-level problem by developing a fully first-order method that achieves near-optimal oracle complexity of $ ilde{\mathcal{O}}(\epsilon^{-2})$, improving upon prior work with $ ilde{\mathcal{O}}(\epsilon^{-3})$, and extends to stochastic settings with complexities of $ ilde{\mathcal{O}}(\epsilon^{-4})$ and $ ilde{\mathcal{O}}(\epsilon^{-6})$ for different noise scenarios.
In this work, we consider bilevel optimization when the lower-level problem is strongly convex. Recent works show that with a Hessian-vector product (HVP) oracle, one can provably find an $ε$-stationary point within ${\mathcal{O}}(ε^{-2})$ oracle calls. However, the HVP oracle may be inaccessible or expensive in practice. Kwon et al. (ICML 2023) addressed this issue by proposing a first-order method that can achieve the same goal at a slower rate of $\tilde{\mathcal{O}}(ε^{-3})$. In this paper, we incorporate a two-time-scale update to improve their method to achieve the near-optimal $\tilde {\mathcal{O}}(ε^{-2})$ first-order oracle complexity. Our analysis is highly extensible. In the stochastic setting, our algorithm can achieve the stochastic first-order oracle complexity of $\tilde {\mathcal{O}}(ε^{-4})$ and $\tilde {\mathcal{O}}(ε^{-6})$ when the stochastic noises are only in the upper-level objective and in both level objectives, respectively. When the objectives have higher-order smoothness conditions, our deterministic method can escape saddle points by injecting noise, and can be accelerated to achieve a faster rate of $\tilde {\mathcal{O}}(ε^{-1.75})$ using Nesterov's momentum.