OCLGNov 8, 2023

Adaptive Mirror Descent Bilevel Optimization

arXiv:2311.04520v23 citationsh-index: 2
Originality Incremental advance
AI Analysis

This work addresses efficient optimization for machine learning and AI tasks involving bilevel structures, such as hyperparameter tuning, but is incremental as it builds on existing mirror descent and adaptive methods.

The paper tackles nonconvex bilevel optimization problems with nonconvex upper-level and lower-level problems satisfying the Polyak-Łojasiewicz condition, proposing adaptive mirror descent methods that achieve gradient complexities of O(ε^{-1}) for deterministic cases and O(ε^{-3/2}) for stochastic cases to find ε-stationary solutions.

In the paper, we propose a class of efficient adaptive bilevel methods based on mirror descent for nonconvex bilevel optimization, where its upper-level problem is nonconvex possibly with nonsmooth regularization, and its lower-level problem is also nonconvex while satisfies Polyak-Łojasiewicz (PL) condition. To solve these deterministic bilevel problems, we present an efficient adaptive projection-aid gradient (i.e., AdaPAG) method based on mirror descent, and prove that it obtains the best known gradient complexity of $O(ε^{-1})$ for finding an $ε$-stationary solution of nonconvex bilevel problems. To solve these stochastic bilevel problems, we propose an efficient adaptive stochastic projection-aid gradient (i.e., AdaVSPAG) methods based on mirror descent and variance-reduced techniques, and prove that it obtains the best known gradient complexity of $O(ε^{-3/2})$ for finding an $ε$-stationary solution. Since the PL condition relaxes the strongly convex, our algorithms can be used to nonconvex strongly-convex bilevel optimization. Theoretically, we provide a useful convergence analysis framework for our methods under some mild conditions, and prove that our methods have a fast convergence rate of $O(\frac{1}{T})$, where $T$ denotes the number of iterations.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes