Provably Faster Algorithms for Bilevel Optimization
This work provides a significant speedup for bilevel optimization, which is crucial for applications like hyperparameter tuning and meta-learning, though it is incremental in improving upon momentum-based methods.
The paper tackles the computational inefficiency of bilevel optimization in machine learning by proposing two new algorithms that achieve a complexity of 𝒪̃(ε^{-1.5}), outperforming existing methods with a provable order-of-magnitude improvement, as validated by experiments in hyperparameter applications.
Bilevel optimization has been widely applied in many important machine learning applications such as hyperparameter optimization and meta-learning. Recently, several momentum-based algorithms have been proposed to solve bilevel optimization problems faster. However, those momentum-based algorithms do not achieve provably better computational complexity than $\mathcal{\widetilde O}(ε^{-2})$ of the SGD-based algorithm. In this paper, we propose two new algorithms for bilevel optimization, where the first algorithm adopts momentum-based recursive iterations, and the second algorithm adopts recursive gradient estimations in nested loops to decrease the variance. We show that both algorithms achieve the complexity of $\mathcal{\widetilde O}(ε^{-1.5})$, which outperforms all existing algorithms by the order of magnitude. Our experiments validate our theoretical results and demonstrate the superior empirical performance of our algorithms in hyperparameter applications.