LG · OC · Mar 5, 2025

On the Convergence of Adam-Type Algorithm for Bilevel Optimization under Unbounded Smoothness

arXiv:2503.03908v17.11 citations · h-index: 5

Originality: Incremental advance

AI Analysis

This work addresses the challenge of applying Adam to bilevel optimization, which is important for applications like meta-learning, but it is an incremental extension of existing methods.

The paper extends Adam to bilevel optimization problems, achieving an oracle complexity of $\widetilde{O}(ε^{-4})$ for finding $ε$-stationary points under unbounded smoothness conditions.

Adam has become one of the most popular optimizers for training modern deep neural networks, such as transformers. However, its applicability is largely restricted to single-level optimization problems. In this paper, we aim to extend vanilla Adam to tackle bilevel optimization problems, which have important applications in machine learning, such as meta-learning. In particular, we study stochastic bilevel optimization problems where the lower-level function is strongly convex and the upper-level objective is nonconvex with potentially unbounded smoothness. This unbounded smooth objective function covers a broad class of neural networks, including transformers, which may exhibit non-Lipschitz gradients. In this work, we introduce AdamBO, a single-loop Adam-type method that achieves $\widetilde{O}(ε^{-4})$ oracle complexity to find $ε$-stationary points, where the oracle calls involve stochastic gradient or Hessian/Jacobian-vector product evaluations. The key to our analysis is a novel randomness decoupling lemma that provides refined control over the lower-level variable. We conduct extensive experiments on various machine learning tasks involving bilevel formulations with recurrent neural networks (RNNs) and transformers, demonstrating the effectiveness of our proposed Adam-type algorithm.
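
To make the single-loop structure described in the abstract concrete, the following is a minimal sketch on a toy problem. It is not the paper's AdamBO. It is a generic illustration that assumes a deterministic quadratic lower level, a convex and smooth upper level (so it does not exhibit unbounded smoothness), and a direct linear solve in place of Hessian/Jacobian-vector product oracles. All names and constants (`H`, `A`, `b`, `LAM`, `run_single_loop_bilevel`, the step sizes) are hypothetical and chosen only for illustration.

```python
import numpy as np

# Toy problem data (all hypothetical, chosen only for illustration).
H = np.diag([1.0, 2.0])                 # lower-level Hessian, positive definite
A = np.array([[1.0, 0.5], [0.0, 1.0]])  # couples x into the lower level
b = np.array([1.0, 2.0])                # upper-level target
LAM = 0.1                               # upper-level regularization weight

# Lower level: g(x, y) = 0.5 * y^T H y - y^T A x, so y*(x) = H^{-1} A x.
def grad_y_g(x, y):
    return H @ y - A @ x

# Upper level: f(x, y) = 0.5 * ||y - b||^2 + 0.5 * LAM * ||x||^2.
def grad_x_f(x, y):
    return LAM * x

def grad_y_f(x, y):
    return y - b

def hypergrad_estimate(x, y):
    # Implicit differentiation:
    #   grad_x f - (d^2 g / dx dy) [d^2 g / dy^2]^{-1} grad_y f.
    # Here d^2 g / dx dy = -A^T and d^2 g / dy^2 = H.
    # Assumption: the tiny 2x2 system is solved directly.
    z = np.linalg.solve(H, grad_y_f(x, y))
    return grad_x_f(x, y) + A.T @ z

def true_hypergrad(x):
    # Exact gradient of Phi(x) = f(x, y*(x)), closed form for this toy problem.
    y_star = np.linalg.solve(H, A @ x)
    return hypergrad_estimate(x, y_star)

def run_single_loop_bilevel(steps=3000, lr_x=0.01, lr_y=0.3,
                            beta1=0.9, beta2=0.999, eps=1e-8):
    x, y = np.zeros(2), np.zeros(2)
    m, v = np.zeros(2), np.zeros(2)
    for t in range(1, steps + 1):
        # One gradient step on the lower-level variable (no inner loop).
        y = y - lr_y * grad_y_g(x, y)
        # Standard Adam update on x, driven by the approximate hypergradient.
        h = hypergrad_estimate(x, y)
        m = beta1 * m + (1 - beta1) * h
        v = beta2 * v + (1 - beta2) * h**2
        m_hat = m / (1 - beta1**t)
        v_hat = v / (1 - beta2**t)
        x = x - lr_x * m_hat / (np.sqrt(v_hat) + eps)
    return x, y

if __name__ == "__main__":
    x, y = run_single_loop_bilevel()
    print("||grad Phi(x)|| at the final iterate:", np.linalg.norm(true_hypergrad(x)))
```

In this sketch the lower-level variable `y` is never solved to optimality. It takes one step per iteration and tracks $y^*(x)$ as `x` moves, which is the sense in which the method is single-loop. The norm of `true_hypergrad(x)` plays the role of the $ε$-stationarity measure mentioned in the abstract.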
