On the Stability and Generalization of First-order Bilevel Minimax Optimization
This provides foundational theoretical insights for machine learning tasks like hyperparameter optimization and reinforcement learning that use bilevel minimax optimization.
The authors tackled the theoretical gap in understanding generalization for first-order gradient-based bilevel minimax solvers, deriving fine-grained generalization bounds for three algorithms and revealing trade-offs among stability, generalization gaps, and practical settings.
Bilevel optimization and bilevel minimax optimization have recently emerged as unifying frameworks for a range of machine-learning tasks, including hyperparameter optimization and reinforcement learning. The existing literature focuses on empirical efficiency and convergence guarantees, leaving a critical theoretical gap in understanding how well these algorithms generalize. To bridge this gap, we provide the first systematic generalization analysis for first-order gradient-based bilevel minimax solvers with lower-level minimax problems. Specifically, by leveraging algorithmic stability arguments, we derive fine-grained generalization bounds for three representative algorithms, including single-timescale stochastic gradient descent-ascent, and two variants of two-timescale stochastic gradient descent-ascent. Our results reveal a precise trade-off among algorithmic stability, generalization gaps, and practical settings. Furthermore, extensive empirical evaluations corroborate our theoretical insights on realistic optimization tasks with bilevel minimax structures.