LGOCJun 8, 2021

Stability and Generalization of Bilevel Programming in Hyperparameter Optimization

arXiv:2106.04188v246 citations
Originality Incremental advance
AI Analysis

This work addresses a theoretical gap for researchers in hyperparameter optimization, though it is incremental as it builds on existing bilevel programming frameworks.

This paper tackles the lack of generalization analysis for gradient-based bilevel programming in hyperparameter optimization by deriving expectation bounds based on uniform stability, explaining phenomena like overfitting to the validation set and showing that gradient-based methods can outperform cross-validation under certain conditions, with experimental validation on feature learning and noisy label tasks.

The (gradient-based) bilevel programming framework is widely used in hyperparameter optimization and has achieved excellent performance empirically. Previous theoretical work mainly focuses on its optimization properties, while leaving the analysis on generalization largely open. This paper attempts to address the issue by presenting an expectation bound w.r.t. the validation set based on uniform stability. Our results can explain some mysterious behaviours of the bilevel programming in practice, for instance, overfitting to the validation set. We also present an expectation bound for the classical cross-validation algorithm. Our results suggest that gradient-based algorithms can be better than cross-validation under certain conditions in a theoretical perspective. Furthermore, we prove that regularization terms in both the outer and inner levels can relieve the overfitting problem in gradient-based algorithms. In experiments on feature learning and data reweighting for noisy labels, we corroborate our theoretical findings.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes