MLLGNov 13, 2020

Convergence Properties of Stochastic Hypergradients

arXiv:2011.07122v329 citations
AI Analysis

This work addresses hyperparameter optimization and meta-learning in machine learning, offering an incremental improvement by extending existing methods to stochastic settings for large-scale applications.

The paper tackles the problem of efficiently computing hypergradients in bilevel optimization for large datasets by proposing a stochastic variant of approximate implicit differentiation. It provides mean square error bounds for the hypergradient approximation under specific assumptions and demonstrates practical advantages through numerical experiments.

Bilevel optimization problems are receiving increasing attention in machine learning as they provide a natural framework for hyperparameter optimization and meta-learning. A key step to tackle these problems is the efficient computation of the gradient of the upper-level objective (hypergradient). In this work, we study stochastic approximation schemes for the hypergradient, which are important when the lower-level problem is empirical risk minimization on a large dataset. The method that we propose is a stochastic variant of the approximate implicit differentiation approach in (Pedregosa, 2016). We provide bounds for the mean square error of the hypergradient approximation, under the assumption that the lower-level problem is accessible only through a stochastic mapping which is a contraction in expectation. In particular, our main bound is agnostic to the choice of the two stochastic solvers employed by the procedure. We provide numerical experiments to support our theoretical analysis and to show the advantage of using stochastic hypergradients in practice.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes