MLLGJun 29, 2020

On the Iteration Complexity of Hypergradient Computation

arXiv:2006.16218v2244 citations
AI Analysis

This work addresses a key bottleneck in machine learning for researchers and practitioners dealing with bilevel optimization, offering incremental improvements in computational efficiency through rigorous comparison of existing methods.

The paper tackles the problem of efficiently computing hypergradients in bilevel optimization problems, such as hyperparameter tuning and meta-learning, by providing a unified theoretical analysis that quantitatively compares popular approximation methods and identifies approximate implicit differentiation with conjugate gradient as the most computationally efficient, supported by experimental validation.

We study a general class of bilevel problems, consisting in the minimization of an upper-level objective which depends on the solution to a parametric fixed-point equation. Important instances arising in machine learning include hyperparameter optimization, meta-learning, and certain graph and recurrent neural networks. Typically the gradient of the upper-level objective (hypergradient) is hard or even impossible to compute exactly, which has raised the interest in approximation methods. We investigate some popular approaches to compute the hypergradient, based on reverse mode iterative differentiation and approximate implicit differentiation. Under the hypothesis that the fixed point equation is defined by a contraction mapping, we present a unified analysis which allows for the first time to quantitatively compare these methods, providing explicit bounds for their iteration complexity. This analysis suggests a hierarchy in terms of computational efficiency among the above methods, with approximate implicit differentiation based on conjugate gradient performing best. We present an extensive experimental comparison among the methods which confirm the theoretical findings.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes