MLLGFeb 10, 2020

Super-efficiency of automatic differentiation for functions defined as a minimum

arXiv:2002.03722v142 citations
AI Analysis

This work addresses a computational bottleneck in optimization for researchers and practitioners, providing theoretical insights and practical guidelines for gradient estimation, though it is incremental in refining existing methods.

The paper tackles the problem of estimating gradients for functions defined as a minimum in min-min or max-min optimization, where no closed-form solution exists. It finds that automatic differentiation through iterative algorithms yields an error close to the square of that from analytic approximations, demonstrating super-efficiency, with analysis and experiments on gradient descent, stochastic gradient descent, and Wasserstein barycenter computation.

In min-min optimization or max-min optimization, one has to compute the gradient of a function defined as a minimum. In most cases, the minimum has no closed-form, and an approximation is obtained via an iterative algorithm. There are two usual ways of estimating the gradient of the function: using either an analytic formula obtained by assuming exactness of the approximation, or automatic differentiation through the algorithm. In this paper, we study the asymptotic error made by these estimators as a function of the optimization error. We find that the error of the automatic estimator is close to the square of the error of the analytic estimator, reflecting a super-efficiency phenomenon. The convergence of the automatic estimator greatly depends on the convergence of the Jacobian of the algorithm. We analyze it for gradient descent and stochastic gradient descent and derive convergence rates for the estimators in these cases. Our analysis is backed by numerical experiments on toy problems and on Wasserstein barycenter computation. Finally, we discuss the computational complexity of these estimators and give practical guidelines to chose between them.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes