LG · ML · Oct 18, 2024

Debiasing Mini-Batch Quadratics for Applications in Deep Learning

arXiv:2410.14325v2 · 2 citations · h-index: 3 · ICLR
Originality: Incremental advance
AI Analysis

This paper addresses a specific technical problem in deep learning that matters to researchers and practitioners using second-order methods. The advance is incremental, since it focuses on correcting biases in existing approximations.

The paper tackles the bias in mini-batch quadratic approximations used in deep learning, which distorts applications like second-order optimization and uncertainty quantification, and develops debiasing strategies to address this systematic error.

Quadratic approximations form a fundamental building block of machine learning methods. E.g., second-order optimizers try to find the Newton step into the minimum of a local quadratic proxy to the objective function; and the second-order approximation of a network's loss function can be used to quantify the uncertainty of its outputs via the Laplace approximation. When computations on the entire training set are intractable - typical for deep learning - the relevant quantities are computed on mini-batches. This, however, distorts and biases the shape of the associated stochastic quadratic approximations in an intricate way with detrimental effects on applications. In this paper, we (i) show that this bias introduces a systematic error, (ii) provide a theoretical explanation for it, (iii) explain its relevance for second-order optimization and uncertainty quantification via the Laplace approximation in deep learning, and (iv) develop and evaluate debiasing strategies.
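As a rough illustration of the kind of bias the abstract describes, the following minimal sketch uses a toy one-dimensional least-squares problem, where every mini-batch loss is exactly quadratic. The data, the batch size, and the helper names (`quadratic_coeffs`, `evaluate`, `minimizer`) are assumptions made for this example; it is not the paper's experimental setup and does not implement its debiasing strategies. It compares the loss each mini-batch quadratic predicts at its own minimum with the full-batch loss actually reached there.

```python
import random

def quadratic_coeffs(points):
    """Return (h, b, c) such that the mean of 0.5 * (x * theta - y)**2 over
    `points` equals 0.5 * h * theta**2 - b * theta + c."""
    n = len(points)
    h = sum(x * x for x, _ in points) / n        # curvature (1D Hessian)
    b = sum(x * y for x, y in points) / n        # negative gradient at theta = 0
    c = 0.5 * sum(y * y for _, y in points) / n  # loss at theta = 0
    return h, b, c

def evaluate(coeffs, theta):
    h, b, c = coeffs
    return 0.5 * h * theta * theta - b * theta + c

def minimizer(coeffs):
    h, b, _ = coeffs
    return b / h  # Newton step from theta = 0, exact for a quadratic

# Hypothetical toy data: y = 2 * x + unit-variance noise.
rng = random.Random(0)
data = []
for _ in range(1024):
    x = rng.gauss(0.0, 1.0)
    data.append((x, 2.0 * x + rng.gauss(0.0, 1.0)))

# Partition into equal-size batches, so the batch quadratics average
# exactly to the full-batch quadratic.
batch_size = 4
batches = [data[i:i + batch_size] for i in range(0, len(data), batch_size)]

full = quadratic_coeffs(data)
full_min = evaluate(full, minimizer(full))

predicted, actual = [], []
for batch in batches:
    mb = quadratic_coeffs(batch)
    theta_b = minimizer(mb)                   # Newton step on the mini-batch quadratic
    predicted.append(evaluate(mb, theta_b))   # loss the mini-batch quadratic promises
    actual.append(evaluate(full, theta_b))    # loss on the full data at that point

mean_predicted = sum(predicted) / len(predicted)
mean_actual = sum(actual) / len(actual)

print(f"full-batch minimum:            {full_min:.4f}")
print(f"mean predicted (mini-batch):   {mean_predicted:.4f}")
print(f"mean actual (full loss there): {mean_actual:.4f}")
```

Because the batch quadratics average to the full-batch quadratic, the mean of their minima cannot exceed the full-batch minimum, while the full loss at any mini-batch minimizer cannot fall below it. In this toy setting the mini-batch quadratic is therefore systematically optimistic about its own Newton step, even though each of its coefficients is an unbiased estimate.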

