LG, ML | Oct 18, 2024

Debiasing Mini-Batch Quadratics for Applications in Deep Learning

Lukas Tatzel, Bálint Mucsányi, Osane Hackel, Philipp Hennig

arXiv:2410.14325v26.42 citations | h-index: 30 | ICLR

Originality: Incremental advance

AI Analysis

The paper addresses a specific technical problem for researchers and practitioners who use second-order methods in deep learning. It is an incremental advance because it focuses on correcting biases in existing approximations.

The paper tackles the bias in mini-batch quadratic approximations used in deep learning, which distorts applications like second-order optimization and uncertainty quantification, and develops debiasing strategies to address this systematic error.

Quadratic approximations form a fundamental building block of machine learning methods. For example, second-order optimizers try to find the Newton step into the minimum of a local quadratic proxy to the objective function, and the second-order approximation of a network's loss function can be used to quantify the uncertainty of its outputs via the Laplace approximation. When computations on the entire training set are intractable, as is typical for deep learning, the relevant quantities are computed on mini-batches. This, however, distorts and biases the shape of the associated stochastic quadratic approximations in an intricate way, with detrimental effects on applications. In this paper, we (i) show that this bias introduces a systematic error, (ii) provide a theoretical explanation for it, (iii) explain its relevance for second-order optimization and uncertainty quantification via the Laplace approximation in deep learning, and (iv) develop and evaluate debiasing strategies.
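One elementary way such a bias can arise is illustrated by the toy sketch below. It is not the paper's experiment or debiasing method. It assumes a 2D linear least-squares loss with Gaussian inputs, for which the Hessian is the mean of the outer products x xᵀ, and all names (`hessian`, `top_eigenpair`, `curvature`, `run`) are hypothetical. Each mini-batch Hessian is an unbiased estimate of the full Hessian, but the largest eigenvalue is a convex function of the matrix, so by Jensen's inequality its mini-batch average is at least the full-data top eigenvalue. Curvature measured on a mini-batch along that mini-batch's own dominant direction therefore tends to exceed the full-data curvature along the same direction.

```python
import math
import random


def hessian(rows):
    """Hessian of 1/(2n) * sum((x @ theta - y)^2): the mean of x x^T.

    Returned as (a, b, c), the entries of the symmetric matrix [[a, b], [b, c]].
    """
    n = len(rows)
    a = sum(x0 * x0 for x0, _ in rows) / n
    b = sum(x0 * x1 for x0, x1 in rows) / n
    c = sum(x1 * x1 for _, x1 in rows) / n
    return a, b, c


def top_eigenpair(a, b, c):
    """Largest eigenvalue and a unit eigenvector of [[a, b], [b, c]] (closed form)."""
    lam = 0.5 * (a + c) + math.sqrt((0.5 * (a - c)) ** 2 + b * b)
    v0, v1 = b, lam - a
    norm = math.hypot(v0, v1)
    if norm == 0.0:  # diagonal matrix whose top eigenvalue is a
        return lam, (1.0, 0.0)
    return lam, (v0 / norm, v1 / norm)


def curvature(h, v):
    """Directional curvature v^T H v for H given as (a, b, c)."""
    a, b, c = h
    return a * v[0] ** 2 + 2.0 * b * v[0] * v[1] + c * v[1] ** 2


def run(n=500, batch_size=4, draws=500, seed=0):
    """Compare mini-batch and full-data curvature along mini-batch top directions."""
    rng = random.Random(seed)
    data = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
    h_full = hessian(data)

    batch_curvs, full_curvs = [], []
    for _ in range(draws):
        batch = rng.sample(data, batch_size)  # mini-batch without replacement
        lam, v = top_eigenpair(*hessian(batch))
        batch_curvs.append(lam)                  # curvature seen by the mini-batch
        full_curvs.append(curvature(h_full, v))  # full-data curvature, same direction
    return sum(batch_curvs) / draws, sum(full_curvs) / draws


if __name__ == "__main__":
    mean_batch, mean_full = run()
    print("mean mini-batch curvature along its top direction:", mean_batch)
    print("mean full-data curvature along the same directions:", mean_full)
```

The gap between the two averages shrinks as `batch_size` grows toward `n`. A Newton-type step that divides by the mini-batch curvature would be correspondingly too short along such directions in this toy setting.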
