ML · LG · May 9

Optimality of Sub-network Laplace Approximations: New Results and Methods

Swarnali Raha, Kshitij Khare, Rohit K Patra

arXiv:2605.09075

AI Analysis

For practitioners needing uncertainty quantification in deep neural networks, this work provides theoretically grounded and computationally efficient approximations to the full Laplace posterior.

The paper proves that sub-network Laplace approximations systematically underestimate the predictive variance of the full Laplace posterior. It proposes two principled parameter-selection methods, Gradient-Laplace and Greedy-Laplace, with theoretical optimality guarantees. Gradient-Laplace provably outperforms existing heuristics, and both methods perform strongly against existing benchmarks in numerical studies.
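The derivation below gives one short way to see why restricting to a sub-network can only lose predictive variance. It assumes the common linearized Laplace setup: a Gaussian posterior with precision H (positive definite) at the MAP estimate θ̂, parameters outside an index set S frozen at θ̂, and the posterior on S given by the inverse sub-block H_SS^{-1}. The notation is ours, and this is not necessarily the authors' proof. The unconstrained maximum equals the full-Laplace variance. Forcing v to vanish outside S gives the sub-network variance. Enlarging S enlarges the feasible set, so the variance can only grow.

```latex
\begin{align*}
\sigma^2(x) &= g^\top H^{-1} g
  = \max_{v \in \mathbb{R}^p} \bigl( 2 v^\top g - v^\top H v \bigr),
  \qquad g = \nabla_\theta f(x;\hat\theta), \\
\sigma_S^2(x) &= g_S^\top H_{SS}^{-1} g_S
  = \max_{v \in \mathbb{R}^p:\ v_j = 0\ \forall j \notin S}
    \bigl( 2 v^\top g - v^\top H v \bigr) \le \sigma^2(x), \\
S \subseteq T &\implies \sigma_S^2(x) \le \sigma_T^2(x) \le \sigma^2(x).
\end{align*}
```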

Although the Laplace approximation offers a simple route to uncertainty quantification in deep neural networks, its reliance on inverting large Hessian matrices has motivated a range of computationally feasible low-dimensional or sparse approximations. A prominent class of such methods, sub-network Laplace approximations, constructs surrogates by restricting attention to a small subset of parameters. Existing approaches in this family typically select that subset with diagonal, layer-wise, or other architectural heuristics, which ignore cross-parameter interactions and lack formal optimality guarantees. In this paper, we provide a rigorous theoretical analysis of the sub-network Laplace paradigm. We prove that all sub-network Laplace methods systematically underestimate the predictive variance of the full Laplace posterior, and that this bias decreases monotonically as the retained sub-matrix expands. Leveraging this insight, we propose two principled, analytically grounded sub-network Hessian approximations. Gradient-Laplace selects the parameters with the largest average squared gradients of the model output with respect to the parameters, computed over a reference dataset. Greedy-Laplace iteratively refines this selection by accounting for off-diagonal interactions in the precision matrix. We establish theoretical guarantees characterizing their optimality properties and show that Gradient-Laplace provably outperforms existing heuristic approaches. Extensive numerical studies across diverse settings indicate that these methods perform strongly relative to existing benchmarks.
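The underestimation and monotonicity claims can be checked numerically under the standard linearized setup. Here the full-Laplace predictive variance is g^T H^{-1} g. The sub-network variance is g_S^T H_SS^{-1} g_S, with the remaining parameters frozen at the MAP estimate (an assumption about how the sub-network posterior is formed). The precision matrix H and gradient g are random toy values, and the helper names are ours. This is a minimal sketch that illustrates the stated result; it does not reproduce the paper's experiments.

```python
import numpy as np


def full_variance(H, g):
    """Linearized full-Laplace predictive variance g^T H^{-1} g."""
    return float(g @ np.linalg.solve(H, g))


def subnet_variance(H, g, S):
    """Sub-network variance g_S^T (H_SS)^{-1} g_S; parameters outside S stay at the MAP."""
    idx = np.asarray(S, dtype=int)
    H_SS = H[np.ix_(idx, idx)]  # retained sub-block of the precision matrix
    g_S = g[idx]
    return float(g_S @ np.linalg.solve(H_SS, g_S))


# Toy symmetric positive-definite precision matrix and toy output gradient (made-up values).
rng = np.random.default_rng(0)
p = 8
A = rng.standard_normal((p, p))
H = A @ A.T + 0.5 * np.eye(p)
g = rng.standard_normal(p)

sigma2_full = full_variance(H, g)
# Nested index sets {0}, {0, 1}, ..., {0, ..., p-1}: each contains the previous one.
nested = [subnet_variance(H, g, range(k)) for k in range(1, p + 1)]

for k, v in enumerate(nested, start=1):
    print(f"|S| = {k}: {v:.4f}  (full: {sigma2_full:.4f})")
```

Every printed sub-network variance stays at or below the full value and never decreases as S grows. With S equal to all parameters, the two coincide.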
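The Gradient-Laplace selection rule, as the abstract describes it, scores each parameter by its average squared output gradient over a reference set and keeps the top k. A minimal sketch follows. It assumes a scalar-output model and a precomputed Jacobian J of shape (n, p), whose row i is the gradient of the output at reference input x_i with respect to the parameters at the MAP estimate. The function name gradient_laplace_select and the toy Jacobian are hypothetical.

```python
import numpy as np


def gradient_laplace_select(J, k):
    """Return the k parameter indices with the largest average squared output gradient.

    J: array of shape (n, p); row i is d f(x_i) / d theta at the MAP (assumed scalar output).
    """
    scores = np.mean(J ** 2, axis=0)   # average squared gradient per parameter
    order = np.argsort(scores)[::-1]   # parameter indices, highest score first
    return np.sort(order[:k]), scores


# Toy Jacobian: 3 reference inputs, 4 parameters (made-up numbers).
J = np.array([[0.1, 2.0, -0.3, 0.0],
              [0.2, -1.0, 0.4, 3.0],
              [0.0, 1.5, -0.1, 0.0]])
S, scores = gradient_laplace_select(J, k=2)
print(S)  # -> [1 3]
```

The scores depend only on the diagonal of the average gradient outer product, which is what makes this rule cheap. In practice J would come from an autodiff framework; that step is omitted here.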
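The abstract describes Greedy-Laplace only at a high level, so the code below is a hypothetical sketch of one greedy rule in that spirit, not the paper's algorithm. Starting from an optional seed set (for example a Gradient-Laplace selection), it repeatedly adds the parameter that most increases the average sub-network predictive variance over the reference inputs. Because that objective uses the whole retained sub-block H_SS, off-diagonal precision entries affect every choice. The functions avg_subnet_variance and greedy_select, and the toy H and J, are our own. The sketch assumes the sub-network variance is g_S^T H_SS^{-1} g_S with other parameters frozen at the MAP estimate.

```python
import numpy as np


def avg_subnet_variance(H, J, S):
    """Mean over reference inputs of J[i, S] (H_SS)^{-1} J[i, S]^T."""
    if len(S) == 0:
        return 0.0
    idx = np.asarray(S, dtype=int)
    H_SS = H[np.ix_(idx, idx)]
    J_S = J[:, idx]                       # shape (n, |S|)
    X = np.linalg.solve(H_SS, J_S.T)      # column i is H_SS^{-1} g_{i,S}
    return float(np.mean(np.sum(J_S.T * X, axis=0)))


def greedy_select(H, J, k, seed=()):
    """Hypothetical forward-greedy subset search (not the paper's exact rule)."""
    S = list(seed)
    p = H.shape[0]
    while len(S) < min(k, p):
        candidates = [j for j in range(p) if j not in S]
        # Score each candidate by the objective of the enlarged set, so the
        # off-diagonal entries of H linking it to S are taken into account.
        gains = [avg_subnet_variance(H, J, S + [j]) for j in candidates]
        S.append(candidates[int(np.argmax(gains))])
    return sorted(S)


# Toy precision matrix in which parameters 0 and 1 are strongly coupled (made-up numbers).
H = np.array([[1.0, 0.9, 0.0],
              [0.9, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
J = np.array([[1.0, 0.95, 0.9]])  # one reference input, scalar output
S = greedy_select(H, J, k=2)
print(S)  # -> [0, 2]
```

In this toy, ranking by average squared gradient alone would keep parameters 0 and 1, which are strongly coupled in H and capture a variance of only about 1.01. The greedy pick {0, 2} captures about 1.81, against a full-Laplace value of about 1.82.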
