LG, AI | Jul 11, 2025

Last Layer Hamiltonian Monte Carlo

Koen Vellenga, H. Joe Steinhauer, Göran Falkman, Jonas Andersson, Anders Sjögren

arXiv:2507.08905v1 | 7.11 citations | h-index: 6

Originality: Incremental advance

AI Analysis

This work addresses the problem of scalable uncertainty estimation for deep learning practitioners, offering an incremental improvement by adapting an existing method to reduce computational costs.

The paper tackles the computational cost of applying Hamiltonian Monte Carlo (HMC) for uncertainty estimation in deep neural networks by restricting sampling to the last layer, which makes it feasible for data-intensive scenarios. The results show that this last layer HMC approach achieves competitive in-distribution classification and out-of-distribution (OOD) detection performance on real-world video datasets. Sampling additional last layer parameters can improve OOD detection but does not improve classification.

We explore the use of Hamiltonian Monte Carlo (HMC) sampling as a probabilistic last layer approach for deep neural networks (DNNs). While HMC is widely regarded as a gold standard for uncertainty estimation, its computational demands limit its application to large-scale datasets and large DNN architectures. Although the predictions from the sampled DNN parameters can be parallelized, the computational cost still scales linearly with the number of samples (similar to an ensemble). Last layer HMC (LL-HMC) reduces the required computations by restricting the HMC sampling to the final layer of a DNN, making it applicable to more data-intensive scenarios with limited computational resources. In this paper, we compare LL-HMC against five last layer probabilistic deep learning (LL-PDL) methods across three real-world video datasets for driver action and intention. We evaluate the in-distribution classification performance, calibration, and out-of-distribution (OOD) detection. Due to the stochastic nature of the probabilistic evaluations, we performed five grid searches with different random seeds to avoid being reliant on a single initialization for the hyperparameter configurations. The results show that LL-HMC achieves competitive in-distribution classification and OOD detection performance. Additional sampled last layer parameters do not improve the classification performance, but can improve the OOD detection. Multiple chains or starting positions did not yield consistent improvements.
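To make the last layer idea above concrete, here is a minimal NumPy sketch of HMC restricted to a final linear layer. It is not the authors' implementation. It assumes a binary logistic head (the paper studies multi-class video classification), a standard Gaussian prior on the weights, and synthetic 2-D features standing in for the outputs of a frozen backbone. The function names, step size, leapfrog count, and sample counts are all illustrative choices.

```python
import numpy as np

def sigmoid(x):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * x))

def log_post_and_grad(w, feats, labels, prior_std=1.0):
    """Unnormalized log posterior of the last-layer weights and its gradient."""
    logits = feats @ w
    log_lik = np.sum(labels * logits - np.logaddexp(0.0, logits))
    log_prior = -0.5 * np.sum(w ** 2) / prior_std ** 2
    grad = feats.T @ (labels - sigmoid(logits)) - w / prior_std ** 2
    return log_lik + log_prior, grad

def hmc_last_layer(feats, labels, n_samples=200, n_burn=100,
                   step_size=0.05, n_leapfrog=20, seed=0):
    """Sample last-layer weights with HMC; the backbone features stay fixed."""
    rng = np.random.default_rng(seed)
    w = np.zeros(feats.shape[1])
    logp, grad = log_post_and_grad(w, feats, labels)
    samples, n_accept = [], 0
    for it in range(n_burn + n_samples):
        p0 = rng.standard_normal(w.shape)           # fresh momentum
        w_new = w.copy()
        p = p0 + 0.5 * step_size * grad             # initial half step
        for step in range(n_leapfrog):
            w_new = w_new + step_size * p           # full position step
            logp_new, grad_new = log_post_and_grad(w_new, feats, labels)
            if step < n_leapfrog - 1:
                p = p + step_size * grad_new        # full momentum step
        p = p + 0.5 * step_size * grad_new          # final half step
        # Metropolis correction on the Hamiltonian (potential + kinetic).
        h_old = -logp + 0.5 * p0 @ p0
        h_new = -logp_new + 0.5 * p @ p
        if np.log(rng.uniform()) < h_old - h_new:
            w, logp, grad = w_new, logp_new, grad_new
            if it >= n_burn:
                n_accept += 1
        if it >= n_burn:
            samples.append(w.copy())
    return np.array(samples), n_accept / n_samples

def predictive_mean_and_entropy(samples, feats):
    """Average class-1 probability over samples, plus its binary entropy."""
    probs = sigmoid(feats @ samples.T)              # shape (n_points, n_samples)
    mean_p = np.clip(probs.mean(axis=1), 1e-12, 1.0 - 1e-12)
    entropy = -(mean_p * np.log(mean_p) + (1 - mean_p) * np.log(1 - mean_p))
    return mean_p, entropy

# Synthetic "backbone features": two Gaussian clusters plus a bias column.
rng = np.random.default_rng(0)
labels = np.repeat([0.0, 1.0], 100)
centers = np.where(labels[:, None] == 1.0, 1.5, -1.5)
feats = np.hstack([centers + rng.standard_normal((200, 2)), np.ones((200, 1))])

samples, accept_rate = hmc_last_layer(feats, labels)
mean_p, entropy = predictive_mean_and_entropy(samples, feats)
accuracy = np.mean((mean_p > 0.5) == (labels == 1.0))
```

Each retained sample is one plausible set of last layer weights. Prediction cost grows linearly with the number of samples, but only the cheap final matrix product is repeated because the features are computed once. The entropy of the averaged prediction is one common score that could be thresholded for OOD detection; the paper's exact scoring may differ.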
