On Last-Layer Algorithms for Classification: Decoupling Representation from Uncertainty Estimation
This work addresses the computational cost of Bayesian uncertainty estimation in deep learning, offering incremental improvements for practitioners who need efficient uncertainty quantification.
The paper tackles uncertainty quantification in deep learning by proposing a family of algorithms that decouple representation learning from uncertainty estimation, comparing methods such as ensembles and Monte Carlo Dropout. The results show that these simple methods strongly outperform vanilla SGD on some complex benchmarks such as ImageNet, while adding multiple uncertainty layers brings limited value.
Uncertainty quantification for deep learning is a challenging open problem. Bayesian statistics offers a mathematically grounded framework for reasoning about uncertainty; however, approximate posteriors for modern neural networks still come at a prohibitive computational cost. We propose a family of algorithms that split the classification task into two stages: representation learning and uncertainty estimation. We compare four specific instances, in which uncertainty estimation is performed via an ensemble of Stochastic Gradient Descent snapshots, an ensemble of Stochastic Gradient Langevin Dynamics snapshots, an ensemble of bootstrapped logistic regressions, or a number of Monte Carlo Dropout passes. We evaluate their performance in terms of \emph{selective} classification (risk-coverage) and their ability to detect out-of-distribution samples. Our experiments suggest there is limited value in adding multiple uncertainty layers to deep classifiers, and we observe that these simple methods strongly outperform a vanilla point-estimate SGD baseline on some complex benchmarks such as ImageNet.
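The bootstrapped-logistic-regression instance and the risk-coverage evaluation can be illustrated with a small example. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation. Synthetic Gaussian blobs stand in for frozen penultimate-layer features, and each head is a multinomial logistic regression fit by plain full-batch gradient descent. The ensemble averages predicted probabilities, the maximum averaged probability serves as the confidence score, and the mean selective risk over all coverage levels approximates the area under the risk-coverage curve (AURC). All function names (`fit_softmax_regression`, `bootstrap_last_layer`, `predict_ensemble`, `risk_coverage`) and hyperparameters are hypothetical.

```python
import numpy as np


def softmax(logits):
    # Subtract the row-wise max for numerical stability before exponentiating.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fit_softmax_regression(X, y, n_classes, lr=0.1, epochs=200, l2=1e-3):
    """Hypothetical helper: full-batch gradient descent on L2-regularised
    multinomial logistic regression (the 'last layer')."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(epochs):
        P = softmax(X @ W + b)
        G = (P - Y) / n  # gradient of mean cross-entropy w.r.t. the logits
        W -= lr * (X.T @ G + l2 * W)
        b -= lr * G.sum(axis=0)
    return W, b


def bootstrap_last_layer(features, labels, n_classes, n_models=10, seed=0):
    """Fit one logistic-regression head per bootstrap resample of the
    (assumed frozen) training features."""
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    heads = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)  # resample n points with replacement
        heads.append(fit_softmax_regression(features[idx], labels[idx], n_classes))
    return heads


def predict_ensemble(heads, features):
    # Average the predictive distributions of all bootstrapped heads.
    return np.mean([softmax(features @ W + b) for W, b in heads], axis=0)


def risk_coverage(probs, labels):
    """Selective risk at every coverage level, using the max averaged
    probability as the confidence score (an assumed choice)."""
    confidence = probs.max(axis=1)
    errors = (probs.argmax(axis=1) != labels).astype(float)
    order = np.argsort(-confidence, kind="stable")  # most confident first
    n_kept = np.arange(1, len(labels) + 1)
    coverage = n_kept / len(labels)
    risk = np.cumsum(errors[order]) / n_kept
    return coverage, risk


# Usage on synthetic data: Gaussian blobs stand in for the frozen
# penultimate-layer features of a trained network (an assumption).
rng = np.random.default_rng(42)
n_classes, dim = 3, 5
means = 3.0 * np.eye(n_classes, dim)


def make_split(n_per_class):
    X = np.concatenate(
        [rng.normal(means[k], 1.0, size=(n_per_class, dim)) for k in range(n_classes)]
    )
    y = np.repeat(np.arange(n_classes), n_per_class)
    return X, y


X_train, y_train = make_split(100)
X_test, y_test = make_split(100)
heads = bootstrap_last_layer(X_train, y_train, n_classes, n_models=10)
probs = predict_ensemble(heads, X_test)
coverage, risk = risk_coverage(probs, y_test)
aurc = risk.mean()  # discrete approximation of the area under the curve
print(f"accuracy={np.mean(probs.argmax(axis=1) == y_test):.3f}  AURC={aurc:.4f}")
```

Because only the last layer is refit, each bootstrap member costs a convex fit on fixed features rather than a full network training run. This is the computational motivation for decoupling representation learning from uncertainty estimation. Lower AURC means the confidence score ranks errors toward low coverage more effectively.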