ML · LG · Aug 13, 2020

A statistical theory of cold posteriors in deep neural networks

arXiv:2008.05912v2 · 82 citations
Originality: Incremental advance
AI Analysis

This paper addresses a foundational issue in Bayesian deep learning for image classification, offering a theoretical explanation for a widely used but concerning practice.

The paper asks why Bayesian neural networks need artificially reduced uncertainty (cold posteriors) to match the performance of standard neural networks, and argues that the cause is the use of the wrong likelihood for curated datasets such as CIFAR-10. The authors develop a generative model of curation that gives a principled Bayesian explanation, showing that its likelihood closely matches the tempered likelihoods used in prior work.

To get Bayesian neural networks to perform comparably to standard neural networks it is usually necessary to artificially reduce uncertainty using a "tempered" or "cold" posterior. This is extremely concerning: if the prior is accurate, Bayes inference/decision theory is optimal, and any artificial changes to the posterior should harm performance. While this suggests that the prior may be at fault, here we argue that in fact, BNNs for image classification use the wrong likelihood. In particular, standard image benchmark datasets such as CIFAR-10 are carefully curated. We develop a generative model describing curation which gives a principled Bayesian account of cold posteriors, because the likelihood under this new generative model closely matches the tempered likelihoods used in past work.
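The link between curation and tempering can be illustrated with a minimal sketch. It assumes one simple curation model that the abstract does not spell out: each image is labelled by S independent annotators and is kept only if all of them agree. Under that assumption the log-likelihood of a kept label is S times the ordinary log-probability, which is the same quantity as a likelihood tempered at temperature T = 1/S. The function names, the value S = 4, and the random logits are hypothetical and chosen only for illustration.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def tempered_log_lik(logits, labels, temperature):
    # Standard categorical log-likelihood scaled by 1/T ("tempered" likelihood).
    logp = log_softmax(logits)
    picked = logp[np.arange(len(labels)), labels]
    return picked.sum() / temperature

def consensus_log_lik(logits, labels, num_annotators):
    # Assumed curation model: log-probability that all S independent
    # annotators choose the recorded label, i.e. S * log p(y | x).
    # Images without consensus are discarded and are not modelled here.
    logp = log_softmax(logits)
    picked = logp[np.arange(len(labels)), labels]
    return num_annotators * picked.sum()

rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 10))      # stand-in for network outputs
labels = rng.integers(0, 10, size=5)   # stand-in for consensus labels
S = 4                                  # hypothetical number of annotators

tempered = tempered_log_lik(logits, labels, temperature=1.0 / S)
consensus = consensus_log_lik(logits, labels, num_annotators=S)
print(np.isclose(tempered, consensus))
```

In this sketch the two quantities coincide by construction, so a temperature below 1 can be read as a statement about how many annotators had to agree rather than as an ad hoc reduction in uncertainty.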

