LG · Jun 4, 2024

Measuring Stochastic Data Complexity with Boltzmann Influence Functions

arXiv:2406.02745v2 · 1 citation
AI Analysis

This work addresses uncertainty estimation for reliable, calibrated machine learning predictions, particularly under distribution shifts. It is an incremental improvement: a novel method for a known bottleneck.

The paper addresses the problem of estimating a model's prediction uncertainty under distribution shifts. It proposes IF-COMP, a scalable approximation of the predictive normalized maximum likelihood (pNML) distribution. On uncertainty calibration, mislabel detection, and out-of-distribution detection tasks, IF-COMP consistently matches or beats strong baselines.

Estimating the uncertainty of a model's prediction on a test point is a crucial part of ensuring reliability and calibration under distribution shifts. A minimum description length approach to this problem uses the predictive normalized maximum likelihood (pNML) distribution, which considers every possible label for a data point, and decreases confidence in a prediction if other labels are also consistent with the model and training data. In this work we propose IF-COMP, a scalable and efficient approximation of the pNML distribution that linearizes the model with a temperature-scaled Boltzmann influence function. IF-COMP can be used to produce well-calibrated predictions on test points as well as measure complexity in both labelled and unlabelled settings. We experimentally validate IF-COMP on uncertainty calibration, mislabel detection, and OOD detection tasks, where it consistently matches or beats strong baseline methods.
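
To make the pNML idea in the abstract concrete, here is a minimal sketch of the brute-force procedure that IF-COMP is described as approximating. It is not IF-COMP itself and not the authors' code: it refits a tiny logistic regression once per candidate label, which is exactly the cost a scalable approximation is meant to avoid. The function names, the choice of L2-regularized logistic regression, and all hyperparameters are assumptions made for illustration. For each candidate label y, the model is refit on the training set plus (x, y), the probability it then assigns to y at x is recorded, and these scores are normalized across labels; the log of the normalizer serves as the complexity of the test point.

```python
import numpy as np

def fit_logreg(X, y, l2=1e-2, steps=500, lr=0.5):
    """Fit binary logistic regression (with bias) by plain gradient descent.

    Hypothetical helper; l2, steps, and lr are illustrative choices.
    """
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))          # predicted P(y=1 | x)
        grad = Xb.T @ (p - y) / len(y) + l2 * w    # mean log-loss gradient + L2
        w -= lr * grad
    return w

def prob_of_label(w, x, label):
    """Probability the fitted model assigns to `label` at input `x`."""
    xb = np.append(x, 1.0)
    p1 = 1.0 / (1.0 + np.exp(-xb @ w))
    return p1 if label == 1 else 1.0 - p1

def brute_force_pnml(X, y, x_test, labels=(0, 1)):
    """Return (pNML probabilities, complexity) for one test point.

    For every candidate label, refit on the data augmented with
    (x_test, label) and score that label; then normalize across labels.
    """
    scores = []
    for label in labels:
        X_aug = np.vstack([X, x_test])
        y_aug = np.append(y, label)
        w = fit_logreg(X_aug, y_aug)
        scores.append(prob_of_label(w, x_test, label))
    scores = np.array(scores)
    normalizer = scores.sum()
    # log of the normalizer: larger when several labels are each
    # consistent with the model and training data
    return scores / normalizer, float(np.log(normalizer))

# Toy 1-D data, symmetric around 0, with a test point on the boundary.
X = np.array([[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]])
y = np.array([0, 0, 0, 1, 1, 1], dtype=float)
probs, complexity = brute_force_pnml(X, y, np.array([0.0]))
print(probs, complexity)
```

On this symmetric toy set the boundary point can be fit equally well with either label, so the two pNML probabilities coincide and the complexity is positive. The refit-per-label loop scales with the number of labels times the cost of training, which is why an approximation is needed for large models.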

