LG | Jul 27, 2022

Towards Clear Expectations for Uncertainty Estimation

arXiv:2207.13341v1 | 2 citations | h-index: 5
Originality: Synthesis-oriented
AI Analysis

This paper addresses the unclear expectations that ML practitioners face when evaluating UQ, but its contribution is incremental, as it builds on existing UQ methods and benchmarks.

The paper tackles the problem of inconsistent evaluation protocols in Uncertainty Quantification (UQ) by proposing a perspective based on five downstream tasks that clarify what is required of uncertainty scores. On a benchmark of 7 classification datasets, it finds no statistical superiority of state-of-the-art intrinsic UQ methods over simple baselines.

If Uncertainty Quantification (UQ) is crucial to achieve trustworthy Machine Learning (ML), most UQ methods suffer from disparate and inconsistent evaluation protocols. We claim this inconsistency results from the unclear requirements the community expects from UQ. This opinion paper offers a new perspective by specifying those requirements through five downstream tasks where we expect uncertainty scores to have substantial predictive power. We design these downstream tasks carefully to reflect real-life usage of ML models. On an example benchmark of 7 classification datasets, we did not observe statistical superiority of state-of-the-art intrinsic UQ methods against simple baselines. We believe that our findings question the very rationale of why we quantify uncertainty and call for a standardized protocol for UQ evaluation based on metrics proven to be relevant for the ML practitioner.
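
To make the idea of "predictive power on a downstream task" concrete, here is a minimal sketch under stated assumptions. The abstract does not name the five tasks, so the task used here (detecting misclassified inputs) is an assumption, as are the toy data and the helper names `max_prob_uncertainty` and `auroc`. It is not the authors' protocol; it only shows how a simple baseline score (one minus the maximum predicted probability) might be evaluated by how well it ranks errors above correct predictions.

```python
def max_prob_uncertainty(probs):
    """Simple baseline score: 1 minus the largest predicted class probability."""
    return [1.0 - max(p) for p in probs]


def auroc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy predicted class probabilities and true labels (illustrative only).
probs = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.10, 0.80, 0.10],
    [0.34, 0.33, 0.33],
]
y_true = [0, 1, 1, 2]

# Predicted class = argmax of the probabilities.
y_pred = [max(range(len(p)), key=p.__getitem__) for p in probs]

# Assumed downstream task: flag misclassified inputs (1 = model was wrong).
errors = [int(pred != true) for pred, true in zip(y_pred, y_true)]

scores = max_prob_uncertainty(probs)
print("AUROC of baseline score for error detection:", auroc(scores, errors))
```

An AUROC near 0.5 would mean the score carries no information about errors, while values near 1.0 mean uncertain predictions are mostly the wrong ones. Comparing a more elaborate UQ method against this baseline on the same task, with a statistical test across datasets, is the kind of comparison the abstract refers to.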
