LG ML Jul 21, 2022

JAWS: Auditing Predictive Uncertainty Under Covariate Shift

arXiv:2207.10716v2, 17 citations, h-index: 44
Originality: Incremental advance
AI Analysis

This paper addresses reliable uncertainty estimation for machine learning models when the data distribution shifts, which is crucial for applications such as healthcare or autonomous systems. The advance appears incremental, since it builds on the existing jackknife+ method.

The authors tackle predictive uncertainty quantification under covariate shift by proposing JAWS, a series of wrapper methods that includes JAW and JAWA. JAW carries a finite-sample coverage guarantee, and JAWA approaches that guarantee in the limit of the sample size or the influence function order. Empirically, the methods outperform state-of-the-art baselines on biased real-world data sets for interval-generation and error-assessment tasks.

We propose JAWS, a series of wrapper methods for distribution-free uncertainty quantification tasks under covariate shift, centered on the core method JAW, the JAckknife+ Weighted with data-dependent likelihood-ratio weights. JAWS also includes computationally efficient Approximations of JAW using higher-order influence functions: JAWA. Theoretically, we show that JAW relaxes the jackknife+'s assumption of data exchangeability to achieve the same finite-sample coverage guarantee even under covariate shift. JAWA further approaches the JAW guarantee in the limit of the sample size or the influence function order under common regularity assumptions. Moreover, we propose a general approach to repurposing predictive interval-generating methods and their guarantees to the reverse task: estimating the probability that a prediction is erroneous, based on user-specified error criteria such as a safe or acceptable tolerance threshold around the true label. We then propose JAW-E and JAWA-E as the repurposed proposed methods for this Error assessment task. Practically, JAWS outperform state-of-the-art predictive inference baselines in a variety of biased real world data sets for interval-generation and error-assessment predictive uncertainty auditing tasks.
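
To make the core idea in the abstract concrete, the following is a minimal sketch of a likelihood-ratio-weighted jackknife+ interval. It is an illustration written for this summary, not the authors' implementation. It assumes the likelihood ratio between test and training covariate densities is known and passed in as `weight_fn`, and it uses ordinary least squares as a stand-in predictor. The names `fit_predict`, `weighted_quantile`, `jaw_interval`, and `weight_fn` are hypothetical.

```python
import numpy as np


def fit_predict(X_train, y_train, X_eval):
    """Ordinary least squares with an intercept (stand-in for any regressor)."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_eval)), X_eval]) @ coef


def weighted_quantile(values, probs, level):
    """Smallest value whose cumulative probability mass reaches `level`."""
    order = np.argsort(values)
    cum = np.cumsum(probs[order])
    idx = min(int(np.searchsorted(cum, level, side="left")), len(values) - 1)
    return values[order][idx]


def jaw_interval(X, y, x_test, weight_fn, alpha=0.1):
    """Weighted jackknife+ style interval for one test point (illustrative sketch).

    X: (n, d) training covariates, y: (n,) labels, x_test: (d,) test covariate.
    weight_fn: assumed likelihood ratio, maps an (m, d) array to (m,) weights.
    """
    n = len(y)
    lo = np.empty(n)
    hi = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        # Leave-one-out model evaluated at the held-out point and the test point.
        preds = fit_predict(X[keep], y[keep], np.vstack([X[i], x_test]))
        resid = abs(y[i] - preds[0])
        lo[i] = preds[1] - resid
        hi[i] = preds[1] + resid
    # Normalize weights over the n training points plus the test point.
    w = np.append(weight_fn(X), weight_fn(x_test[None, :]))
    p = w / w.sum()
    # The test point's own mass sits at -inf (lower end) or +inf (upper end).
    lower = weighted_quantile(np.append(lo, -np.inf), p, alpha)
    upper = weighted_quantile(np.append(hi, np.inf), p, 1 - alpha)
    return lower, upper


# Toy covariate shift: training x ~ N(0, 1), test x ~ N(1, 1),
# so the density ratio is proportional to exp(x - 0.5).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=40)
lower, upper = jaw_interval(
    X, y, np.array([1.0]), lambda Z: np.exp(Z[:, 0] - 0.5), alpha=0.1
)
print(lower, upper)
```

With constant weights, every point receives mass 1/(n+1) and the sketch reduces to an unweighted jackknife+ style interval. In practice the likelihood ratio is rarely known and would have to be estimated, which this sketch does not attempt.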

Code Implementations: 1 repo
