LG · May 5, 2024

A View on Out-of-Distribution Identification from a Statistical Testing Theory Perspective

arXiv:2405.03052v3 · 2 citations · h-index: 10
Originality: Incremental advance
AI Analysis

The paper addresses the critical issue of distribution shift in real-world ML deployments, but the contribution appears incremental because it builds on existing statistical frameworks.

The paper tackles the problem of detecting out-of-distribution (OOD) samples in machine learning by reformulating it from a statistical testing perspective, providing convergence guarantees for a test based on the Wasserstein distance and a simple empirical evaluation.

We study the problem of efficiently detecting Out-of-Distribution (OOD) samples at test time in supervised and unsupervised learning contexts. ML models are typically trained under the assumption that training and test data stem from the same distribution, but this is often not the case in realistic settings, so reliably detecting distribution shifts is crucial at deployment. We reformulate the OOD problem through the lens of statistical testing and then discuss conditions under which the OOD problem is identifiable in statistical terms. Building on this framework, we study convergence guarantees of an OOD test based on the Wasserstein distance and provide a simple empirical evaluation.
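To make the idea of a Wasserstein-based OOD test concrete, the following is a minimal sketch and not the paper's procedure. It assumes each sample has been reduced to a one-dimensional score (for example, a model confidence value) and calibrates the test with a permutation scheme, which is a choice made here for illustration. The names `wasserstein_1d` and `permutation_ood_test`, and the parameters `n_permutations`, `alpha`, and `seed`, are hypothetical.

```python
import random


def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two 1-D empirical distributions.

    Computed as the integral of |F(t) - G(t)| over t, where F and G are
    the empirical CDFs of xs and ys.
    """
    xs, ys = sorted(xs), sorted(ys)
    points = sorted(xs + ys)
    i = j = 0
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Advance both empirical CDFs up to and including `left`.
        while i < len(xs) and xs[i] <= left:
            i += 1
        while j < len(ys) and ys[j] <= left:
            j += 1
        total += abs(i / len(xs) - j / len(ys)) * (right - left)
    return total


def permutation_ood_test(reference, batch, n_permutations=500, alpha=0.05, seed=0):
    """Two-sample test of H0: `batch` follows the same distribution as `reference`.

    Assumption (not from the paper): the null distribution of the statistic
    is approximated by randomly re-splitting the pooled samples.
    Returns (observed distance, p-value, reject-H0 flag).
    """
    rng = random.Random(seed)
    observed = wasserstein_1d(reference, batch)
    pooled = list(reference) + list(batch)
    n_ref = len(reference)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if wasserstein_1d(pooled[:n_ref], pooled[n_ref:]) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    p_value = (exceed + 1) / (n_permutations + 1)
    return observed, p_value, p_value <= alpha
```

A batch would be flagged as OOD when the returned flag is true, i.e. when its distance to the reference sample is larger than almost all distances seen under random re-splits. Higher-dimensional data would need a different distance computation, which this sketch does not cover.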

Code Implementations: 1 repo