LG · May 5, 2024

A View on Out-of-Distribution Identification from a Statistical Testing Theory Perspective

Alberto Caron, Chris Hicks, Vasilios Mavroudis

arXiv:2405.03052v34.62 citations · h-index: 10 · Has Code

Originality: Incremental advance

AI Analysis

This paper addresses the critical issue of distribution shift in real-world ML deployments, but it appears incremental because it builds on existing statistical frameworks.

The paper tackles the problem of detecting out-of-distribution (OOD) samples in machine learning by reformulating it from a statistical testing perspective, providing convergence guarantees for a test based on the Wasserstein distance and a simple empirical evaluation.

We study the problem of efficiently detecting Out-of-Distribution (OOD) samples at test time in supervised and unsupervised learning contexts. While ML models are typically trained under the assumption that training and test data stem from the same distribution, this is often not the case in realistic settings; reliably detecting distribution shifts is therefore crucial at deployment. We reformulate the OOD problem through the lens of statistical testing and then discuss conditions that render the OOD problem identifiable in statistical terms. Building on this framework, we study convergence guarantees of an OOD test based on the Wasserstein distance, and provide a simple empirical evaluation.
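To make the idea of a Wasserstein-based OOD test concrete, here is a minimal sketch, not the authors' procedure. It assumes each sample is reduced to a one-dimensional score (for example, a model confidence), compares a reference set against an incoming batch with SciPy's `wasserstein_distance`, and calibrates the statistic with a permutation test. The function name `wasserstein_permutation_test`, the permutation calibration, and the synthetic Gaussian data are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import wasserstein_distance


def wasserstein_permutation_test(reference, batch, n_permutations=500, seed=0):
    """Return (observed 1-D Wasserstein distance, permutation p-value).

    Hypothetical helper: tests H0 "reference and batch share a distribution".
    """
    rng = np.random.default_rng(seed)
    reference = np.asarray(reference, dtype=float)
    batch = np.asarray(batch, dtype=float)
    observed = wasserstein_distance(reference, batch)

    # Under H0 the pooled samples are exchangeable, so random re-splits
    # of the pool give draws from the null distribution of the statistic.
    pooled = np.concatenate([reference, batch])
    n_ref = reference.size
    exceed = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(pooled)
        null_stat = wasserstein_distance(shuffled[:n_ref], shuffled[n_ref:])
        if null_stat >= observed:
            exceed += 1

    # The +1 correction keeps the p-value strictly positive.
    p_value = (exceed + 1) / (n_permutations + 1)
    return observed, p_value


# Synthetic scores (assumed data): one batch matches the reference, one is shifted.
rng = np.random.default_rng(42)
in_dist = rng.normal(0.0, 1.0, size=300)
same = rng.normal(0.0, 1.0, size=300)
shifted = rng.normal(3.0, 1.0, size=300)

dist_same, p_same = wasserstein_permutation_test(in_dist, same)
dist_shift, p_shift = wasserstein_permutation_test(in_dist, shifted)
print(f"same:    W={dist_same:.3f}, p={p_same:.3f}")
print(f"shifted: W={dist_shift:.3f}, p={p_shift:.3f}")
```

A small p-value for the shifted batch would lead to rejecting the hypothesis that it comes from the training distribution. The permutation step is only one way to set a rejection threshold; the paper's convergence guarantees concern its own test and are not reproduced by this sketch.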
