CV, Jun 14, 2024

Rethinking the Evaluation of Out-of-Distribution Detection: A Sorites Paradox

Xingming Long, Jie Zhang, Shiguang Shan, Xilin Chen

arXiv:2406.09867v36.54 citations, Has Code

Originality: Incremental advance

AI Analysis

This work addresses a fundamental evaluation issue in OOD detection that is relevant to machine learning practitioners, offering a finer-grained benchmark for assessing method performance. The advance is incremental, since it refines existing evaluation frameworks.

The paper tackles the evaluation of out-of-distribution (OOD) detection. It observes that existing benchmarks treat every sample with a novel label as OOD, ignoring marginal samples whose semantic content is close to the in-distribution data, which turns the OOD decision into a Sorites Paradox. The authors introduce the Incremental Shift OOD (IS-OOD) benchmark, which divides test samples by degree of semantic and covariate shift using a Language Aligned Image feature Decomposition (LAID) method. They find that most OOD detection methods improve as semantic shift increases, while some, such as GradNorm, rely less on it.

Most existing out-of-distribution (OOD) detection benchmarks classify samples with novel labels as OOD data. However, some marginal OOD samples actually have semantic content close to that of in-distribution (ID) samples, which makes deciding whether a sample is OOD a Sorites Paradox. In this paper, we construct a benchmark named Incremental Shift OOD (IS-OOD) to address the issue, in which we divide the test samples into subsets with different degrees of semantic and covariate shift relative to the ID dataset. The data division is achieved through a shift measuring method based on our proposed Language Aligned Image feature Decomposition (LAID). Moreover, we construct a Synthetic Incremental Shift (Syn-IS) dataset that contains high-quality generated images with more diverse covariate content to complement the IS-OOD benchmark. We evaluate current OOD detection methods on our benchmark and report several important insights: (1) The performance of most OOD detection methods improves significantly as the semantic shift increases; (2) Some methods, like GradNorm, may have different OOD detection mechanisms, as they rely less on semantic shifts to make decisions; (3) For some methods, images with excessive covariate shift are also likely to be considered OOD. Our code and data are released at https://github.com/qqwsad5/IS-OOD.
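The evaluation idea in the abstract, scoring a detector separately on test subsets of increasing shift, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `auroc` and `auroc_by_shift` are hypothetical, the per-sample shift degrees are assumed to be given (in the paper they come from LAID, which is not reproduced here), and splitting into equal-sized bins is an assumption made for illustration.

```python
import numpy as np


def auroc(id_scores, ood_scores):
    """AUROC for separating test samples (positive) from ID samples (negative).

    Convention assumed here: a higher detector score means "more OOD".
    Computed as the Mann-Whitney statistic, with ties counted as half.
    """
    id_scores = np.asarray(id_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)
    greater = (ood_scores[:, None] > id_scores[None, :]).sum()
    ties = (ood_scores[:, None] == id_scores[None, :]).sum()
    return (greater + 0.5 * ties) / (len(ood_scores) * len(id_scores))


def auroc_by_shift(id_scores, test_scores, test_shift, n_bins=3):
    """Sort test samples by shift degree, split them into n_bins
    equal-sized subsets, and report AUROC against the ID set per subset
    (lowest-shift subset first)."""
    test_scores = np.asarray(test_scores, dtype=float)
    order = np.argsort(test_shift)          # indices from low to high shift
    subsets = np.array_split(order, n_bins)
    return [float(auroc(id_scores, test_scores[idx])) for idx in subsets]


if __name__ == "__main__":
    # Synthetic stand-in data: detector scores that grow with the shift degree.
    rng = np.random.default_rng(0)
    id_scores = rng.normal(0.0, 1.0, size=500)
    shift = rng.uniform(0.0, 1.0, size=600)
    test_scores = rng.normal(3.0 * shift, 1.0)
    print(auroc_by_shift(id_scores, test_scores, shift, n_bins=3))
```

Under this setup, a detector that relies on semantic shift would be expected to show AUROC rising from the first subset to the last, while a flatter curve would suggest the detector depends less on the measured shift.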

