Evaluating Out-of-Distribution Detectors Through Adversarial Generation of Outliers
This work addresses the need for more realistic robustness assessment in OOD detection, which is crucial for safety-critical applications such as autonomous systems. The contribution is incremental, however, since it builds on existing evaluation methods.
The paper tackles the problem of evaluating out-of-distribution (OOD) detectors by proposing EvG, a new protocol that uses generative models and MCMC sampling to create realistic outliers, revealing weaknesses in state-of-the-art detectors.
A reliable evaluation method is essential for building a robust out-of-distribution (OOD) detector. Current robustness evaluation protocols for OOD detectors rely on injecting perturbations into outlier data. However, these perturbations are unlikely to occur naturally or are irrelevant to the content of the data, so they provide only a limited assessment of robustness. In this paper, we propose Evaluation-via-Generation for OOD detectors (EvG), a new protocol for investigating the robustness of OOD detectors under more realistic modes of variation in outliers. EvG uses a generative model to synthesize plausible outliers and employs MCMC sampling to find the outliers that a detector misclassifies as in-distribution with the highest confidence. We perform a comprehensive benchmark comparison of state-of-the-art OOD detectors using EvG, uncovering previously overlooked weaknesses.
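To make the search step concrete, the following is a minimal sketch of how MCMC could be used to look for generated outliers that a detector scores as highly in-distribution. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not specify the sampler, the target density, or where the search takes place. Here we assume a random-walk Metropolis-Hastings chain over the latent space of the generator, a standard normal prior on the latent, and a target proportional to prior times exp(score / temperature). The names `toy_generator`, `toy_detector_score`, and `mh_search`, and all parameter values, are hypothetical.

```python
import numpy as np

def toy_generator(z):
    # Hypothetical stand-in for a generative model of outliers:
    # maps a latent vector to a sample that always lies away from the origin.
    return 3.0 + 1.5 * np.tanh(z)

def toy_detector_score(x):
    # Hypothetical detector score: higher means "more in-distribution".
    # The toy in-distribution data is assumed to be centered at the origin.
    return -np.sum(x ** 2)

def mh_search(score_fn, generator, z0, n_steps=2000, step_size=0.3,
              temperature=0.1, seed=0):
    """Random-walk Metropolis-Hastings over latents (assumed setup).

    Target density: N(z; 0, I) * exp(score(generator(z)) / temperature).
    Returns the latent with the highest detector score seen by the chain.
    """
    rng = np.random.default_rng(seed)

    def log_target(z):
        s = score_fn(generator(z))
        return -0.5 * np.sum(z ** 2) + s / temperature, s

    z = np.asarray(z0, dtype=float)
    logp, s = log_target(z)
    best_z, best_s = z.copy(), s
    for _ in range(n_steps):
        # Symmetric Gaussian proposal, so the acceptance ratio is the target ratio.
        z_prop = z + step_size * rng.standard_normal(z.shape)
        logp_prop, s_prop = log_target(z_prop)
        if np.log(rng.uniform()) < logp_prop - logp:
            z, logp, s = z_prop, logp_prop, s_prop
            if s > best_s:
                best_z, best_s = z.copy(), s
    return best_z, best_s

if __name__ == "__main__":
    z_start = np.zeros(2)
    start_score = toy_detector_score(toy_generator(z_start))
    z_adv, adv_score = mh_search(toy_detector_score, toy_generator, z_start)
    print("start score:", start_score)
    print("worst-case outlier:", toy_generator(z_adv), "score:", adv_score)
```

Every sample produced by `toy_generator` is an outlier by construction, so a high final score indicates a failure of the detector rather than a change in the data's identity. In a real setting the generator and detector would be trained models, and the temperature and step size would need tuning.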