Learning to Augment Distributions for Out-of-Distribution Detection
This work addresses the challenge of reliable OOD detection for machine learning systems in open-world scenarios, representing an incremental improvement over existing methods.
The paper tackles the problem of out-of-distribution (OOD) detection in open-world classification by addressing the distribution discrepancy between auxiliary and unseen OOD data, proposing Distributional-Augmented OOD Learning (DAL) which improves detection performance across various setups.
Open-world classification systems should discern out-of-distribution (OOD) data whose labels deviate from those of in-distribution (ID) cases, motivating recent studies in OOD detection. Advanced works, despite their promising progress, may still fail in the open world, owing to the lack of knowledge about unseen OOD data in advance. Although one can access auxiliary OOD data (distinct from unseen ones) for model training, it remains to analyze how such auxiliary data will work in the open world. To this end, we delve into such a problem from a learning theory perspective, finding that the distribution discrepancy between the auxiliary and the unseen real OOD data is the key to affecting the open-world detection performance. Accordingly, we propose Distributional-Augmented OOD Learning (DAL), alleviating the OOD distribution discrepancy by crafting an OOD distribution set that contains all distributions in a Wasserstein ball centered on the auxiliary OOD distribution. We justify that the predictor trained over the worst OOD data in the ball can shrink the OOD distribution discrepancy, thus improving the open-world detection performance given only the auxiliary OOD data. We conduct extensive evaluations across representative OOD detection setups, demonstrating the superiority of our DAL over its advanced counterparts.