CV AI LG Jun 13, 2024

Large-Scale Evaluation of Open-Set Image Classification Techniques

Halil Bisgin, Andres Palechor, Mike Suter, Manuel Günther

arXiv:2406.09112v15.23 citations Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the problem of assessing open-set classification algorithms for real-world applications. Its contribution is incremental: it compares existing methods rather than introducing new ones.

The paper presents a large-scale evaluation of open-set image classification techniques on three protocols that mimic real-world challenges. It finds that EOS training improves almost all post-processing algorithms, and that OpenMax and PROSER in particular exploit better-trained networks. Most algorithms handle negative test samples well but perform poorly on previously unseen unknown classes.

The goal of classification is to correctly assign labels to unseen samples. However, most methods misclassify samples with unseen labels and assign them to one of the known classes. Open-Set Classification (OSC) algorithms aim to maximize both closed and open-set recognition capabilities. Recent studies showed the utility of such algorithms on small-scale data sets, but limited experimentation makes it difficult to assess their performance in real-world problems. Here, we provide a comprehensive comparison of various OSC algorithms, including training-based (SoftMax, Garbage, EOS) and post-processing methods (Maximum SoftMax Scores, Maximum Logit Scores, OpenMax, EVM, PROSER); the latter are applied to features from the former. We perform our evaluation on three large-scale protocols that mimic real-world challenges, where we train on known and negative open-set samples, and test on known and unknown instances. Our results show that EOS helps to improve the performance of almost all post-processing algorithms. In particular, OpenMax and PROSER are able to exploit better-trained networks, demonstrating the utility of hybrid models. However, while most algorithms work well on negative test samples (samples of open-set classes seen during training), they tend to perform poorly when tested on samples of previously unseen unknown classes, especially in challenging conditions.
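To make the two simplest post-processing methods named in the abstract concrete, here is a minimal sketch of how Maximum SoftMax Scores and Maximum Logit Scores can be used to reject unknown samples by thresholding. It is an illustration under assumptions, not the paper's implementation: the function names, the `-1` label for "unknown", and the threshold values are all hypothetical, and the logits are made-up numbers rather than outputs of a trained network.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def max_softmax_score(logits):
    """Return (predicted class index, maximum softmax probability)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


def max_logit_score(logits):
    """Return (predicted class index, maximum raw logit)."""
    best = max(range(len(logits)), key=logits.__getitem__)
    return best, logits[best]


def classify_open_set(logits, threshold, score_fn=max_softmax_score):
    """Return the predicted known class, or -1 (hypothetical 'unknown'
    label) when the confidence score falls below the threshold."""
    label, score = score_fn(logits)
    return label if score >= threshold else -1


# Made-up logits for a 3-class problem (illustrative only).
confident = [6.0, 0.5, 0.2]   # one class clearly dominates
ambiguous = [1.1, 1.0, 0.9]   # no class stands out

print(classify_open_set(confident, threshold=0.7))
print(classify_open_set(ambiguous, threshold=0.7))
```

With the softmax score, the first sample is accepted as class 0 and the second is rejected as unknown, because its largest probability is close to one third. Passing `score_fn=max_logit_score` applies the same rule to raw logits, where the threshold must be chosen on the logit scale instead of in the range 0 to 1.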
