Generalized Adversarial Distances to Efficiently Discover Classifier Errors
This work addresses the need to evaluate black-box models efficiently in application domains, focusing on detecting costly high-confidence errors; however, it appears incremental, since it generalizes an existing method.
The paper tackles the problem of efficiently discovering high-confidence errors in black-box classifiers. It proposes a generalization of Adversarial Distance search, which uses concepts from adversarial machine learning to identify predictions that are likely overconfident and therefore prone to error. Experiments show that the method finds errors at higher rates than the confidence of the sampled predictions would suggest, and that it outperforms competing methods.
Given a black-box classification model and an unlabeled evaluation dataset from some application domain, efficient strategies are needed to evaluate the model. Random sampling allows a user to estimate metrics like accuracy, precision, and recall, but may not provide insight into high-confidence errors. High-confidence errors are rare events in which the model is highly confident in its prediction but wrong. Such errors can represent costly mistakes and should be explicitly searched for. In this paper we propose a generalization of the Adversarial Distance search that leverages concepts from adversarial machine learning to identify predictions for which a classifier may be overly confident. These predictions are useful instances to sample when looking for high-confidence errors because they are prone to a higher rate of error than their confidence implies. Our generalization allows Adversarial Distance to be applied to any classifier or data domain. Experimental results show that the generalized method finds errors at rates greater than expected given the confidence of the sampled predictions, and outperforms competing methods.
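The abstract does not spell out the search procedure, so the following is only an illustrative sketch of the general idea under stated assumptions, not the paper's algorithm. It assumes a hypothetical black-box interface that maps one feature vector to a list of class probabilities. It estimates an "adversarial distance" (the smallest L2 perturbation that changes the predicted label) using random directions plus bisection, a generic query-only heuristic swapped in here. Among high-confidence predictions, it ranks first those that are easiest to flip, on the assumption that these are the likeliest to be overconfident. It also computes the error rate a calibrated model would be expected to show, which is the baseline the abstract compares against. The names `predicted_label`, `adversarial_distance`, `rank_candidates`, and `expected_error_rate` are hypothetical, as are the parameters `n_directions`, `max_radius`, `tol`, and `min_confidence`.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

# Hypothetical black-box interface (an assumption): maps one feature vector to a
# list of class probabilities. Only queries are used; no gradients or internals.
Classifier = Callable[[Sequence[float]], List[float]]


def predicted_label(clf: Classifier, x: Sequence[float]) -> int:
    """Index of the most probable class (ties go to the lowest index)."""
    probs = clf(x)
    return max(range(len(probs)), key=probs.__getitem__)


def adversarial_distance(
    clf: Classifier,
    x: Sequence[float],
    n_directions: int = 50,
    max_radius: float = 5.0,
    tol: float = 1e-3,
    rng: Optional[random.Random] = None,
) -> float:
    """Query-only estimate of the smallest L2 perturbation that flips clf's label at x.

    Tries random unit directions. Along each direction that flips the label
    within max_radius, it bisects to a flip point. The result is an upper bound
    on the true minimal distance; math.inf means no flip was found.
    """
    rng = rng if rng is not None else random.Random(0)
    label = predicted_label(clf, x)
    best = math.inf
    for _ in range(n_directions):
        d = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(v * v for v in d)) or 1.0
        d = [v / norm for v in d]

        def flips(r: float) -> bool:
            # True if moving distance r along direction d changes the label.
            return predicted_label(clf, [xi + r * di for xi, di in zip(x, d)]) != label

        if not flips(max_radius):
            continue  # no label change along this ray within the search radius
        lo, hi = 0.0, max_radius  # invariant: flips(lo) is False, flips(hi) is True
        while hi - lo > tol:
            mid = (lo + hi) / 2.0
            if flips(mid):
                hi = mid
            else:
                lo = mid
        best = min(best, hi)
    return best


def rank_candidates(
    clf: Classifier,
    xs: Sequence[Sequence[float]],
    min_confidence: float = 0.9,
    **search_kwargs,
) -> List[int]:
    """Indices of high-confidence predictions, smallest adversarial distance first.

    Hypothetical heuristic: a confident prediction that a small perturbation
    can flip may be overconfident, so it is sent for labeling first.
    """
    scored = []
    for i, x in enumerate(xs):
        if max(clf(x)) >= min_confidence:
            scored.append((adversarial_distance(clf, x, **search_kwargs), i))
    scored.sort()
    return [i for _, i in scored]


def expected_error_rate(confidences: Sequence[float]) -> float:
    """Error rate a perfectly calibrated model would show: mean of (1 - confidence)."""
    return sum(1.0 - p for p in confidences) / len(confidences)


if __name__ == "__main__":
    # Toy 2-D classifier (an assumption for illustration): class 1 iff x[0] > 0.
    def toy(x: Sequence[float]) -> List[float]:
        p = 1.0 / (1.0 + math.exp(-4.0 * x[0]))
        return [1.0 - p, p]

    pool = [[3.0, 0.0], [0.5, 0.0], [0.05, 0.0]]
    print(rank_candidates(toy, pool, min_confidence=0.8))
```

In use, one would label the top-ranked items and compare their observed error rate with `expected_error_rate` of their confidences. An observed rate above that baseline means the search surfaced predictions that are more often wrong than their confidence implies. Random-direction bisection is chosen here only because it needs nothing beyond label queries. It costs roughly `n_directions * log2(max_radius / tol)` queries per point and, in high dimensions, may be a loose upper bound.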