Facility Locations Utility for Uncovering Classifier Overconfidence
This work addresses the risk that model overconfidence poses to users who rely on black-box classifiers when no labeled data is available. It is an incremental improvement over prior approaches, which focused on misclassifications alone.
The paper tackles the problem of identifying overconfident misclassifications in black-box classifiers when labeled test data is unavailable. It proposes a facility locations utility model and a greedy query algorithm that together outperform previous methods in discovering such errors.
Assessing the predictive accuracy of black-box classifiers is challenging in the absence of labeled test datasets. In these scenarios we may need to rely on a human oracle to evaluate individual predictions, which raises the challenge of designing query algorithms that guide the search toward the points that provide the most information about the classifier's predictive characteristics. Previous works have focused on developing utility models and query algorithms for discovering unknown unknowns: misclassifications with a predictive confidence above some arbitrary threshold. However, if misclassifications occur at the rate reflected by the confidence values, then these search methods reveal nothing more than a proper assessment of predictive certainty. We are unable to properly mitigate the risks associated with model deficiency when the model's confidence in its predictions exceeds its actual accuracy. We propose a facility locations utility model and corresponding greedy query algorithm that instead searches for overconfident unknown unknowns. Through robust empirical experiments we demonstrate that the greedy query algorithm with the facility locations utility model consistently produces oracle queries that outperform those of previous methods in discovering overconfident unknown unknowns.
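To make the idea of greedily maximizing a facility-location style utility concrete, the following is a minimal sketch of the generic textbook objective, not the utility model proposed in this paper, whose exact form is not given here. It assumes a precomputed matrix of nonnegative similarities between points and ignores oracle feedback and classifier confidence entirely. The names `facility_location_value`, `greedy_select`, and `sim` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def facility_location_value(sim: Sequence[Sequence[float]],
                            chosen: Sequence[int]) -> float:
    """Generic facility-location objective (an assumption, not the paper's utility):
    each point contributes its best similarity to any chosen point."""
    if not chosen:
        return 0.0
    return sum(max(row[j] for j in chosen) for row in sim)


def greedy_select(sim: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Pick up to `budget` indices, each time taking the largest marginal gain.

    Assumes similarities are nonnegative; ties go to the lowest index.
    """
    n = len(sim)
    chosen: List[int] = []
    # Best similarity of each point to the chosen set so far.
    best_cover = [0.0] * n
    for _ in range(min(budget, n)):
        best_gain, best_j = -1.0, -1
        for j in range(n):
            if j in chosen:
                continue
            # Marginal gain: how much candidate j improves each point's coverage.
            gain = sum(max(sim[i][j] - best_cover[i], 0.0) for i in range(n))
            if gain > best_gain:
                best_gain, best_j = gain, j
        chosen.append(best_j)
        best_cover = [max(best_cover[i], sim[i][best_j]) for i in range(n)]
    return chosen


# Toy data (made up): points 0-1 form one cluster, points 2-4 another.
sim = [
    [1.0, 0.8, 0.0, 0.0, 0.0],
    [0.8, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.9, 0.5],
    [0.0, 0.0, 0.9, 1.0, 0.9],
    [0.0, 0.0, 0.5, 0.9, 1.0],
]
picked = greedy_select(sim, budget=2)
print(picked, facility_location_value(sim, picked))
```

On this toy matrix the first pick is the central point of the larger cluster and the second comes from the other cluster, which shows the diminishing-returns behavior that makes greedy selection a natural fit for this kind of objective. In an oracle-query setting, a utility of this shape would presumably be combined with the oracle's answers and the classifier's confidence values, but that coupling is not specified here and is left out of the sketch.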