Efficient Exploration of Image Classifier Failures with Bayesian Optimization and Text-to-Image Models
This work addresses the need for more efficient benchmarking of computer vision models to uncover real-world failures, although its contribution is incremental in that it builds on existing generative models.
The paper tackles the problem of efficiently identifying failure conditions for image classifiers. It uses text-to-image models to generate synthetic images and proposes an iterative method that uses Bayesian optimization to reduce generation cost and improve the detection of poor classifier behavior.
Image classifiers should be used with caution in the real world. Performance evaluated on a validation set may not reflect performance in the real world. In particular, classifiers may perform well under conditions that are frequently encountered during training, but poorly under other, infrequent conditions. In this study, we hypothesize that recent advances in text-to-image generative models make them valuable for benchmarking computer vision models such as image classifiers: they can generate images conditioned on textual prompts that cause classifier failures, allowing failure conditions to be described with textual attributes. However, their generation cost becomes an issue when a large number of synthetic images need to be generated, which is the case when many different attribute combinations need to be tested. We propose an image classifier benchmarking method as an iterative process that alternates image generation, classifier evaluation, and attribute selection. This method efficiently explores the attribute space and ultimately identifies the attributes that lead to poor classifier behavior.
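The alternation of image generation, classifier evaluation, and attribute selection can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a small, hypothetical attribute space, replaces the text-to-image model and the classifier with a simulated stand-in (`generate_and_classify`, with made-up failure rates), and substitutes Thompson sampling over independent Beta posteriors for the Bayesian optimization step, which the abstract does not specify in detail.

```python
import itertools
import random

# Hypothetical attribute space; names and values are illustrative only.
ATTRIBUTES = {
    "weather": ["sunny", "foggy", "snowy"],
    "time": ["day", "night"],
}


def generate_and_classify(combo, rng):
    """Simulated stand-in for: build a prompt from `combo`, generate an
    image, run the classifier. Returns True when the classifier fails.
    The failure rates are invented for this sketch."""
    failure_rate = 0.9 if combo == ("foggy", "night") else 0.05
    return rng.random() < failure_rate


def explore(budget, seed=0):
    """Spend `budget` simulated generations, focusing on likely failures."""
    rng = random.Random(seed)
    combos = list(itertools.product(*ATTRIBUTES.values()))
    # Beta(1, 1) prior on each combination's failure rate,
    # stored as [failures + 1, non-failures + 1].
    posterior = {c: [1, 1] for c in combos}
    for _ in range(budget):
        # Attribute selection: sample a plausible failure rate for every
        # combination and test the one with the highest sample.
        chosen = max(combos, key=lambda c: rng.betavariate(*posterior[c]))
        # Image generation and classifier evaluation (simulated).
        failed = generate_and_classify(chosen, rng)
        posterior[chosen][0 if failed else 1] += 1
    return posterior


def estimated_failure_rate(counts):
    """Posterior mean of the failure rate for one combination."""
    failures, non_failures = counts
    return failures / (failures + non_failures)


if __name__ == "__main__":
    result = explore(budget=300)
    for combo, counts in sorted(result.items(), key=lambda kv: -sum(kv[1])):
        print(combo, sum(counts) - 2, round(estimated_failure_rate(counts), 2))
```

In this toy setting, most of the generation budget goes to the combination that fails most often, rather than being spread evenly over all combinations. A surrogate that shares information across related attribute combinations, as Bayesian optimization typically does, would be needed once the attribute space is too large to treat each combination independently.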