LANCE: Stress-testing Visual Models by Generating Language-guided Counterfactual Images
LANCE gives researchers and practitioners a tool for uncovering biases and vulnerabilities in computer vision models, though the contribution is incremental because it builds on existing language modeling and image editing methods.
The authors evaluate visual models by stress-testing them on language-guided counterfactual images, and report significant and consistent performance drops across diverse pre-trained models.
We propose an automated algorithm to stress-test a trained visual model by generating language-guided counterfactual test images (LANCE). Our method leverages recent progress in large language modeling and text-based image editing to augment an IID test set with a suite of diverse, realistic, and challenging test images without altering model weights. We benchmark the performance of a diverse set of pre-trained models on our generated data and observe significant and consistent performance drops. We further analyze model sensitivity across different types of edits, and demonstrate the method's applicability in surfacing previously unknown class-level model biases in ImageNet. Code is available at https://github.com/virajprabhu/lance.
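The evaluation idea in the abstract, comparing a frozen model on original test images and on their counterfactual edits, broken down by edit type, can be illustrated with a minimal sketch. This is not the authors' implementation: `CounterfactualPair`, `accuracy_drop_by_edit_type`, the edit-type names, and the toy lookup "model" are all hypothetical, and the counterfactual generation step (language model plus image editor) is assumed to have already happened.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Iterable


@dataclass(frozen=True)
class CounterfactualPair:
    """Hypothetical record: an original test image and one edited version."""
    original: Hashable   # stand-in for an image (e.g. an ID or a tensor key)
    edited: Hashable     # stand-in for the counterfactual image
    label: int           # ground-truth class, assumed unchanged by the edit
    edit_type: str       # assumed edit category, e.g. "background"


def accuracy_drop_by_edit_type(
    model: Callable[[Hashable], int],
    pairs: Iterable[CounterfactualPair],
) -> Dict[str, float]:
    """Return (original accuracy - edited accuracy) for each edit type.

    The model is only queried, never updated, so its weights stay fixed.
    """
    totals = defaultdict(int)
    orig_correct = defaultdict(int)
    edit_correct = defaultdict(int)
    for p in pairs:
        totals[p.edit_type] += 1
        orig_correct[p.edit_type] += int(model(p.original) == p.label)
        edit_correct[p.edit_type] += int(model(p.edited) == p.label)
    return {t: (orig_correct[t] - edit_correct[t]) / totals[t] for t in totals}


# Toy usage: a lookup table plays the role of a pre-trained classifier.
toy_predictions = {"a": 0, "a_bg": 1, "b": 1, "b_bg": 1, "c": 0, "c_attr": 0}
toy_model = toy_predictions.get

pairs = [
    CounterfactualPair("a", "a_bg", 0, "background"),
    CounterfactualPair("b", "b_bg", 1, "background"),
    CounterfactualPair("c", "c_attr", 0, "attribute"),
]
drops = accuracy_drop_by_edit_type(toy_model, pairs)
```

A larger drop for one edit type than another would suggest the model is more sensitive to that kind of change. The sketch assumes every edit preserves the ground-truth label, which a real pipeline would need to verify.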