CV · Dec 6, 2022

Adaptive Testing of Computer Vision Models

Irena Gao, Gabriel Ilharco, Scott Lundberg, Marco Tulio Ribeiro

Microsoft, UW

arXiv:2212.02774v2 · 21.751 citations · h-index: 29 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of testing computer vision models for systematic errors, which is crucial for improving robustness in real-world applications. The advance is incremental, since it builds on existing tools such as CLIP and GPT-3.

The paper tackles the problem of identifying systematic failure modes in vision models by introducing AdaVision, an interactive process that helps users find and fix coherent error groups. The groups users discover have failure rates 2-3x higher than those surfaced by automatic error clustering methods, and finetuning on the discovered examples fixes the bugs without degrading in-distribution accuracy.

Vision models often fail systematically on groups of data that share common semantic characteristics (e.g., rare objects or unusual scenes), but identifying these failure modes is a challenge. We introduce AdaVision, an interactive process for testing vision models which helps users identify and fix coherent failure modes. Given a natural language description of a coherent group, AdaVision retrieves relevant images from LAION-5B with CLIP. The user then labels a small amount of data for model correctness, which is used in successive retrieval rounds to hill-climb towards high-error regions, refining the group definition. Once a group is saturated, AdaVision uses GPT-3 to suggest new group descriptions for the user to explore. We demonstrate the usefulness and generality of AdaVision in user studies, where users find major bugs in state-of-the-art classification, object detection, and image captioning models. These user-discovered groups have failure rates 2-3x higher than those surfaced by automatic error clustering methods. Finally, finetuning on examples found with AdaVision fixes the discovered bugs when evaluated on unseen examples, without degrading in-distribution accuracy, and while also improving performance on out-of-distribution datasets.
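
The retrieve, label, and refine loop described in the abstract can be illustrated with a toy sketch. This is not AdaVision's implementation: the abstract does not say how user labels steer retrieval, so the sketch assumes one simple possibility, blending the query embedding with the mean embedding of the images labeled as failures. The 2-D vectors stand in for CLIP embeddings, `is_failure` stands in for the user's labels, and every function name and the `weight` parameter are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def retrieve(query, pool, k, seen):
    """Return indices of the k unseen pool items most similar to the query."""
    candidates = [i for i in range(len(pool)) if i not in seen]
    candidates.sort(key=lambda i: cosine(query, pool[i]), reverse=True)
    return candidates[:k]

def refine_query(query, failures, pool, weight):
    """ASSUMPTION: steer the query toward the mean embedding of known failures."""
    if not failures:
        return query
    dim = len(query)
    mean = [sum(pool[i][d] for i in failures) / len(failures) for d in range(dim)]
    q, m = normalize(query), normalize(mean)
    return normalize([(1 - weight) * a + weight * b for a, b in zip(q, m)])

def adaptive_test(text_embedding, pool, is_failure, rounds=2, k=2, weight=0.5):
    """Alternate retrieval and labeling; return (failure indices, seen indices)."""
    query = normalize(text_embedding)
    seen, failures = set(), []
    for _ in range(rounds):
        batch = retrieve(query, pool, k, seen)
        if not batch:
            break
        seen.update(batch)
        # In the real process a person labels each image; here a callback does.
        failures.extend(i for i in batch if is_failure(i))
        query = refine_query(query, failures, pool, weight)
    return failures, seen

# Toy pool of six "image embeddings"; items 1, 3, and 4 are model failures.
pool = [[1.0, 0.0], [0.6, 0.8], [1.0, 0.2], [0.0, 1.0], [0.2, 1.0], [1.0, -0.5]]
failing = {1, 3, 4}
found, _ = adaptive_test([1.0, 0.8], pool, lambda i: i in failing)
baseline, _ = adaptive_test([1.0, 0.8], pool, lambda i: i in failing, weight=0.0)
print("with refinement:", found)
print("without refinement:", baseline)
```

In this toy pool, the refined query surfaces all three failing items within two rounds, while the fixed query (`weight=0.0`) surfaces only two for the same labeling budget. The GPT-3 step that suggests new group descriptions is not modeled here.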
