CV · Dec 6, 2022

Adaptive Testing of Computer Vision Models

Microsoft, UW
arXiv:2212.02774v2 · 51 citations · h-index: 29
AI Analysis

This paper addresses the challenge of testing computer vision models for systematic errors, which matters for improving robustness in real-world applications. The contribution is somewhat incremental, since it builds on existing tools such as CLIP and GPT-3.

The paper tackles the problem of identifying systematic failure modes in vision models by introducing AdaVision, an interactive process that helps users find and fix coherent error groups. Groups discovered by users have failure rates 2-3 times higher than those surfaced by automatic error-clustering methods, and finetuning on the discovered examples fixes these bugs without degrading in-distribution accuracy.

Vision models often fail systematically on groups of data that share common semantic characteristics (e.g., rare objects or unusual scenes), but identifying these failure modes is a challenge. We introduce AdaVision, an interactive process for testing vision models which helps users identify and fix coherent failure modes. Given a natural language description of a coherent group, AdaVision retrieves relevant images from LAION-5B with CLIP. The user then labels a small amount of data for model correctness, which is used in successive retrieval rounds to hill-climb towards high-error regions, refining the group definition. Once a group is saturated, AdaVision uses GPT-3 to suggest new group descriptions for the user to explore. We demonstrate the usefulness and generality of AdaVision in user studies, where users find major bugs in state-of-the-art classification, object detection, and image captioning models. These user-discovered groups have failure rates 2-3x higher than those surfaced by automatic error clustering methods. Finally, finetuning on examples found with AdaVision fixes the discovered bugs when evaluated on unseen examples, without degrading in-distribution accuracy, and while also improving performance on out-of-distribution datasets.
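The retrieve-label-refine loop described in the abstract can be illustrated with a toy sketch. Everything here is an assumption for illustration, not AdaVision's implementation: 2-D unit vectors stand in for CLIP embeddings of LAION-5B images, an oracle function `is_failure` stands in for the user's correctness labels, and the query update is a swapped-in Rocchio-style relevance-feedback rule (move toward failing images, away from passing ones). The names `retrieve`, `update_query`, `adaptive_search`, and the weights `alpha`, `beta`, `gamma` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Set, Tuple

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; both inputs are assumed to be unit-normalized."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(query: Vector, pool: Sequence[Vector], k: int, seen: Set[int]) -> List[int]:
    """Indices of the k not-yet-labeled pool items most similar to the query."""
    scored = [(cosine(query, emb), i) for i, emb in enumerate(pool) if i not in seen]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]


def centroid(ids: Sequence[int], pool: Sequence[Vector], dim: int) -> Vector:
    """Mean embedding of the given items (zero vector if there are none)."""
    if not ids:
        return [0.0] * dim
    return [sum(pool[i][d] for i in ids) / len(ids) for d in range(dim)]


def update_query(query: Vector, pool: Sequence[Vector], failures: List[int],
                 passes: List[int], alpha: float = 1.0, beta: float = 0.75,
                 gamma: float = 0.25) -> Vector:
    """Hypothetical Rocchio-style step: pull toward failures, push from passes."""
    dim = len(query)
    fc = centroid(failures, pool, dim)
    pc = centroid(passes, pool, dim)
    return normalize([alpha * q + beta * f - gamma * p for q, f, p in zip(query, fc, pc)])


def adaptive_search(text_query: Vector, pool: Sequence[Vector],
                    is_failure: Callable[[int], bool], rounds: int = 3,
                    k: int = 2) -> Tuple[Vector, List[float]]:
    """Hill-climb the query toward high-error regions; record per-round failure rate."""
    query = normalize(text_query)
    seen: Set[int] = set()
    history: List[float] = []
    for _ in range(rounds):
        batch = retrieve(query, pool, k, seen)
        if not batch:
            break
        seen.update(batch)
        failures = [i for i in batch if is_failure(i)]     # "user" marks model wrong
        passes = [i for i in batch if not is_failure(i)]   # "user" marks model right
        history.append(len(failures) / len(batch))
        query = update_query(query, pool, failures, passes)
    return query, history


# Toy pool: unit vectors at 0, 10, ..., 90 degrees; items at >= 50 degrees "fail".
pool = [[math.cos(math.radians(a)), math.sin(math.radians(a))] for a in range(0, 100, 10)]
final_query, history = adaptive_search(
    [math.cos(math.radians(33)), math.sin(math.radians(33))],
    pool,
    is_failure=lambda i: i >= 5,
)
print(history)  # [0.0, 0.5, 1.0]
```

Excluding already-labeled images through `seen` means each round spends the labeling budget on new images, and the rising per-round failure rate shows the query climbing toward the failing region of the toy embedding space.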

Code implementations: 1 repo
