CV · Oct 13, 2022

Caption supervision enables robust learners

arXiv:2210.07396v2 · 4 citations · h-index: 31 · Has Code
Originality: Incremental advance
AI Analysis

This work targets distributional robustness in computer vision models and is aimed at researchers and practitioners. It offers an incremental advance by refining supervision strategies and data handling.

In a controlled comparison study, the paper shows that caption-supervised CNNs, trained with image labels derived from captions, can be more distributionally robust than vision-language models trained on the same data. It also introduces CaptionNet, a dataset with over 50,000 new human-labeled samples paired with web-scraped captions, to support high-accuracy experiments in robust computer vision.
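The label-assignment step (deriving an image label by scanning its caption for class names) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the three-class vocabulary `CLASS_NAMES`, the helper `caption_to_label`, whole-word case-insensitive matching, and the rule that discards captions naming zero or several classes are all hypothetical and are not taken from CaptionNet or VL Hub.

```python
import re
from typing import Optional

# Hypothetical vocabulary: class index -> list of names/synonyms.
# A real setup would cover all ImageNet classes and many more synonyms.
CLASS_NAMES = {
    0: ["goldfish"],
    1: ["tabby cat", "tabby"],
    2: ["golden retriever"],
}


def _compile_patterns(class_names):
    # One whole-word, case-insensitive regex per class that matches any synonym.
    patterns = {}
    for label, names in class_names.items():
        alternatives = "|".join(re.escape(name) for name in names)
        patterns[label] = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    return patterns


PATTERNS = _compile_patterns(CLASS_NAMES)


def caption_to_label(caption: str) -> Optional[int]:
    """Return the single class whose name appears in the caption, else None.

    Captions that match no class, or more than one class, are discarded
    (an assumed filtering rule, not necessarily the paper's).
    """
    matches = [label for label, pattern in PATTERNS.items() if pattern.search(caption)]
    return matches[0] if len(matches) == 1 else None


captions = [
    "My golden retriever at the beach",
    "A tabby curled up on the sofa",
    "Sunset over the mountains",
    "Goldfish bowl next to a tabby cat",
]
labels = [caption_to_label(c) for c in captions]
print(labels)  # [2, 1, None, None]
```

Whole-word matching keeps a short class name from firing inside a longer unrelated word, but it also misses inflections such as plurals, so a real pipeline would likely need synonym and inflection handling before the resulting labels are used with a cross-entropy loss.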

Vision-language (VL) models like CLIP are robust to natural distribution shifts, in part because CLIP learns from unstructured data using a technique called caption supervision: the model interprets image-linked texts as ground-truth labels. In a carefully controlled comparison study, we show that caption-supervised CNNs trained with a standard cross-entropy loss (with image labels assigned by scanning captions for class names) can exhibit greater distributional robustness than VL models trained on the same data. To facilitate future experiments with high-accuracy caption-supervised models, we introduce CaptionNet (https://github.com/penfever/CaptionNet/), which includes a class-balanced, fully supervised dataset of over 50,000 new human-labeled, ImageNet-compliant samples with web-scraped captions. In a series of experiments on CaptionNet, we show how the choice of loss function, data filtration, and supervision strategy enables robust computer vision. We also provide the codebase necessary to reproduce our experiments at VL Hub (https://github.com/penfever/vlhub/).
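The abstract describes CaptionNet's training data as class-balanced. One simple way to enforce balance on a pool of caption-labeled samples is to cap each class at a fixed count by random subsampling, as sketched below. The function `class_balanced_subset` and its `per_class` and `seed` parameters are hypothetical, and this is not necessarily how CaptionNet was assembled (the abstract says it adds new human-labeled samples rather than only subsampling an existing pool).

```python
import random
from collections import Counter, defaultdict


def class_balanced_subset(samples, per_class, seed=0):
    """Keep at most `per_class` samples for each label.

    `samples` is a list of (item, label) pairs. Pairs whose label is None
    (e.g. captions that matched no class unambiguously) are dropped first.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    by_label = defaultdict(list)
    for item, label in samples:
        if label is not None:
            by_label[label].append(item)

    subset = []
    for label in sorted(by_label):
        items = by_label[label]
        # Sample without replacement; classes smaller than the cap keep every item.
        chosen = rng.sample(items, min(per_class, len(items)))
        subset.extend((item, label) for item in chosen)
    return subset


# Toy pool: labels 0, 1, 2 appear 4, 3 and 3 times; one sample is unlabeled.
samples = [(f"img{i}", i % 3) for i in range(10)] + [("img_x", None)]
subset = class_balanced_subset(samples, per_class=3)
counts = Counter(label for _, label in subset)
print(dict(counts))  # {0: 3, 1: 3, 2: 3}
```

Capping by subsampling is the cheapest way to remove label imbalance, but it discards data from frequent classes; adding new labeled samples for rare classes, as CaptionNet's human-labeled additions suggest, keeps more data at a higher labeling cost.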

Code Implementations: 1 repo
