CV, May 3, 2023

Making the Most of What You Have: Adapting Pre-trained Visual Language Models in the Low-data Regime

arXiv:2305.02297v1, 2 citations
Originality: Incremental advance
AI Analysis

This work addresses few-shot adaptation of visual language models. The advance is incremental because it builds on existing adaptation methods.

The paper tackles the problem of adapting pre-trained visual language models to new tasks with limited labeled data, showing that a self-labeling approach using unlabeled images yields significant performance gains across multiple visual language tasks.

Large-scale visual language models are widely used as pre-trained models and then adapted for various downstream tasks. While humans are known to efficiently learn new tasks from a few examples, deep learning models struggle with adaptation from few examples. In this work, we look into task adaptation in the low-data regime, and provide a thorough study of the existing adaptation methods for generative Visual Language Models. We also show important benefits of self-labelling, i.e. using the model's own predictions to self-improve when a larger number of unlabelled images from the same distribution is available. Our study demonstrates significant gains using our proposed task adaptation pipeline across a wide range of visual language tasks such as visual classification (ImageNet), visual captioning (COCO), detailed visual captioning (Localised Narratives) and visual question answering (VQAv2).
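
The general idea of self-labelling can be illustrated with a toy sketch. This is not the paper's pipeline: a nearest-centroid classifier on 2-D points stands in for the visual language model, and the function names (`fit_centroids`, `predict`, `self_label`), the `min_margin` confidence filter, and the number of rounds are all assumptions made for illustration. The abstract does not say whether or how predictions are filtered.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def fit_centroids(points: Sequence[Point], labels: Sequence[int]) -> Dict[int, Point]:
    """Toy 'model': one mean point (centroid) per class."""
    sums: Dict[int, Tuple[float, float, int]] = {}
    for (x, y), label in zip(points, labels):
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def predict(centroids: Dict[int, Point], point: Point) -> Tuple[int, float]:
    """Return (label, margin). The margin is the distance gap between the
    two closest centroids and serves as a crude confidence score."""
    dists = sorted((math.dist(point, c), label) for label, c in centroids.items())
    best_dist, best_label = dists[0]
    margin = dists[1][0] - best_dist if len(dists) > 1 else float("inf")
    return best_label, margin


def self_label(
    labelled_x: Sequence[Point],
    labelled_y: Sequence[int],
    unlabelled_x: Sequence[Point],
    min_margin: float = 1.0,  # assumed confidence threshold, not from the paper
    rounds: int = 3,          # assumed number of self-labelling rounds
) -> Dict[int, Point]:
    """Fit on the few labelled examples, then repeatedly pseudo-label the
    unlabelled pool with the model's own predictions and refit."""
    centroids = fit_centroids(labelled_x, labelled_y)
    for _ in range(rounds):
        pseudo_x: List[Point] = []
        pseudo_y: List[int] = []
        for p in unlabelled_x:
            label, margin = predict(centroids, p)
            if margin >= min_margin:  # keep only confident predictions
                pseudo_x.append(p)
                pseudo_y.append(label)
        centroids = fit_centroids(
            list(labelled_x) + pseudo_x, list(labelled_y) + pseudo_y
        )
    return centroids
```

In this sketch, one labelled example per class plus a handful of unlabelled points is enough to pull each centroid toward its cluster, while ambiguous points that fall between the two classes are left out of the refit.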

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
