CV · AI · LG · Jul 1, 2023

ProbVLM: Probabilistic Adapter for Frozen Vision-Language Models

arXiv:2307.00398v38 citations · h-index: 50 · Has Code
Originality: Incremental advance
AI Analysis

ProbVLM addresses the lack of uncertainty estimates in the embeddings of vision-language models. Its post-hoc approach is an incremental advance, but a practical one for real-world applications such as active learning and model selection.

The paper observes that the deterministic embeddings of vision-language models do not capture inherent ambiguity. It proposes ProbVLM, a probabilistic adapter that estimates embedding distributions without large-scale data or compute, and reports that it outperforms other methods on retrieval tasks across four datasets.

Large-scale vision-language models (VLMs) like CLIP successfully find correspondences between images and text. Through the standard deterministic mapping process, an image or a text sample is mapped to a single vector in the embedding space. This is problematic: as multiple samples (images or text) can abstract the same concept in the physical world, deterministic embeddings do not reflect the inherent ambiguity in the embedding space. We propose ProbVLM, a probabilistic adapter that estimates probability distributions for the embeddings of pre-trained VLMs via inter/intra-modal alignment in a post-hoc manner without needing large-scale datasets or computing. On four challenging datasets, i.e., COCO, Flickr, CUB, and Oxford-flowers, we estimate the multi-modal embedding uncertainties for two VLMs, i.e., CLIP and BLIP, quantify the calibration of embedding uncertainties in retrieval tasks and show that ProbVLM outperforms other methods. Furthermore, we propose active learning and model selection as two real-world downstream tasks for VLMs and show that the estimated uncertainty aids both tasks. Lastly, we present a novel technique for visualizing the embedding distributions using a large-scale pre-trained latent diffusion model. Code is available at https://github.com/ExplainableML/ProbVLM.
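The idea of a post-hoc probabilistic adapter can be illustrated with a toy sketch. The abstract does not specify the distribution family, architecture, or losses, so everything below is an assumption made for illustration: the hypothetical `ToyProbAdapter` keeps the frozen embedding as the mean, learns one input-independent log-variance per dimension, and fits it with a diagonal-Gaussian negative log-likelihood on (frozen embedding, target embedding) pairs, where the target could be the embedding of a paired caption. It is not the authors' implementation.

```python
import math


def gaussian_nll(target, mean, log_var):
    """Diagonal-Gaussian negative log-likelihood, averaged over dimensions."""
    total = 0.0
    for t, m, lv in zip(target, mean, log_var):
        total += 0.5 * (lv + (t - m) ** 2 * math.exp(-lv) + math.log(2 * math.pi))
    return total / len(target)


class ToyProbAdapter:
    """Hypothetical adapter: frozen embedding as mean, learned per-dimension variance."""

    def __init__(self, dim):
        # Assumption: a single log-variance per dimension, shared by all inputs.
        self.log_var = [0.0] * dim

    def predict(self, frozen_embedding):
        # The frozen model's output is left untouched and used as the mean.
        return list(frozen_embedding), list(self.log_var)

    def fit(self, pairs, lr=0.1, steps=500):
        """Gradient descent on the Gaussian NLL with respect to log_var only."""
        dim = len(self.log_var)
        for _ in range(steps):
            grad = [0.0] * dim
            for frozen, target in pairs:
                for d in range(dim):
                    sq_err = (target[d] - frozen[d]) ** 2
                    # d/d(log_var) of 0.5 * (log_var + sq_err * exp(-log_var))
                    grad[d] += 0.5 * (1.0 - sq_err * math.exp(-self.log_var[d]))
            for d in range(dim):
                self.log_var[d] -= lr * grad[d] / len(pairs)


# Made-up 2-D "embeddings": dimension 1 disagrees far more than dimension 0.
pairs = [([0.0, 0.0], [0.2, 1.0]), ([1.0, 1.0], [0.8, 2.0])]
adapter = ToyProbAdapter(dim=2)
adapter.fit(pairs)
mean, log_var = adapter.predict([0.5, 0.5])
variances = [math.exp(lv) for lv in log_var]
print(variances)  # dimension 1 ends up with the larger variance
```

In this sketch the learned variance approaches the mean squared disagreement between paired embeddings in each dimension, which is the sense in which it serves as an uncertainty estimate. A per-sample estimate, as needed for tasks like active learning, would require the variance to depend on the input, which this toy version omits.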

Code Implementations: 1 repo
