CV · Nov 27, 2025

Intra-Class Probabilistic Embeddings for Uncertainty Estimation in Vision-Language Models

arXiv:2511.22019v2 · Has Code
AI Analysis

The paper addresses reliability issues in safety-critical applications of vision-language models, though as a post-hoc method it is an incremental improvement.

The paper tackles the problem of vision-language models assigning high confidence to misclassifications by introducing a training-free uncertainty estimation method that measures visual feature consistency within classes using probabilistic embeddings. The method achieves state-of-the-art error detection performance across multiple datasets, working effectively with as few as 10 training images per class.

Vision-language models (VLMs), such as CLIP, have gained popularity for their strong open vocabulary classification performance, but they are prone to assigning high confidence scores to misclassifications, limiting their reliability in safety-critical applications. We introduce a training-free, post-hoc uncertainty estimation method for contrastive VLMs that can be used to detect erroneous predictions. The key to our approach is to measure visual feature consistency within a class, using feature projection combined with multivariate Gaussians to create class-specific probabilistic embeddings. Our method is VLM-agnostic, requires no fine-tuning, demonstrates robustness to distribution shift, and works effectively with as few as 10 training images per class. Extensive experiments on ImageNet, Flowers102, Food101, EuroSAT and DTD show state-of-the-art error detection performance, significantly outperforming both deterministic and probabilistic VLM baselines. Code is available at https://github.com/zhenxianglin/ICPE.
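The idea of class-specific probabilistic embeddings can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that), and several details are assumptions made for illustration only: PCA as the feature projection, a small ridge term `eps` on each covariance, the negative log-likelihood as the uncertainty score, and synthetic vectors standing in for image-encoder features. All function names are hypothetical.

```python
import numpy as np

def fit_projection(feats, dim):
    """PCA projection (an assumed choice): data mean and top-`dim` principal axes."""
    mu = feats.mean(axis=0)
    _, _, vt = np.linalg.svd(feats - mu, full_matrices=False)
    return mu, vt[:dim].T  # shapes (D,) and (D, dim)

def fit_class_gaussians(feats, labels, mu, proj, eps=1e-3):
    """Fit one multivariate Gaussian per class in the projected space."""
    z = (feats - mu) @ proj
    gaussians = {}
    for c in np.unique(labels):
        zc = z[labels == c]
        mean = zc.mean(axis=0)
        # eps * I keeps the covariance invertible with few images per class.
        cov = np.cov(zc, rowvar=False) + eps * np.eye(z.shape[1])
        gaussians[int(c)] = (mean, np.linalg.inv(cov), np.linalg.slogdet(cov)[1])
    return gaussians

def uncertainty(feat, pred_class, mu, proj, gaussians):
    """Negative log-likelihood (up to a constant) under the predicted class's Gaussian."""
    mean, precision, logdet = gaussians[pred_class]
    d = (feat - mu) @ proj - mean
    return 0.5 * (d @ precision @ d + logdet)

# Synthetic stand-in for image features: 2 classes, 10 images each, 16 dimensions.
rng = np.random.default_rng(0)
train_feats = np.concatenate([rng.normal(0.0, 1.0, (10, 16)),
                              rng.normal(3.0, 1.0, (10, 16))])
train_labels = np.array([0] * 10 + [1] * 10)

mu, proj = fit_projection(train_feats, dim=4)
gaussians = fit_class_gaussians(train_feats, train_labels, mu, proj)

# A class-1 image that the classifier (hypothetically) labelled as class 0.
query = rng.normal(3.0, 1.0, 16)
print("uncertainty of predicting class 0:", uncertainty(query, 0, mu, proj, gaussians))
print("uncertainty of predicting class 1:", uncertainty(query, 1, mu, proj, gaussians))
```

In this sketch, a prediction whose image feature lies far from the Gaussian of its predicted class receives a high score and can be flagged as a likely error by thresholding; no model weights are updated, which matches the training-free, post-hoc setting described in the abstract.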

