CV · AI · Feb 22, 2024

Uncertainty-Aware Evaluation for Vision-Language Models

arXiv:2402.14418v2 · 26 citations · h-index: 4
Originality: Incremental advance
AI Analysis

This work addresses the need for more robust evaluation metrics in vision-language AI, particularly for researchers and practitioners assessing model reliability. It is an incremental advance, since it builds on existing uncertainty quantification methods such as conformal prediction.

The paper tackles the problem that current evaluation methods for Vision-Language Models (VLMs) overlook uncertainty, which is crucial for a comprehensive assessment. The authors present a benchmark that incorporates uncertainty quantification, analyze 20+ VLMs on multiple-choice VQA across 5 datasets, and show that the models' uncertainty is not aligned with their accuracy: the most accurate models sometimes have the highest uncertainty.

Vision-Language Models like GPT-4, LLaVA, and CogVLM have surged in popularity recently due to their impressive performance in several vision-language tasks. Current evaluation methods, however, overlook an essential component: uncertainty, which is crucial for a comprehensive assessment of VLMs. Addressing this oversight, we present a benchmark incorporating uncertainty quantification into evaluating VLMs. Our analysis spans 20+ VLMs, focusing on the multiple-choice Visual Question Answering (VQA) task. We examine models on 5 datasets that evaluate various vision-language capabilities. Using conformal prediction as an uncertainty estimation approach, we demonstrate that the models' uncertainty is not aligned with their accuracy. Specifically, we show that models with the highest accuracy may also have the highest uncertainty, which confirms the importance of measuring it for VLMs. Our empirical findings also reveal a correlation between model uncertainty and its language model part.
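The abstract names conformal prediction as the uncertainty estimator. Below is a minimal sketch of split conformal prediction for multiple-choice VQA, assuming the LAC nonconformity score (1 minus the softmax probability of the correct option), a 50/50 calibration/test split, and α = 0.1. These choices, and the helper `simulate_toy_model` (a hypothetical stand-in for a VLM's option probabilities), are illustrative assumptions rather than the paper's exact protocol. The average prediction-set size acts as the uncertainty measure and is reported next to top-1 accuracy.

```python
import math
import random
from typing import Dict, List, Sequence, Set, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the answer options."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def lac_scores(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> List[float]:
    """LAC nonconformity score (assumed here): 1 minus the probability of the correct option."""
    return [1.0 - p[y] for p, y in zip(probs, labels)]


def conformal_threshold(cal_scores: Sequence[float], alpha: float) -> float:
    """Split-conformal quantile with the finite-sample (n + 1) correction."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the calibration score used as threshold
    if k > n:
        return math.inf  # too little calibration data: every option is kept
    return sorted(cal_scores)[k - 1]


def prediction_sets(probs: Sequence[Sequence[float]], qhat: float) -> List[Set[int]]:
    """Keep every option whose LAC score does not exceed the threshold."""
    return [{i for i, p in enumerate(row) if 1.0 - p <= qhat} for row in probs]


def evaluate(probs: Sequence[Sequence[float]], labels: Sequence[int], qhat: float) -> Dict[str, float]:
    """Top-1 accuracy plus coverage and mean size of the conformal sets."""
    sets = prediction_sets(probs, qhat)
    n = len(labels)
    top1 = [max(range(len(p)), key=p.__getitem__) for p in probs]
    return {
        "accuracy": sum(t == y for t, y in zip(top1, labels)) / n,
        "coverage": sum(y in s for s, y in zip(sets, labels)) / n,
        "set_size": sum(len(s) for s in sets) / n,  # larger sets = more uncertainty
    }


def simulate_toy_model(n: int, n_options: int, seed: int) -> Tuple[List[List[float]], List[int]]:
    """Hypothetical toy model: random logits nudged towards the correct option."""
    rng = random.Random(seed)
    probs, labels = [], []
    for _ in range(n):
        y = rng.randrange(n_options)
        logits = [rng.gauss(0.0, 1.0) for _ in range(n_options)]
        logits[y] += 1.5  # bias the toy model towards the correct answer
        probs.append(softmax(logits))
        labels.append(y)
    return probs, labels


if __name__ == "__main__":
    probs, labels = simulate_toy_model(2000, n_options=4, seed=0)
    # 50/50 calibration/test split (an assumption, not the paper's protocol)
    qhat = conformal_threshold(lac_scores(probs[:1000], labels[:1000]), alpha=0.1)
    print(evaluate(probs[1000:], labels[1000:], qhat))
```

Because set size is computed independently of top-1 accuracy, two models can reach similar accuracy while producing prediction sets of very different sizes, which is the kind of misalignment between accuracy and uncertainty the abstract describes.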

Code implementations: 1 repo