CV · LG · Jun 9, 2025

Uncertainty-o: One Model-agnostic Framework for Unveiling Uncertainty in Large Multimodal Models

arXiv:2506.07575v1 · 2 citations · h-index: 3
Originality: Incremental advance
AI Analysis

The paper addresses the need for a unified way to assess uncertainty in LMMs. The advance is incremental: it builds on existing uncertainty methods and extends them to multimodal contexts.

The paper tackles the problem of evaluating and quantifying uncertainty in Large Multimodal Models (LMMs) by introducing Uncertainty-o, a model-agnostic framework that reliably estimates uncertainty across 18 benchmarks and 10 LMMs, enhancing tasks like hallucination detection and mitigation.

Large Multimodal Models (LMMs), harnessing the complementarity among diverse modalities, are often considered more robust than pure Large Language Models (LLMs); yet do LMMs know what they do not know? Three key questions remain open: (1) how to evaluate the uncertainty of diverse LMMs in a unified manner, (2) how to prompt LMMs to reveal their uncertainty, and (3) how to quantify uncertainty for downstream tasks. To address these challenges, we introduce Uncertainty-o: (1) a model-agnostic framework designed to reveal uncertainty in LMMs regardless of their modalities, architectures, or capabilities, (2) an empirical exploration of multimodal prompt perturbations to uncover LMM uncertainty, offering insights and findings, and (3) a formulation of multimodal semantic uncertainty, which enables quantifying uncertainty from multimodal responses. Experiments across 18 benchmarks spanning various modalities and 10 LMMs (both open- and closed-source) demonstrate the effectiveness of Uncertainty-o in reliably estimating LMM uncertainty, thereby enhancing downstream tasks such as hallucination detection, hallucination mitigation, and uncertainty-aware Chain-of-Thought reasoning.
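
The abstract does not spell out how multimodal semantic uncertainty is computed, so the following is only a rough, text-only sketch of the general idea behind semantic-entropy-style estimates: sample several responses (for example, one per perturbed prompt), group those that mean the same thing, and take the entropy of the group sizes. The function names and the `normalized_match` equivalence check are hypothetical stand-ins chosen for illustration; they are not taken from the paper, and a real system would likely use a stronger semantic comparison.

```python
import math
from typing import Callable, List, Sequence


def cluster_responses(
    responses: Sequence[str],
    same_meaning: Callable[[str, str], bool],
) -> List[List[str]]:
    """Greedily group responses; each response joins the first cluster
    whose representative (first member) it matches."""
    clusters: List[List[str]] = []
    for response in responses:
        for cluster in clusters:
            if same_meaning(cluster[0], response):
                cluster.append(response)
                break
        else:
            # No existing cluster matched, so start a new one.
            clusters.append([response])
    return clusters


def semantic_entropy(
    responses: Sequence[str],
    same_meaning: Callable[[str, str], bool],
) -> float:
    """Shannon entropy (in nats) of the empirical distribution over
    meaning clusters. Higher values mean the answers disagree more."""
    if not responses:
        raise ValueError("need at least one response")
    clusters = cluster_responses(responses, same_meaning)
    total = len(responses)
    return -sum(
        (len(c) / total) * math.log(len(c) / total) for c in clusters
    )


def normalized_match(a: str, b: str) -> bool:
    """Hypothetical placeholder for a semantic check: exact match after
    trimming whitespace and lowercasing."""
    return a.strip().lower() == b.strip().lower()
```

Under these assumptions, responses that all agree give an entropy of zero, while responses that split evenly across k distinct meanings give log(k). A downstream step such as hallucination detection could then flag answers whose entropy exceeds a chosen threshold; that threshold is an illustrative design choice, not something the abstract states.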

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
