CV · CL · Nov 12, 2025

Taming Object Hallucinations with Verified Atomic Confidence Estimation

arXiv:2511.09228v1 · 1 citation · h-index: 6
Originality: Incremental advance
AI Analysis

This work addresses reliability issues in MLLMs that affect users of vision-language applications, though it is an incremental advance that builds on existing self-verification methods.

The paper tackles object hallucinations in Multimodal Large Language Models (MLLMs) by introducing TACO, a framework that combines self-verification with confidence calibration. It reports consistent performance improvements across five benchmarks with two models, LLaVA-1.5-7B and CogVLM2.

Multimodal Large Language Models (MLLMs) often suffer from hallucinations, particularly errors in object existence, attributes, or relations, which undermine their reliability. We introduce TACO (Verified Atomic Confidence Estimation), a simple framework that mitigates hallucinations through self-verification and confidence calibration without relying on external vision experts. TACO decomposes responses into atomic queries, paraphrases them to reduce sensitivity to wording, and estimates confidence using self-consistency (black-box) or self-confidence (gray-box) aggregation, before refining answers with a language model. Experiments on five benchmarks (POPE, MME, HallusionBench, AMBER, and MM-Hal Bench) with two MLLMs (LLaVA-1.5-7B and CogVLM2) show that TACO consistently outperforms direct prompting and Visual Contrastive Decoding, reduces systematic biases, and improves confidence calibration, demonstrating its effectiveness in enhancing the faithfulness of MLLMs.
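
The two aggregation modes named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the yes/no answer format, the use of a plain mean, and the 0.5 threshold are all assumptions made for illustration. It assumes each atomic query has already been paraphrased several times and posed to the MLLM, yielding either text answers (black-box) or per-answer "yes" probabilities (gray-box).

```python
from statistics import mean
from typing import Sequence


def self_consistency(answers: Sequence[str]) -> float:
    """Black-box aggregation (assumed form): fraction of paraphrased
    queries that the model answered with "yes"."""
    if not answers:
        raise ValueError("need at least one answer")
    votes = [a.strip().lower().startswith("yes") for a in answers]
    return sum(votes) / len(votes)


def self_confidence(yes_probs: Sequence[float]) -> float:
    """Gray-box aggregation (assumed form): mean probability the model
    assigned to "yes" across the paraphrased queries."""
    if not yes_probs:
        raise ValueError("need at least one probability")
    return mean(yes_probs)


def is_verified(confidence: float, threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: keep an atomic claim only if its
    aggregated confidence reaches the threshold."""
    return confidence >= threshold


# Hypothetical model outputs for four paraphrases of one atomic query,
# e.g. "Is there a dog in the image?"
text_answers = ["Yes", "yes, there is", "No", "Yes."]
token_probs = [1.0, 0.5, 0.75, 0.75]

black_box_conf = self_consistency(text_answers)
gray_box_conf = self_confidence(token_probs)
print(black_box_conf, is_verified(black_box_conf))
print(gray_box_conf, is_verified(gray_box_conf))
```

In this sketch, claims that fail the check would be the ones handed to the refinement step for correction or removal; how TACO actually performs that refinement is not specified beyond "refining answers with a language model".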

