CV AI | Nov 14, 2025

PAS: Prelim Attention Score for Detecting Object Hallucinations in Large Vision-Language Models

Nhat Hoang-Xuan, Minh Vu, My T. Thai, Manish Bhattarai

arXiv:2511.11502v1 | 3.6 | h-index: 6

Originality: Incremental advance

AI Analysis

This work addresses the reliability of vision-language models in applications such as real-time filtering, though the advance is incremental because it builds on existing attention mechanisms.

The paper tackles object hallucinations in large vision-language models by showing that weak image dependence correlates with hallucinations, and introduces the Prelim Attention Score (PAS), a lightweight method that achieves state-of-the-art detection across models and datasets.

Large vision-language models (LVLMs) are powerful, yet they remain unreliable due to object hallucinations. In this work, we show that in many hallucinatory predictions the LVLM effectively ignores the image and instead relies on previously generated output (prelim) tokens to infer new objects. We quantify this behavior via the mutual information between the image and the predicted object conditioned on the prelim, demonstrating that weak image dependence strongly correlates with hallucination. Building on this finding, we introduce the Prelim Attention Score (PAS), a lightweight, training-free signal computed from attention weights over prelim tokens. PAS requires no additional forward passes and can be computed on the fly during inference. Exploiting this previously overlooked signal, PAS achieves state-of-the-art object-hallucination detection across multiple models and datasets, enabling real-time filtering and intervention.
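The abstract describes PAS only as a signal "computed from attention weights over prelim tokens" and does not give the exact formula. The sketch below is therefore an illustration under stated assumptions, not the authors' definition: it assumes the score is the attention mass that the token being generated places on the prelim positions, averaged over heads, and that a higher score suggests a higher hallucination risk. The names `prelim_attention_score` and `flag_possible_hallucination`, the single-layer input, and the threshold value are all hypothetical.

```python
from typing import Sequence


def prelim_attention_score(
    attn_rows: Sequence[Sequence[float]],
    prelim_start: int,
    prelim_end: int,
) -> float:
    """Average attention mass on prelim tokens (illustrative, not the paper's formula).

    attn_rows: one row per attention head; each row holds the attention
        weights from the token being generated to every earlier position
        (image, prompt, and prelim tokens), so each row sums to about 1.
    prelim_start, prelim_end: half-open range [start, end) of positions
        holding previously generated output (prelim) tokens.
    """
    if not attn_rows:
        raise ValueError("attn_rows must contain at least one head")
    seq_len = len(attn_rows[0])
    if any(len(row) != seq_len for row in attn_rows):
        raise ValueError("all heads must cover the same number of positions")
    if not 0 <= prelim_start <= prelim_end <= seq_len:
        raise ValueError("prelim range must lie within the sequence")

    # Attention mass each head assigns to the prelim positions.
    per_head_mass = [sum(row[prelim_start:prelim_end]) for row in attn_rows]
    # Assumption: heads are aggregated with a plain mean.
    return sum(per_head_mass) / len(per_head_mass)


def flag_possible_hallucination(score: float, threshold: float = 0.5) -> bool:
    """Flag an object token when prelim attention exceeds a threshold.

    The threshold is a placeholder; in practice it would be tuned on
    labeled data for a given model.
    """
    return score > threshold


# Toy example: positions 0-2 are image tokens, 3 is the prompt,
# and 4-5 are prelim tokens. Two heads.
toy_attn = [
    [0.1, 0.1, 0.1, 0.1, 0.3, 0.3],
    [0.2, 0.2, 0.2, 0.2, 0.1, 0.1],
]
toy_score = prelim_attention_score(toy_attn, prelim_start=4, prelim_end=6)
print(round(toy_score, 3), flag_possible_hallucination(toy_score))
```

Because the attention weights are already produced during decoding, a score of this kind needs no extra forward pass, which is consistent with the abstract's claim that PAS can be computed on the fly.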
