CV · AI · Nov 14, 2025

PAS: Prelim Attention Score for Detecting Object Hallucinations in Large Vision-Language Models

arXiv:2511.11502v1 · h-index: 6
Originality: Incremental advance
AI Analysis

The paper addresses the reliability of vision-language models in applications such as real-time filtering, though the advance is incremental because it builds on existing attention mechanisms.

The paper tackles object hallucinations in large vision-language models by showing that weak image dependence correlates with hallucinations, and introduces the Prelim Attention Score (PAS), a lightweight method that achieves state-of-the-art detection across models and datasets.

Large vision-language models (LVLMs) are powerful, yet they remain unreliable due to object hallucinations. In this work, we show that in many hallucinatory predictions the LVLM effectively ignores the image and instead relies on previously generated output (prelim) tokens to infer new objects. We quantify this behavior via the mutual information between the image and the predicted object conditioned on the prelim, demonstrating that weak image dependence strongly correlates with hallucination. Building on this finding, we introduce the Prelim Attention Score (PAS), a lightweight, training-free signal computed from attention weights over prelim tokens. PAS requires no additional forward passes and can be computed on the fly during inference. Exploiting this previously overlooked signal, PAS achieves state-of-the-art object-hallucination detection across multiple models and datasets, enabling real-time filtering and intervention.
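
To make the idea of a score "computed from attention weights over prelim tokens" concrete, here is a minimal sketch. The abstract does not give the exact formula, so everything specific here is an assumption for illustration: the function name `prelim_attention_score`, the use of a single layer, summing attention mass over prelim positions, averaging over heads, and the toy token layout (4 image tokens, 2 prompt tokens, 3 prelim tokens). It is not the authors' implementation.

```python
import numpy as np

def prelim_attention_score(attn, prelim_positions):
    """Illustrative (assumed) prelim-attention score for one generated token.

    attn: array of shape (num_heads, seq_len) holding the attention weights
          from the token being generated to every earlier position in one
          layer; each row is expected to sum to 1.
    prelim_positions: slice or index array selecting the previously
          generated output (prelim) tokens within the sequence.

    Returns the attention mass placed on prelim tokens, averaged over heads.
    """
    attn = np.asarray(attn, dtype=float)
    mass_per_head = attn[:, prelim_positions].sum(axis=1)
    return float(mass_per_head.mean())

# Hypothetical layout: positions 0-3 image, 4-5 prompt, 6-8 prelim.
prelim = slice(6, 9)

# Toy weights where the new token attends mostly to the image.
attn_image_heavy = np.array([
    [0.20, 0.20, 0.20, 0.20, 0.05, 0.05, 0.04, 0.03, 0.03],
    [0.15, 0.15, 0.20, 0.20, 0.05, 0.05, 0.10, 0.05, 0.05],
])

# Toy weights where the new token attends mostly to prelim tokens.
attn_prelim_heavy = np.array([
    [0.02, 0.03, 0.02, 0.03, 0.05, 0.05, 0.30, 0.30, 0.20],
    [0.05, 0.05, 0.05, 0.05, 0.10, 0.10, 0.20, 0.20, 0.20],
])

score_image_heavy = prelim_attention_score(attn_image_heavy, prelim)
score_prelim_heavy = prelim_attention_score(attn_prelim_heavy, prelim)

print("image-heavy score:", score_image_heavy)
print("prelim-heavy score:", score_prelim_heavy)
```

Under this reading, a higher score means the prediction leans more on earlier output than on the image, which is the pattern the abstract associates with hallucination. Because the attention weights already exist during decoding, a score of this kind needs no extra forward pass; how to threshold it, and which layers or heads to use, would have to come from the paper itself.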

