CV | Dec 21, 2025

Revealing Perception and Generation Dynamics in LVLMs: Mitigating Hallucinations via Validated Dominance Correction

arXiv:2512.18813v1 | 3 citations | h-index: 13
Originality: Incremental advance
AI Analysis

The paper addresses hallucinations in LVLMs, a critical issue for reliability in vision-language tasks. It appears incremental because it adds a correction strategy on top of existing models rather than proposing a new architecture.

This work tackles hallucinations in Large Vision-Language Models by analyzing their internal perception and generation dynamics. The analysis reveals two patterns, GATE (how visual attention evolves across layers) and SAD (how subdominant tokens accumulate into dominant ones). Based on these, the authors propose the VDC strategy, which replaces unsupported tokens with validated ones. Experiments across multiple models and benchmarks are reported to show substantial mitigation of hallucinations.

Large Vision-Language Models (LVLMs) have shown remarkable capabilities, yet hallucinations remain a persistent challenge. This work presents a systematic analysis of the internal evolution of visual perception and token generation in LVLMs, revealing two key patterns. First, perception follows a three-stage GATE process: early layers perform a Global scan, intermediate layers Approach and Tighten on core content, and later layers Explore supplementary regions. Second, generation exhibits an SAD (Subdominant Accumulation to Dominant) pattern, where hallucinated tokens arise from the repeated accumulation of subdominant tokens lacking support from attention (visual perception) or feed-forward network (internal knowledge). Guided by these findings, we devise the VDC (Validated Dominance Correction) strategy, which detects unsupported tokens and replaces them with validated dominant ones to improve output reliability. Extensive experiments across multiple models and benchmarks confirm that VDC substantially mitigates hallucinations.
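
The detect-and-replace idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no formulas, so the `Candidate` fields, the support scores, the `threshold` value, and the rule that a token counts as validated when either attention or the feed-forward network supports it are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """One next-token candidate with hypothetical support scores."""
    token: str
    prob: float          # output probability of the token
    attn_support: float  # assumed score: support from attention (visual perception)
    ffn_support: float   # assumed score: support from the FFN (internal knowledge)


def is_validated(c: Candidate, threshold: float = 0.5) -> bool:
    # Assumption: support from either source is enough to validate a token.
    return c.attn_support >= threshold or c.ffn_support >= threshold


def correct_token(candidates: List[Candidate], threshold: float = 0.5) -> str:
    """Return the most probable token that passes validation.

    If the top token is unsupported, fall back to the next most probable
    validated candidate. If nothing is validated, keep the original top token.
    """
    ranked = sorted(candidates, key=lambda c: c.prob, reverse=True)
    for c in ranked:
        if is_validated(c, threshold):
            return c.token
    return ranked[0].token


if __name__ == "__main__":
    candidates = [
        Candidate("dog", prob=0.6, attn_support=0.1, ffn_support=0.2),
        Candidate("cat", prob=0.3, attn_support=0.8, ffn_support=0.1),
        Candidate("car", prob=0.1, attn_support=0.0, ffn_support=0.0),
    ]
    print(correct_token(candidates))
```

In this toy input, "dog" is the most probable token but has low support from both sources, so the sketch falls back to "cat", the most probable candidate that passes the assumed validation rule. How the paper actually measures support and selects the replacement may differ.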

