CV · Apr 11

Spotlight and Shadow: Attention-Guided Dual-Anchor Introspective Decoding for MLLM Hallucination Mitigation

arXiv:2604.10071 · 70.5 · h-index: 28
AI Analysis

For MLLM users, this method addresses the persistent hallucination problem where text contradicts visual content, offering a training-free decoding strategy.

DaID introduces a contrastive decoding framework that dynamically calibrates token generation by identifying a Spotlight layer to amplify visual signals and a Shadow layer to suppress textual inertia, significantly reducing hallucination in MLLMs while improving reasoning.

Multimodal Large Language Models (MLLMs) have demonstrated remarkable reasoning capabilities yet continue to suffer from hallucination, where generated text contradicts visual content. In this paper, we introduce Dual-Anchor Introspective Decoding (DaID), a novel contrastive decoding framework that dynamically calibrates each token generation by mining the model's internal perceptual discrepancies. Specifically, DaID identifies a Spotlight layer to amplify visual factual signals and a Shadow layer to suppress textual inertia. By leveraging visual attention distributions to guide this dual-anchor selection process, our method ensures precise, token-specific adaptation. Experimental results across multiple benchmarks and MLLMs demonstrate that DaID significantly mitigates hallucination while enhancing general reasoning capabilities.
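To make the dual-anchor idea concrete, here is a minimal Python sketch of one decoding step. The abstract gives no formulas, so everything specific here is an assumption for illustration, not the paper's actual procedure: the Spotlight layer is taken to be the layer with the highest attention mass on image tokens, the Shadow layer the one with the lowest, and the final-layer log-probabilities are adjusted by adding the Spotlight term and subtracting the Shadow term. The names `select_anchors`, `dual_anchor_scores`, `alpha`, and `beta` are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def select_anchors(visual_attention_mass):
    """Return (spotlight, shadow) layer indices for the current token.

    Assumption (not from the paper): spotlight is the layer attending most
    to image tokens, shadow is the layer attending least.
    """
    layers = range(len(visual_attention_mass))
    spotlight = max(layers, key=lambda i: visual_attention_mass[i])
    shadow = min(layers, key=lambda i: visual_attention_mass[i])
    return spotlight, shadow


def dual_anchor_scores(layer_logits, visual_attention_mass, alpha=1.0, beta=1.0):
    """Calibrate next-token scores with a spotlight and a shadow layer.

    layer_logits: per-layer vocabulary logits; the last entry is treated
    as the model's final-layer output.
    alpha, beta: hypothetical weights for the amplify and suppress terms.
    """
    spotlight, shadow = select_anchors(visual_attention_mass)
    final_lp = log_softmax(layer_logits[-1])
    spot_lp = log_softmax(layer_logits[spotlight])
    shadow_lp = log_softmax(layer_logits[shadow])
    return [
        f + alpha * s - beta * d
        for f, s, d in zip(final_lp, spot_lp, shadow_lp)
    ]


# Toy example: vocabulary ["cat", "dog"], three layers.
vocab = ["cat", "dog"]
layer_logits = [[0.0, 2.0], [2.0, 0.0], [1.0, 1.0]]
visual_attention_mass = [0.1, 0.6, 0.3]  # made-up values for illustration

scores = dual_anchor_scores(layer_logits, visual_attention_mass)
best = vocab[scores.index(max(scores))]
print(best)
```

In this toy setup the final layer is undecided between the two tokens, and the contrast between the visually grounded layer and the text-dominated layer breaks the tie. Because the anchors are re-selected from the attention values at every step, the adjustment is token-specific, which is the property the abstract emphasizes.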

