CV · AI · CL · MM · Oct 3, 2025

MaskCD: Mitigating LVLM Hallucinations by Image Head Masked Contrastive Decoding

arXiv:2510.02790v1 · 5 citations · h-index: 3 · Has Code · EMNLP
Originality: Incremental advance
AI Analysis

This work addresses hallucination in LVLMs, where a model generates content that contradicts its visual or textual input. It aims to be more stable than existing approaches such as contrastive decoding and attention manipulation, though it appears incremental because it builds directly on them.

The paper tackles hallucination in large vision-language models (LVLMs) with MaskCD, a method that masks "image heads" to construct contrastive samples for contrastive decoding. Experiments with LLaVA-1.5-7b and Qwen-VL-7b on benchmarks including CHAIR, POPE, AMBER, and MME indicate that it reduces hallucinations while preserving the models' general capabilities.
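The summary does not say how image heads are identified or masked. One plausible reading, assumed here rather than taken from the paper, is to rank attention heads by the share of attention they place on image tokens, treat the top k as image heads, and zero their outputs to obtain the contrastive branch. The pure-Python sketch below shows that idea on toy numbers; `image_attention_ratio`, `select_image_heads`, `mask_heads`, the ratio criterion, and `k` are all hypothetical.

```python
from typing import List, Sequence, Set


def image_attention_ratio(attn_row: Sequence[float], image_positions: Set[int]) -> float:
    """Share of one head's attention (a single query row) that falls on image tokens."""
    total = sum(attn_row)
    if total == 0:
        return 0.0
    on_image = sum(w for pos, w in enumerate(attn_row) if pos in image_positions)
    return on_image / total


def select_image_heads(
    attn_per_head: Sequence[Sequence[float]], image_positions: Set[int], k: int
) -> List[int]:
    """Hypothetical criterion: the k heads with the highest image-attention ratio."""
    scores = [image_attention_ratio(row, image_positions) for row in attn_per_head]
    ranked = sorted(range(len(scores)), key=lambda h: scores[h], reverse=True)
    return sorted(ranked[:k])


def mask_heads(head_outputs: Sequence[Sequence[float]], masked: Set[int]) -> List[float]:
    """Zero the outputs of masked heads, then concatenate all heads
    (the vector that would normally feed the attention output projection)."""
    merged: List[float] = []
    for h, vec in enumerate(head_outputs):
        merged.extend(0.0 if h in masked else x for x in vec)
    return merged


# Toy example: 3 heads attending over 4 tokens; positions 2 and 3 are image tokens.
attn = [
    [0.1, 0.1, 0.4, 0.4],  # mostly on image tokens
    [0.5, 0.3, 0.1, 0.1],  # mostly on text tokens
    [0.2, 0.2, 0.3, 0.3],
]
heads = select_image_heads(attn, image_positions={2, 3}, k=2)
contrastive_hidden = mask_heads([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], set(heads))
print(heads, contrastive_hidden)  # [0, 2] [0.0, 0.0, 3.0, 4.0, 0.0, 0.0]
```

In a real LVLM the zeroing would have to happen inside the selected attention layers during the forward pass; the flat lists here only stand in for per-head output vectors.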

Large vision-language models (LVLMs) have shown remarkable performance in visual-language understanding for downstream multimodal tasks. As their capabilities improve, new problems emerge. Among them, hallucination, the phenomenon in which LVLMs generate content that contradicts their visual and textual inputs, has attracted much attention. Many approaches have been proposed to address this issue, such as contrastive decoding and attention manipulation. However, contrastive decoding methods struggle to construct appropriate contrastive samples, and attention manipulation methods are highly sensitive and lack stability. In this work, we propose image head Masked Contrastive Decoding (MaskCD). Our approach utilizes the "image heads" in LVLMs, masking them to construct contrastive samples for contrastive decoding. We evaluated MaskCD on LLaVA-1.5-7b and Qwen-VL-7b using various benchmarks, including CHAIR, POPE, AMBER, and MME. The results demonstrate that MaskCD effectively alleviates hallucinations while retaining the general capabilities of LVLMs. Corresponding resources can be found at: https://github.com/Deng-Jingyuan/MaskCD .
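Contrastive decoding compares the next-token logits of the full model with those of the image-head-masked model, so that tokens which survive without the image heads (likely language-prior guesses) are penalized. The sketch below assumes the widely used formulation from visual contrastive decoding, with a contrast weight `alpha` and an adaptive plausibility cutoff `beta`; the exact rule and hyperparameters MaskCD uses are not stated here, so both, along with the helper names, are assumptions.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def contrastive_logits(
    orig: Sequence[float], masked: Sequence[float], alpha: float = 1.0, beta: float = 0.1
) -> List[float]:
    """Contrast full-model logits with image-head-masked logits.

    Assumed VCD-style rule: (1 + alpha) * orig - alpha * masked, restricted to
    tokens whose full-model probability is at least beta * max probability.
    """
    probs = softmax(orig)
    cutoff = beta * max(probs)
    return [
        (1 + alpha) * lo - alpha * lm if p >= cutoff else -math.inf
        for lo, lm, p in zip(orig, masked, probs)
    ]


def greedy_token(logits: Sequence[float]) -> int:
    """Index of the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Toy 3-token vocabulary. Token 0 keeps its score without image heads
# (a language-prior guess); token 1 drops sharply (it depends on the image).
orig = [2.0, 1.8, -3.0]
masked = [2.2, 0.5, -3.0]
cd = contrastive_logits(orig, masked)
print(greedy_token(orig), greedy_token(cd))  # 0 1
```

The plausibility cutoff keeps the contrast from promoting tokens the full model already considers very unlikely; here token 2 is excluded before the contrast is applied.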
