CV · AI · Mar 13, 2025

Through the Magnifying Glass: Adaptive Perception Magnification for Hallucination-Free VLM Decoding

arXiv:2503.10183v3 · 6 citations · h-index: 17
Originality: Incremental advance
AI Analysis

This paper addresses visual hallucination in vision-language models, offering an incremental improvement over existing training-free decoding methods.

The paper tackles visual hallucination in vision-language models by proposing the Perception Magnifier (PM), a novel decoding method that iteratively magnifies relevant visual regions, resulting in more accurate and faithful responses with superior hallucination mitigation.

Existing vision-language models (VLMs) often suffer from visual hallucination, where the generated responses contain inaccuracies that are not grounded in the visual input. Efforts to address this issue without model finetuning primarily mitigate hallucination by contrastively reducing language biases or amplifying the weights of visual embedding during decoding. However, these approaches remain limited in their ability to capture fine-grained visual details. In this work, we propose the Perception Magnifier (PM), a novel visual decoding method that iteratively isolates relevant visual tokens based on attention and magnifies the corresponding regions, spurring the model to concentrate on fine-grained visual details during decoding. By magnifying critical regions while preserving the structural and contextual information at each decoding step, PM allows the VLM to enhance its scrutiny of the visual input, hence producing more accurate and faithful responses. Extensive experimental results demonstrate that PM not only achieves superior hallucination mitigation but also enhances language generation while preserving strong reasoning capabilities.
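
The abstract describes magnifying attended regions while keeping the rest of the image in view. As a rough illustration only, the sketch below uses attention-weighted non-uniform resampling: rows and columns with more attention receive more output pixels, and a uniform component keeps some of the surrounding context. This is an assumed stand-in, not the paper's actual procedure; the function name `magnify`, the `floor` parameter, and the use of a single patch-level attention map are all hypothetical choices made for the example.

```python
import numpy as np

def magnify(image, attn, floor=0.2):
    """Resample `image` so high-attention rows/columns get more output pixels.

    image: (H, W) or (H, W, C) array.
    attn:  (h, w) non-negative attention over the patch grid (assumed input).
    floor: share of a uniform density mixed in, so low-attention regions are
           compressed rather than cropped away (hypothetical parameter).
    """
    H, W = image.shape[:2]
    attn = np.asarray(attn, dtype=float)

    def source_indices(marginal, size):
        # Map each pixel to the patch that covers it and read off its weight.
        patch_of_pixel = np.arange(size) * len(marginal) // size
        density = marginal[patch_of_pixel]
        density = density / density.sum()
        # Blend with a uniform density to retain surrounding context.
        density = (1.0 - floor) * density + floor / size
        cdf = np.cumsum(density)
        # Inverse-CDF sampling: evenly spaced targets land more often
        # on source pixels where the density is high.
        targets = (np.arange(size) + 0.5) / size
        return np.clip(np.searchsorted(cdf, targets), 0, size - 1)

    rows = source_indices(attn.sum(axis=1), H)
    cols = source_indices(attn.sum(axis=0), W)
    return image[np.ix_(rows, cols)]

# Toy usage: an 8x8 image with all attention on one patch of a 4x4 grid.
image = np.arange(64).reshape(8, 8)
attn = np.zeros((4, 4))
attn[1, 1] = 1.0
magnified = magnify(image, attn)
```

In a decoding loop, one would presumably recompute the attention map at each step and feed the re-magnified image back to the model; that loop is omitted here because its details are not given in this summary.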

