CV | AI | Jun 13, 2025

Stop learning it all to mitigate visual hallucination, Focus on the hallucination target

arXiv:2506.11417v1 | 2 citations | h-index: 1 | CVPR
Originality: Incremental advance
AI Analysis

This work addresses reliability issues in MLLMs for practical applications that require accurate object identification. It represents an incremental improvement.

The paper tackles visual hallucination in Multimodal Large Language Models (MLLMs) with a preference learning approach that focuses on the targeted areas where hallucinations occur. The authors report reduced hallucinations across multiple vision tasks without diminishing overall performance.

Multimodal Large Language Models (MLLMs) frequently suffer from hallucination issues, generating information about objects that are not present in input images during vision-language tasks. These hallucinations particularly undermine model reliability in practical applications requiring accurate object identification. To address this challenge, we propose \mymethod, a preference learning approach that mitigates hallucinations by focusing on targeted areas where they occur. To implement this, we build a dataset containing hallucinated responses, correct responses, and target information (i.e., objects present in the images and the corresponding chunk positions in responses affected by hallucinations). By applying a preference learning method restricted to these specific targets, the model can filter out irrelevant signals and focus on correcting hallucinations. This allows the model to produce more factual responses by concentrating solely on relevant information. Experimental results demonstrate that \mymethod effectively reduces hallucinations across multiple vision hallucination tasks, improving the reliability and performance of MLLMs without diminishing overall performance.
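
To make the idea of "preference learning restricted to specific targets" concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not name the loss, so the sketch assumes a DPO-style objective, and the function names, the 0/1 `target_mask` representation of the chunk positions, and the `beta` value are all hypothetical choices made for illustration.

```python
import math
from typing import Sequence


def masked_logprob(token_logps: Sequence[float], target_mask: Sequence[int]) -> float:
    """Sum per-token log-probabilities over target positions only (mask == 1)."""
    if len(token_logps) != len(target_mask):
        raise ValueError("token_logps and target_mask must have equal length")
    return sum(lp for lp, m in zip(token_logps, target_mask) if m)


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def targeted_preference_loss(
    policy_chosen: Sequence[float],
    policy_rejected: Sequence[float],
    ref_chosen: Sequence[float],
    ref_rejected: Sequence[float],
    chosen_mask: Sequence[int],
    rejected_mask: Sequence[int],
    beta: float = 0.1,  # assumed temperature, not taken from the paper
) -> float:
    """DPO-style loss computed only on masked (target) token positions.

    Assumption: the preferred response is the correct one, the rejected
    response is the hallucinated one, and the masks mark the chunks
    affected by the hallucination.
    """
    chosen_ratio = masked_logprob(policy_chosen, chosen_mask) - masked_logprob(
        ref_chosen, chosen_mask
    )
    rejected_ratio = masked_logprob(policy_rejected, rejected_mask) - masked_logprob(
        ref_rejected, rejected_mask
    )
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) is the same as softplus(-margin).
    return softplus(-margin)


# Toy example: only the middle token of each response is a target position.
loss = targeted_preference_loss(
    policy_chosen=[-1.0, -0.5, -2.0],
    policy_rejected=[-1.0, -3.0, -2.0],
    ref_chosen=[-1.0, -1.5, -2.0],
    ref_rejected=[-1.0, -2.0, -2.0],
    chosen_mask=[0, 1, 0],
    rejected_mask=[0, 1, 0],
)
```

With all-ones masks this reduces to an ordinary sequence-level DPO-style loss; with a narrower mask, tokens outside the target chunks contribute nothing to the margin, which is one way to read the abstract's claim that the model can "filter out irrelevant signals".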
