LG AI CL CV Feb 13, 2024

Mitigating Object Hallucination in Large Vision-Language Models via Image-Grounded Guidance

Linxi Zhao, Yihe Deng, Weitong Zhang, Quanquan Gu

arXiv:2402.08680v2 27.835 citations h-index: 10 Has Code ICML

Originality: Incremental advance

AI Analysis

This work addresses a critical reliability issue for users of LVLMs in applications such as image captioning and visual QA, offering a practical, cost-effective solution that requires neither training nor proprietary APIs.

The paper tackles object hallucination in Large Vision-Language Models (LVLMs) by proposing MARINE, a training-free and API-free framework that applies image-grounded guidance from open-source vision models at inference time. It reduces hallucinations consistently across 5 LVLMs and outperforms existing fine-tuning-based methods in the authors' evaluations.

The advancement of Large Vision-Language Models (LVLMs) has increasingly highlighted the critical issue of their tendency to hallucinate non-existing objects in the images. To address this issue, previous works focused on using specially curated datasets or powerful LLMs to rectify the outputs of LVLMs. However, these approaches require either costly training or fine-tuning, or API access to proprietary LLMs for post-generation correction. In response to these limitations, we propose Mitigating hallucinAtion via image-gRounded guIdaNcE (MARINE), a framework that is both training-free and API-free. MARINE effectively and efficiently reduces object hallucinations during inference by introducing image-grounded guidance to LVLMs. This is achieved by leveraging open-source vision models to extract object-level information, thereby enhancing the precision of LVLM-generated content. Our framework's flexibility further allows for the integration of multiple vision models, enabling more reliable and robust object-level guidance. Through comprehensive evaluations across 5 popular LVLMs with diverse evaluation metrics and benchmarks, we demonstrate the effectiveness of MARINE, which even outperforms existing fine-tuning-based methods. Remarkably, it reduces hallucinations consistently in GPT-4V-assisted evaluation while maintaining the detailedness of LVLMs' generations. We release our code at https://github.com/Linxi-ZHAO/MARINE.
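The abstract describes the mechanism only at a high level: vision models extract object-level information, and that information guides the LVLM during decoding. The following is a minimal sketch of one way such guidance could work, not the authors' implementation. It assumes (1) objects from several detectors are combined by intersection, (2) the object list is turned into a text prompt, and (3) guided and unguided next-token log-probabilities are blended with a classifier-free-guidance-style weight `gamma`. All function names, the prompt wording, and the toy logits are hypothetical.

```python
import math

def aggregate_objects(detections):
    """Keep only objects reported by every detector (assumed rule: intersection)."""
    sets = [set(d) for d in detections]
    return sorted(set.intersection(*sets)) if sets else []

def build_guidance_prompt(objects):
    """Turn the object list into a text hint (hypothetical wording)."""
    if not objects:
        return ""
    return "The image contains: " + ", ".join(objects) + "."

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def guided_logits(unguided, guided, gamma):
    """Blend log-probs: gamma=0 keeps the original model, gamma=1 is fully guided."""
    lu, lg = log_softmax(unguided), log_softmax(guided)
    return [(1 - gamma) * u + gamma * g for u, g in zip(lu, lg)]

# Toy example: two detectors, a three-token vocabulary.
objects = aggregate_objects([["dog", "frisbee", "tree"], ["dog", "frisbee"]])
prompt = build_guidance_prompt(objects)

# Stand-ins for LVLM logits without and with the guidance prompt.
mixed = guided_logits([2.0, 1.0, 0.5], [0.5, 2.5, 0.5], gamma=0.7)
next_token = max(range(len(mixed)), key=mixed.__getitem__)
```

In this sketch, taking the intersection trades recall for precision: an object reaches the prompt only if all detectors agree on it. The weight `gamma` controls how strongly the object-level hint overrides the model's unguided prediction.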
