CV · CL · MM · May 17, 2023

Evaluating Object Hallucination in Large Vision-Language Models

arXiv:2305.10355v3 · 1696 citations · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses a critical reliability issue for users of LVLMs in multimodal applications. Its contribution is incremental in that it focuses on evaluating hallucination rather than mitigating it.

The paper tackles the problem of object hallucination in large vision-language models (LVLMs), where models generate objects inconsistent with images, and introduces POPE, an improved evaluation method that assesses hallucination more stably and flexibly.

Inspired by the superior language abilities of large language models (LLMs), large vision-language models (LVLMs) have recently been explored by integrating powerful LLMs to improve performance on complex multimodal tasks. Despite the promising progress on LVLMs, we find that LVLMs suffer from the hallucination problem, i.e., they tend to generate objects that are inconsistent with the target images in the descriptions. To investigate it, this work presents the first systematic study on object hallucination of LVLMs. We conduct evaluation experiments on several representative LVLMs and show that they mostly suffer from severe object hallucination. We further discuss how the visual instructions may influence the hallucination, and find that objects that frequently occur in the visual instructions, or that co-occur with the image objects, are clearly prone to being hallucinated by LVLMs. In addition, we find that existing evaluation methods might be affected by the input instructions and generation styles of LVLMs. We therefore design an improved evaluation method for object hallucination by proposing a polling-based query method called POPE. Experimental results demonstrate that POPE can evaluate object hallucination in a more stable and flexible way. Our code and data are publicly available at https://github.com/RUCAIBox/POPE.
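
The abstract describes POPE only as a polling-based query method, so the following is a minimal sketch of what such polling could look like, not the authors' implementation. It assumes each poll is a yes/no question about a single object, that questions are drawn from objects known to be present or absent in the image, and that the answers are scored as binary classification. The question wording, the function names `build_questions` and `score_poll`, and the choice of metrics are illustrative assumptions; the repository linked above is the authoritative reference.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class PollScores:
    accuracy: float
    precision: float
    recall: float
    f1: float
    yes_ratio: float  # share of questions the model answered "yes"


def build_questions(present: Sequence[str],
                    absent: Sequence[str]) -> List[Tuple[str, bool]]:
    """Pair each yes/no question with its ground-truth answer.

    The template wording is an assumption made for this sketch.
    """
    template = "Is there a {} in the image?"
    questions = [(template.format(obj), True) for obj in present]
    questions += [(template.format(obj), False) for obj in absent]
    return questions


def score_poll(labels: Sequence[bool], answers: Sequence[bool]) -> PollScores:
    """Score model answers (True means "yes") against ground-truth labels."""
    if len(labels) != len(answers) or not labels:
        raise ValueError("labels and answers must be non-empty and equal length")
    tp = sum(1 for l, a in zip(labels, answers) if l and a)
    fp = sum(1 for l, a in zip(labels, answers) if not l and a)
    fn = sum(1 for l, a in zip(labels, answers) if l and not a)
    tn = sum(1 for l, a in zip(labels, answers) if not l and not a)
    total = len(labels)
    # Guard against empty denominators when the model never says "yes"
    # or when there are no positive labels.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return PollScores(
        accuracy=(tp + tn) / total,
        precision=precision,
        recall=recall,
        f1=f1,
        yes_ratio=(tp + fp) / total,
    )
```

In this sketch, a "yes" to a question about an absent object counts as a hallucination (a false positive). Because each answer is a single yes or no, the score does not depend on how long or how stylized the model's free-form descriptions are, which is one plausible reading of the stability claim in the abstract.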

Code Implementations: 6 repos