CV Jun 24, 2024

Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models

Mingrui Wu, Jiayi Ji, Oucheng Huang, Jiale Li, Yuhang Wu, Xiaoshuai Sun, Rongrong Ji

arXiv:2406.16449v423.032 citations Has Code

Originality: Incremental advance

AI Analysis

The paper addresses a critical gap in visual comprehension for LVLM users, but the advance is incremental: it extends existing hallucination research from objects to inter-object relationships.

The paper tackles relationship hallucinations in Large Vision-Language Models (LVLMs), which previous work neglected. It introduces R-Bench, a benchmark for evaluating them, identifies three types of relationship co-occurrences that lead to hallucinations, and points to issues such as over-reliance on common sense knowledge.

The issue of hallucinations is a prevalent concern in existing Large Vision-Language Models (LVLMs). Previous efforts have primarily focused on investigating object hallucinations, which can be easily alleviated by introducing object detectors. However, these efforts neglect hallucinations in inter-object relationships, which are essential for visual comprehension. In this work, we introduce R-Bench, a novel benchmark for evaluating Vision Relationship Hallucination. R-Bench features image-level questions that focus on the existence of relationships and instance-level questions that assess local visual comprehension. We identify three types of relationship co-occurrences that lead to hallucinations: relationship-relationship, subject-relationship, and relationship-object. The visual instruction tuning dataset's long-tail distribution significantly impacts LVLMs' understanding of visual relationships. Furthermore, our analysis reveals that current LVLMs tend to disregard visual content and overly rely on the common sense knowledge of Large Language Models. They also struggle with reasoning about spatial relationships based on contextual information.
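
The three co-occurrence types named in the abstract can be illustrated with a small counting sketch. This is not the authors' code: it assumes each image is annotated with (subject, relationship, object) triplets, and it uses one plausible reading of each co-occurrence type. The toy annotations and the function name `count_cooccurrences` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: one list of (subject, relationship, object)
# triplets per image. Not taken from R-Bench.
annotations = [
    [("man", "riding", "horse"), ("man", "wearing", "hat")],
    [("woman", "riding", "bike"), ("woman", "wearing", "helmet")],
    [("man", "riding", "bike")],
]

def count_cooccurrences(images):
    """Count three kinds of co-occurrence (an assumed interpretation)."""
    rel_rel = Counter()   # two distinct relationships appearing in the same image
    subj_rel = Counter()  # a subject appearing with a given relationship
    rel_obj = Counter()   # a relationship appearing with a given object
    for triplets in images:
        # Unordered pairs of distinct relationships within one image.
        relations = sorted({rel for _, rel, _ in triplets})
        rel_rel.update(combinations(relations, 2))
        for subj, rel, obj in triplets:
            subj_rel[(subj, rel)] += 1
            rel_obj[(rel, obj)] += 1
    return rel_rel, subj_rel, rel_obj

rel_rel, subj_rel, rel_obj = count_cooccurrences(annotations)
print(rel_rel.most_common(3))
print(subj_rel.most_common(3))
print(rel_obj.most_common(3))
```

Sorting these counts over a real tuning dataset would also expose the long-tail distribution the abstract mentions: a few pairs dominate, and a model may learn to predict the frequent pairing regardless of what the image shows.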
