CV | CL | LG | Feb 1, 2024

A Survey on Hallucination in Large Vision-Language Models

arXiv:2402.00253v2 | 354 citations | h-index: 6
AI Analysis

The survey tackles the challenge of improving LVLM reliability for AI practitioners, but it is incremental: it synthesizes existing knowledge without presenting new experimental results.

This survey addresses the problem of hallucination in Large Vision-Language Models, where generated text misaligns with visual content, by providing an overview of symptoms, evaluation methods, root causes, and mitigation strategies to facilitate future research.

Recent development of Large Vision-Language Models (LVLMs) has attracted growing attention within the AI landscape for its practical implementation potential. However, "hallucination", or more specifically, the misalignment between factual visual content and corresponding textual generation, poses a significant challenge to utilizing LVLMs. In this comprehensive survey, we dissect LVLM-related hallucinations in an attempt to establish an overview and facilitate future mitigation. Our scrutiny starts with a clarification of the concept of hallucinations in LVLMs, presenting a variety of hallucination symptoms and highlighting the unique challenges inherent in LVLM hallucinations. Subsequently, we outline the benchmarks and methodologies tailored specifically for evaluating hallucinations unique to LVLMs. Additionally, we delve into an investigation of the root causes of these hallucinations, encompassing insights from the training data and model components. We also critically review existing methods for mitigating hallucinations. The open questions and future directions pertaining to hallucinations within LVLMs are discussed to conclude this survey.

Code Implementations: 1 repo