CV | AI | LG | Mar 30, 2025

DASH: Detection and Assessment of Systematic Hallucinations of VLMs

arXiv:2503.23573v1 | 13 citations | h-index: 8 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses unreliable VLM outputs in open-world settings, a concern for users who rely on accurate object recognition. It is an incremental advance, building on existing hallucination assessment methods.

The paper tackles object hallucinations in vision-language models (VLMs) with DASH, an automatic pipeline that detects systematic hallucinations on real-world images. Applied to PaliGemma and two LLaVA-NeXT models, it identifies more than 19k clusters comprising 950k images in total, and fine-tuning PaliGemma on the images obtained with DASH mitigates these errors.
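A minimal sketch of the kind of filtering and grouping summarized above, assuming (this is not taken from the paper) that an image counts as a hallucination when the VLM answers "yes" to an object question while an object detector's confidence for that object stays below a threshold, and that semantically similar images are grouped by cosine similarity of their embeddings. `ImageResult`, `is_hallucination`, `greedy_clusters`, and both threshold values are hypothetical; the paper's actual verification and clustering procedures may differ.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class ImageResult:
    """Hypothetical per-image record; field names are illustrative only."""
    image_id: str
    vlm_says_present: bool   # VLM answered "yes" to "Is there a <object> in the image?"
    detector_score: float    # assumed max detector confidence for <object>, in [0, 1]
    embedding: List[float]   # assumed image embedding, e.g. from a CLIP-style encoder


def is_hallucination(r: ImageResult, det_threshold: float = 0.1) -> bool:
    # Assumed criterion: the VLM claims the object is present,
    # but the detector finds no confident instance of it.
    return r.vlm_says_present and r.detector_score < det_threshold


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def greedy_clusters(results: List[ImageResult], sim_threshold: float = 0.9) -> List[List[str]]:
    """Group hallucination images into clusters of semantically similar images.

    Single pass: each hallucination image joins the first cluster whose seed
    embedding is similar enough; otherwise it seeds a new cluster.
    """
    clusters = []  # list of (seed_embedding, member_ids)
    for r in results:
        if not is_hallucination(r):
            continue
        for seed, members in clusters:
            if cosine(seed, r.embedding) >= sim_threshold:
                members.append(r.image_id)
                break
        else:
            clusters.append((r.embedding, [r.image_id]))
    return [members for _, members in clusters]


# Toy usage with made-up 2-D embeddings.
results = [
    ImageResult("a", True, 0.02, [1.0, 0.0]),
    ImageResult("b", True, 0.05, [0.99, 0.1]),
    ImageResult("c", True, 0.80, [1.0, 0.0]),   # object actually present: not a hallucination
    ImageResult("d", False, 0.01, [0.0, 1.0]),  # VLM correctly says "no"
    ImageResult("e", True, 0.00, [0.0, 1.0]),
]
print(greedy_clusters(results))  # → [['a', 'b'], ['e']]
```

Greedy threshold clustering is used here only because it is short and dependency-free; at the scale reported (950k images), a real pipeline would more plausibly run an established clustering algorithm over precomputed embeddings.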

Vision-language models (VLMs) are prone to object hallucinations, where they erroneously indicate the presence of certain objects in an image. Existing benchmarks quantify hallucinations using relatively small, labeled datasets. However, this approach is i) insufficient to assess hallucinations that arise in open-world settings, where VLMs are widely used, and ii) inadequate for detecting systematic errors in VLMs. We propose DASH (Detection and Assessment of Systematic Hallucinations), an automatic, large-scale pipeline designed to identify systematic hallucinations of VLMs on real-world images in an open-world setting. A key component is DASH-OPT for image-based retrieval, where we optimize over the "natural image manifold" to generate images that mislead the VLM. The output of DASH consists of clusters of real and semantically similar images for which the VLM hallucinates an object. We apply DASH to PaliGemma and two LLaVA-NeXT models across 380 object classes and, in total, find more than 19k clusters with 950k images. We study the transfer of the identified systematic hallucinations to other VLMs and show that fine-tuning PaliGemma with the model-specific images obtained with DASH mitigates object hallucinations. Code and data are available at https://YanNeu.github.io/DASH.
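The abstract describes DASH-OPT as generating images that mislead the VLM and using them for image-based retrieval of real images. Assuming retrieval is a nearest-neighbour search by cosine similarity in a shared image-embedding space, a minimal sketch of that step follows; `knn_retrieve`, the toy pool, the embeddings, and `k` are all hypothetical and not the authors' code.

```python
import heapq
import math
from typing import Dict, List


def normalize(v: List[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def knn_retrieve(query_emb: List[float], pool: Dict[str, List[float]], k: int = 2) -> List[str]:
    """Return ids of the k pool images most cosine-similar to the query embedding."""
    q = normalize(query_emb)
    scored = (
        (sum(a * b for a, b in zip(q, normalize(emb))), img_id)
        for img_id, emb in pool.items()
    )
    # nlargest keeps the k highest (similarity, id) pairs, best first.
    return [img_id for _, img_id in heapq.nlargest(k, scored)]


# Toy pool of "real" image embeddings (made-up 3-D vectors).
pool = {
    "real_1": [0.9, 0.1, 0.0],
    "real_2": [0.0, 1.0, 0.0],
    "real_3": [0.8, 0.3, 0.1],
    "real_4": [0.0, 0.0, 1.0],
}
generated_query = [1.0, 0.2, 0.0]  # assumed embedding of an optimized (generated) image
print(knn_retrieve(generated_query, pool, k=2))  # → ['real_1', 'real_3']
```

In a pipeline like this, retrieved real images would presumably still be checked with the VLM and a detector before being kept; at web scale an approximate nearest-neighbour index would replace the linear scan shown here.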

Code Implementations: 1 repo
