The Role of Background Information in Reducing Object Hallucination in Vision-Language Models: Insights from Cutoff API Prompting
This work addresses reliability issues in VLMs for real-world applications, but the contribution appears incremental because it builds on existing visual prompting methods.
The study tackles object hallucination in Vision-Language Models by analyzing Attention-driven visual prompting and finds that preserving background context is crucial for mitigation, though it provides no concrete numbers.
Vision-Language Models (VLMs) occasionally generate outputs that contradict their input images, which limits their reliability in real-world applications. While visual prompting is reported to suppress hallucinations by augmenting prompts with the relevant area of an image, how its effectiveness depends on the selected area remains uncertain. This study analyzes success and failure cases of Attention-driven visual prompting in object hallucination, revealing that preserving background context is crucial for mitigating object hallucination.
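To make the contrast between removing and preserving background concrete, the following is a minimal sketch, not the method evaluated in the study. It assumes an attention map is already available as a 2D array and applies a hard threshold to it. The function name `apply_attention_prompt` and the parameters `cutoff` and `background_keep` are hypothetical, introduced only for illustration.

```python
import numpy as np

def apply_attention_prompt(image, attention, cutoff=0.5, background_keep=0.0):
    """Mask an image with an attention map (illustrative assumption only).

    image:           H x W x 3 float array with values in [0, 1]
    attention:       H x W array of non-negative attention scores
    cutoff:          normalized score at or above which a pixel is kept as-is
    background_keep: brightness factor for pixels below the cutoff
                     (0.0 removes the background, values > 0 preserve it dimmed)
    """
    image = np.asarray(image, dtype=float)
    attention = np.asarray(attention, dtype=float)
    if image.shape[:2] != attention.shape:
        raise ValueError("attention map must match the image height and width")

    # Rescale attention scores to [0, 1]; a constant map keeps every pixel.
    lo, hi = attention.min(), attention.max()
    if hi > lo:
        norm = (attention - lo) / (hi - lo)
    else:
        norm = np.ones_like(attention)

    # Attended pixels get weight 1; the rest are scaled by background_keep.
    weight = np.where(norm >= cutoff, 1.0, background_keep)

    # Broadcast the H x W weights across the colour channels.
    return image * weight[..., None]


# Hypothetical 2 x 2 example: a white image and a toy attention map.
image = np.ones((2, 2, 3))
attention = np.array([[0.0, 1.0], [0.2, 0.9]])

hard_cut = apply_attention_prompt(image, attention, background_keep=0.0)
with_background = apply_attention_prompt(image, attention, background_keep=0.3)
```

With `background_keep=0.0` the unattended pixels are blacked out entirely, whereas a positive value leaves them visible but dimmed. The second setting corresponds loosely to the kind of background preservation the abstract identifies as important.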