AI, CL, CV | Jun 8, 2025

Mitigating Behavioral Hallucination in Multimodal Large Language Models for Sequential Images

arXiv:2506.07184v1 | 3 citations | h-index: 14
Originality: Incremental advance
AI Analysis

This work addresses a specific reliability issue in multimodal AI for sequential image tasks, representing an incremental improvement by focusing on a less-studied type of hallucination.

The paper tackles behavioral hallucination in multimodal large language models for sequential images by identifying prior-driven bias and the snowball effect as key factors, and introduces SHE, a lightweight two-stage framework that reduces behavioral hallucination by over 10% on the proposed BEACH metric while maintaining descriptive accuracy.

While multimodal large language models excel at various tasks, they still suffer from hallucinations, which limit their reliability and scalability for broader domain applications. To address this issue, recent research mainly focuses on objective hallucination. However, for sequential images, besides objective hallucination, there is also behavioral hallucination, which is less studied. This work aims to fill that gap. We first reveal that behavioral hallucinations mainly arise from two key factors: prior-driven bias and the snowball effect. Based on these observations, we introduce SHE (Sequence Hallucination Eradication), a lightweight, two-stage framework that (1) detects hallucinations via a visual-textual alignment check using our proposed adaptive temporal window and (2) mitigates them via orthogonal projection onto the joint embedding space. We also propose a new metric (BEACH) to quantify behavioral hallucination severity. Empirical results on standard benchmarks demonstrate that SHE reduces behavioral hallucination by over 10% on BEACH while maintaining descriptive accuracy.
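
As a rough illustration of the orthogonal projection named in stage (2), the sketch below projects a vector onto the subspace spanned by a set of reference vectors. It is not the authors' implementation: the abstract does not say how the joint embedding space is built or which representations are projected, so the function names (`orthonormal_basis`, `project_onto_subspace`) and the toy vectors are assumptions chosen only to show the linear algebra.

```python
from math import sqrt


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt: orthonormal basis for the span of `vectors`.

    Vectors that are (numerically) linearly dependent on earlier ones are skipped.
    """
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = dot(w, b)
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = sqrt(dot(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def project_onto_subspace(x, vectors):
    """Orthogonal projection of `x` onto span(`vectors`)."""
    proj = [0.0] * len(x)
    for b in orthonormal_basis(vectors):
        coeff = dot(x, b)
        proj = [p + coeff * bi for p, bi in zip(proj, b)]
    return proj


# Toy data (hypothetical): a 3-d "embedding" and a 2-d reference subspace.
x = [3.0, 4.0, 5.0]
span = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]

projected = project_onto_subspace(x, span)
# The residual is the part of x that lies outside the subspace.
residual = [xi - pi for xi, pi in zip(x, projected)]
```

By construction, the residual is orthogonal to every vector in `span`, so the projection keeps only the component of `x` that the reference subspace can express. Which component a method like SHE keeps or discards is a design choice that the abstract alone does not settle.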

