MA · CV · Sep 26, 2025

Visual Multi-Agent System: Mitigating Hallucination Snowballing via Visual Flow

arXiv:2509.21789v2 · 13 citations · h-index: 11 · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses a novel failure mode in multi-agent visual systems and offers a plug-and-play mitigation to improve reliability. It is an incremental advance, since it builds on existing multi-agent system (MAS) and visual language model (VLM) frameworks.

The paper tackles multi-agent visual hallucination snowballing in Multi-Agent Systems powered by Visual Language Models, where hallucinations amplify across agents as visual attention declines. It proposes ViF, a lightweight mitigation method that improves performance across eight benchmarks, evaluated on four MAS structures and ten base models.

Multi-Agent Systems (MAS) powered by Visual Language Models (VLMs) enable challenging tasks but suffer from a novel failure mode, multi-agent visual hallucination snowballing, in which hallucinations are seeded in a single agent and amplified by subsequent ones because of over-reliance on textual flow to relay visual information. Through turn-, layer-, and token-wise attention analyses, we provide detailed insights into the essence of hallucination snowballing, namely the reduction of visual attention allocation. This leads us to identify a subset of vision tokens with a unimodal attention peak in the middle layers; these tokens best preserve visual evidence but gradually diminish in deeper agent turns, which results in visual hallucination snowballing in MAS. We therefore propose ViF, a lightweight, plug-and-play mitigation paradigm that relays inter-agent messages with Visual Flow, powered by the selected visual relay tokens, and applies attention reallocation to amplify this pattern. Experimental results demonstrate that our method markedly reduces hallucination snowballing, consistently improving performance across eight benchmarks based on four common MAS structures and ten base models. The source code is publicly available at: https://github.com/YU-deep/ViF.git.
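
To make the two ideas in the abstract more concrete, here is a minimal NumPy sketch of (a) picking vision tokens whose layer-wise attention profile has a single peak in the middle layers and (b) boosting attention on those tokens. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `(num_layers, num_vision_tokens)` input layout, the `mid_range` bounds, and the scaling factor `alpha` are all hypothetical, and the actual selection criterion and reallocation rule used by ViF are defined in the paper and its repository.

```python
import numpy as np

def is_unimodal(profile, tol=1e-9):
    """Return True if the profile rises to a single peak and then falls."""
    peak = int(np.argmax(profile))
    rising = np.all(np.diff(profile[: peak + 1]) >= -tol)
    falling = np.all(np.diff(profile[peak:]) <= tol)
    return bool(rising and falling)

def select_relay_tokens(attn, mid_range):
    """Pick candidate relay tokens (hypothetical criterion).

    attn: array of shape (num_layers, num_vision_tokens) holding the
          attention mass each vision token receives at each layer.
    mid_range: (lo, hi) inclusive layer indices treated as "middle".
    """
    lo, hi = mid_range
    selected = []
    for t in range(attn.shape[1]):
        profile = attn[:, t]
        peak = int(np.argmax(profile))
        if lo <= peak <= hi and is_unimodal(profile):
            selected.append(t)
    return selected

def reallocate_attention(weights, relay_idx, alpha=1.5):
    """Scale attention on relay tokens by alpha, then renormalise to sum to 1."""
    w = np.asarray(weights, dtype=float).copy()
    w[relay_idx] *= alpha
    return w / w.sum()

# Toy example: 5 layers, 3 vision tokens.
attn = np.array([
    [0.1, 0.50, 0.2],
    [0.3, 0.30, 0.1],
    [0.6, 0.20, 0.4],
    [0.3, 0.10, 0.1],
    [0.1, 0.05, 0.3],
])
relay = select_relay_tokens(attn, mid_range=(1, 3))
boosted = reallocate_attention([0.25, 0.25, 0.25, 0.25], relay, alpha=2.0)
```

In the toy data, only token 0 has a single peak inside the assumed middle range: token 1 peaks at the first layer and token 2 has two local maxima. A real analysis would take `attn` from a VLM's attention maps aggregated over heads, which this sketch does not cover.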
