AI · LG · May 29

Diagnosing Failure Modes of Shared-State Collaboration in Resource-Constrained Visual Agents

arXiv:2605.31354 · 15.7
AI Analysis

This work provides trace-level diagnostics for understanding communication fidelity bottlenecks in modular visual reasoning systems, which is important for designing reliable resource-constrained agents.

This paper investigates failure modes in shared-state collaboration among resource-constrained visual agents (4B-8B models) using an auditing framework called CoSee. It finds that naive shared workspaces often amplify hallucinations, identifies Noise Reinforcement and Policy Collapse as the dominant failure modes, and shows that, without explicit verification, increased compute can correlate negatively with performance.

Modular visual reasoning systems increasingly rely on shared working memory for multi-step collaboration, yet the failure dynamics of intermediate state evolution in low-capacity regimes remain underexplored. We study failure modes of collaborative reasoning with weak learners (4B–8B models) through the lens of noise accumulation. We introduce CoSee, an auditing framework that formalizes the read-write-verify loop to trace information flow in document visual question answering. Across multi-page, chart, and web-based benchmarks, we find a counter-intuitive degradation: naive shared workspaces often amplify hallucinations rather than resolve them. We identify two dominant failure modes: Noise Reinforcement, where ungrounded notes are reused as evidence, and Policy Collapse, where added context shifts the model toward under-specified, short-form answers. Using cost-accuracy Pareto frontiers, we show that increased compute can correlate negatively with performance without explicit verification. Our findings suggest that for resource-constrained agents, the bottleneck lies not in reasoning depth but in communication fidelity, providing trace-level diagnostics and a mechanistic baseline for reliable modular design.
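To make the cost-accuracy Pareto frontier mentioned in the abstract concrete, here is a minimal sketch of how such a frontier could be computed. It is not the paper's code: the function name `pareto_frontier` and the (cost, accuracy) values are hypothetical, chosen only to illustrate how a more expensive configuration can end up dominated by a cheaper one.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (cost, accuracy); units are illustrative


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Keep configurations that no cheaper-or-equal-cost configuration beats on accuracy."""
    frontier: List[Point] = []
    best_acc = float("-inf")
    # Sort by cost ascending; at equal cost, consider higher accuracy first.
    for cost, acc in sorted(points, key=lambda p: (p[0], -p[1])):
        if acc > best_acc:
            frontier.append((cost, acc))
            best_acc = acc
    return frontier


# Hypothetical measurements, NOT results from the paper:
# each pair is (relative compute cost, answer accuracy).
configs = [(1.0, 0.52), (2.0, 0.49), (3.0, 0.55), (4.0, 0.47)]
print(pareto_frontier(configs))
```

In this made-up data, the configurations at cost 2.0 and 4.0 are dominated: they spend more compute than a cheaper alternative yet score lower, which is the kind of pattern a negative compute-performance correlation would produce.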
