AI · Aug 8, 2025

Mediator-Guided Multi-Agent Collaboration among Open-Source Models for Medical Decision-Making

arXiv:2508.05996v2 · 4 citations · h-index: 5 · Has Code
Originality: Incremental advance
AI Analysis

This work aims to strengthen medical multimodal intelligence for clinicians by enabling effective collaboration among heterogeneous vision-language models (VLMs). It is an incremental advance, building on existing multi-agent and VLM concepts.

The authors tackled the challenge of extending multi-agent systems to multimodal medical decision-making by proposing MedOrch, a mediator-guided framework that uses an LLM-based mediator to coordinate multiple open-source VLMs, achieving superior performance on five medical vision question answering benchmarks without model training.

Complex medical decision-making involves cooperative workflows operated by different clinicians. Designing AI multi-agent systems can expedite and augment human-level clinical decision-making. Existing multi-agent research primarily focuses on language-only tasks, and its extension to multimodal scenarios remains challenging. A blind combination of diverse vision-language models (VLMs) can amplify an erroneous interpretation of outcomes. VLMs are in general less capable of instruction following and, importantly, self-reflection than large language models (LLMs) of comparable size. This disparity largely constrains VLMs' ability to take part in cooperative workflows. In this study, we propose MedOrch, a mediator-guided multi-agent collaboration framework for medical multimodal decision-making. MedOrch employs an LLM-based mediator agent that enables multiple VLM-based expert agents to exchange and reflect on their outputs towards collaboration. We utilize multiple open-source general-purpose and domain-specific VLMs instead of costly GPT-series models, revealing the strength of heterogeneous models. We show that collaboration among distinct VLM-based agents can surpass the capabilities of any individual agent. We validate our approach on five medical vision question answering benchmarks, demonstrating superior collaboration performance without model training. Our findings underscore the value of mediator-guided multi-agent collaboration in advancing medical multimodal intelligence.
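To make the mediator-guided idea concrete, here is a minimal sketch of how such a loop could be wired up. It is not MedOrch's actual implementation: the `Expert` and `Mediator` callables, the consensus check, the `max_rounds` limit, and the majority-vote fallback are all assumptions made for illustration, and the toy agents stand in for real VLM and LLM calls.

```python
from collections import Counter
from typing import Callable, List, Optional

# Hypothetical interfaces (not from the paper):
# an expert maps (question, mediator feedback) -> answer,
# a mediator maps (question, expert answers) -> feedback for the next round.
Expert = Callable[[str, Optional[str]], str]
Mediator = Callable[[str, List[str]], str]


def collaborate(question: str, experts: List[Expert], mediator: Mediator,
                max_rounds: int = 3) -> str:
    """Run mediator-guided rounds until the experts agree or rounds run out."""
    feedback: Optional[str] = None
    answers: List[str] = []
    for _ in range(max_rounds):
        # Each expert answers, seeing the mediator's feedback from the last round.
        answers = [expert(question, feedback) for expert in experts]
        if len(set(answers)) == 1:  # all experts agree
            break
        # The mediator summarises the disagreement for the experts to reflect on.
        feedback = mediator(question, answers)
    # Assumed fallback: majority vote over the last round's answers.
    return Counter(answers).most_common(1)[0][0]


# Toy stand-ins for VLM experts and an LLM mediator.
def stubborn_expert(question: str, feedback: Optional[str]) -> str:
    return "B"


def reflective_expert(question: str, feedback: Optional[str]) -> str:
    # Answers "A" at first, then revises once it receives mediator feedback.
    return "A" if feedback is None else "B"


def summarising_mediator(question: str, answers: List[str]) -> str:
    return "Experts disagree: " + ", ".join(answers) + ". Please re-examine the image."


final = collaborate("Which option fits the scan?",
                    [stubborn_expert, reflective_expert],
                    summarising_mediator)
```

In this toy run the experts disagree in the first round, the mediator relays the disagreement, and the reflective expert revises its answer in the second round. A real system would replace the stubs with model calls and would need a more careful stopping rule than exact string agreement.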
