WSI-Agents: A Collaborative Multi-Agent System for Multi-Modal Whole Slide Image Analysis
The work addresses the need for versatile and accurate analysis in digital pathology, although it appears incremental, since it builds on existing multi-agent systems and multi-modal large language models.
The paper tackles a known gap: in multi-modal whole slide image (WSI) analysis, multi-modal large language models (MLLMs) underperform task-specific models. It proposes WSI-Agents, a collaborative multi-agent system that combines specialized agents with task allocation and verification mechanisms, and reports that it outperforms current WSI MLLMs and medical agent frameworks on multi-modal WSI benchmarks.
Whole slide images (WSIs) are vital in digital pathology, enabling gigapixel tissue analysis across various pathological tasks. While recent advancements in multi-modal large language models (MLLMs) allow multi-task WSI analysis through natural language, they often underperform task-specific models. Collaborative multi-agent systems have emerged as a promising way to balance versatility and accuracy in healthcare, yet their potential remains underexplored in pathology-specific domains. To address these issues, we propose WSI-Agents, a novel collaborative multi-agent system for multi-modal WSI analysis. WSI-Agents integrates specialized functional agents with robust task allocation and verification mechanisms to enhance both task-specific accuracy and multi-task versatility through three components: (1) a task allocation module that assigns tasks to expert agents drawn from a model zoo of patch-level and WSI-level MLLMs, (2) a verification mechanism that ensures accuracy through internal consistency checks and external validation using pathology knowledge bases and domain-specific models, and (3) a summary module that synthesizes the final summary with visual interpretation maps. Extensive experiments on multi-modal WSI benchmarks show that WSI-Agents outperforms current WSI MLLMs and medical agent frameworks across diverse tasks.
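To make the three-component flow (allocate, verify, summarize) concrete, here is a minimal Python sketch under loudly stated assumptions. It is not the authors' implementation. It assumes that each expert agent can be modeled as a callable from a slide identifier and a question to a text answer, that task allocation is a lookup by task type in a model zoo, that the internal consistency check is a majority vote across the allocated experts, and that external validation is membership in a knowledge base of admissible answers. All names (`WSIAgentsSketch`, `model_zoo`, `knowledge_base`, the task type `"subtyping"`) are hypothetical, and the sketch omits the visual interpretation maps and the domain-specific validation models.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical expert agent: (slide_id, question) -> text answer.
Expert = Callable[[str, str], str]


@dataclass
class WSIAgentsSketch:
    """Illustrative allocate -> verify -> summarize pipeline (hypothetical names)."""
    model_zoo: Dict[str, List[Expert]]  # task type -> candidate expert models
    knowledge_base: Dict[str, set] = field(default_factory=dict)  # task type -> admissible answers

    def allocate(self, task_type: str) -> List[Expert]:
        # Task allocation (assumed): look up the experts registered for this task type.
        experts = self.model_zoo.get(task_type)
        if not experts:
            raise KeyError(f"no expert registered for task type {task_type!r}")
        return experts

    def verify(self, task_type: str, answers: List[str]) -> Dict[str, object]:
        # Internal consistency (assumed): majority vote plus the fraction of agreeing experts.
        answer, votes = Counter(answers).most_common(1)[0]
        agreement = votes / len(answers)
        # External validation (assumed): the answer must be admissible per the knowledge base,
        # if one exists for this task type.
        allowed = self.knowledge_base.get(task_type)
        valid = allowed is None or answer in allowed
        return {"answer": answer, "agreement": agreement, "valid": valid}

    def summarize(self, question: str, result: Dict[str, object]) -> str:
        # Summary: report the verified answer together with its confidence signals.
        status = "verified" if result["valid"] else "flagged"
        return (f"Q: {question} | A: {result['answer']} "
                f"(agreement {result['agreement']:.2f}, {status})")

    def run(self, slide_id: str, task_type: str, question: str) -> str:
        experts = self.allocate(task_type)
        answers = [expert(slide_id, question) for expert in experts]
        return self.summarize(question, self.verify(task_type, answers))


if __name__ == "__main__":
    # Toy experts standing in for patch-level and WSI-level MLLMs.
    zoo = {
        "subtyping": [
            lambda s, q: "adenocarcinoma",
            lambda s, q: "adenocarcinoma",
            lambda s, q: "squamous cell carcinoma",
        ],
    }
    kb = {"subtyping": {"adenocarcinoma", "squamous cell carcinoma"}}
    pipeline = WSIAgentsSketch(zoo, kb)
    print(pipeline.run("slide_001", "subtyping", "What is the tumor subtype?"))
    # prints: Q: What is the tumor subtype? | A: adenocarcinoma (agreement 0.67, verified)
```

Majority voting is only one plausible reading of "internal consistency checks"; a fuller system might instead cross-check patch-level against WSI-level answers and call domain-specific models during external validation, as the abstract describes.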